ELRA - ELRA-WC342 : Croatian National Corpus

Catalog Reference : ELRA-WC342

Croatian National Corpus

The Croatian National Corpus (HNK) is a collection of selected texts covering different media, genres, styles, fields and topics. It consists of two sub-corpora: one of contemporary Croatian, and the Croatian Electronic Textual Archive (HETA).

Compilation of the corpus is ongoing. The goal is a balanced corpus of 200 million words with full POS/MSD tagging and (partial) syntactic and semantic annotation.
The HNK currently contains 101.3 million tokens and is encoded in XML (XCES).

Identification

Period of coverage :

Version : v2.0
Version history : v1.0, v2.5 (announced)

Production

Creation date : 2005

Applications


Application area : Research

Contents

written corpus
Number of languages : Monolingual
Language(s) : Croatian
Number of tokens : 101.3 million tokens
Annotation language : XML