Catalog Reference : ELRA-WC342
Croatian National Corpus
The Croatian National Corpus (HNK) is a collection of selected texts covering different media, genres, styles, fields and topics. It is composed of two sub-corpora: one for contemporary Croatian and the other, called HETA (Croatian Electronic Textual Archive).
Compilation of the corpus is ongoing. The objective is a balanced corpus of 200 million words, with full POS/MSD tagging and (partial) syntactic and semantic annotations.
The HNK currently contains 101.3 million tokens and is in XML format (XCES).
Identification
Period of coverage :
Version : v2.0
Version history : v1.0 v2.5 (announced)
Production
Creation date : 2005
Applications
Application area : Research
Contents
written corpus
Number of languages : Monolingual
Language(s) : Croatian
Number of tokens : 101.3 million tokens
Annotation language : XML
Sunday 24 November, 2024
Joint Copyright © 2008 ELRA & ELDA