Catalog Reference: ELRA-WC0250
Croatian Language Corpus
The Croatian Language Corpus covers various domains and genres. It includes literature and other written sources from the second half of the 19th century onward.
The Croatian Language Corpus consists of fundamental Croatian literature (novels, short stories, drama, poetry), non-fiction, scientific publications from various domains, university textbooks, school books, literature translated by outstanding Croatian translators, online journals and newspapers, and books adapted to present-day standard Croatian.
It can be divided into two sub-corpora:
- the literature sub-corpus,
- the newspapers sub-corpus.
This corpus indexes more than 100,000 tokens and is growing continuously. Annotation includes morphological segmentation, lemmatization, phonemic transcription, morphosyntactic annotation, and syntactic parses. The corpus is in XML format and follows the TEI P5 guidelines.
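As a rough illustration of working with TEI P5 token annotation, the sketch below reads word tokens together with their lemma and morphosyntactic tag. It assumes tokens are encoded as TEI `<w>` elements carrying the standard `lemma` and `msd` attributes. The actual element layout of this corpus is not documented here, so the sample fragment, the tag values, and the function name `read_tokens` are hypothetical.

```python
import xml.etree.ElementTree as ET

# TEI P5 documents use this XML namespace.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical fragment for illustration only: not taken from the corpus,
# not a complete TEI document (no teiHeader), and the msd values are placeholders.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body><p>
    <s>
      <w lemma="knjiga" msd="TAG1">Knjiga</w>
      <w lemma="biti" msd="TAG2">je</w>
      <w lemma="nov" msd="TAG3">nova</w>
      <pc>.</pc>
    </s>
  </p></body></text>
</TEI>"""


def read_tokens(xml_text):
    """Return (form, lemma, msd) tuples for every <w> element in the text."""
    root = ET.fromstring(xml_text)
    return [
        (w.text, w.get("lemma"), w.get("msd"))
        for w in root.iterfind(".//tei:w", TEI_NS)
    ]


tokens = read_tokens(SAMPLE)
for form, lemma, msd in tokens:
    print(form, lemma, msd)
```

If the corpus stores annotation differently (for example in stand-off files or under other attribute names), the XPath expression and attribute lookups would need to be adjusted accordingly.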
Contents
written corpus
Number of languages: Monolingual
Language(s): Croatian
Saturday 23 November, 2024
Joint Copyright © 2008 ELRA & ELDA