Universal Catalogue  
Catalog Reference : ELRA-WC0250
Croatian Language Corpus
The Croatian Language Corpus covers various domains and genres. It includes literature and other written sources from the second half of the 19th century onward.

The Croatian Language Corpus consists of fundamental Croatian literature (novels, short stories, drama, poetry), non-fiction, scientific publications from various domains, university textbooks, school books, literature translated by outstanding Croatian translators, online journals and newspapers, and books adapted to present-day standard Croatian.

It can be divided into two sub-corpora:
- the literature sub-corpus,
- the newspapers sub-corpus.

This corpus indexes more than 100,000 tokens and is growing continuously. Annotation includes morphological segmentation, lemmatization, phonemic transcription, morphosyntactic annotation and syntactic parses. The corpus is distributed in XML format and follows the TEI P5 guidelines.
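
As a rough illustration of working with TEI P5 data of this kind, the Python sketch below reads word tokens and their annotations from a small inline sample. The sample document, the use of `<w>` elements with `lemma` and `pos` attributes, and the helper name `read_tokens` are assumptions made for illustration only; the actual markup of the Croatian Language Corpus is not described here and may differ.

```python
import xml.etree.ElementTree as ET

# Namespace used by TEI P5 documents.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical sample, NOT taken from the corpus: it only assumes that
# tokens are encoded as <w> elements carrying "lemma" and "pos" attributes.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body><p>
    <s>
      <w lemma="knjiga" pos="N">knjige</w>
      <w lemma="biti" pos="V">su</w>
      <w lemma="star" pos="A">stare</w>
      <pc>.</pc>
    </s>
  </p></body></text>
</TEI>"""


def read_tokens(xml_text):
    """Return (form, lemma, pos) tuples for every <w> element in the text."""
    root = ET.fromstring(xml_text)
    return [
        (w.text, w.get("lemma"), w.get("pos"))
        for w in root.iterfind(".//tei:w", TEI_NS)
    ]


tokens = read_tokens(SAMPLE)
for form, lemma, pos in tokens:
    print(form, lemma, pos)
```

Punctuation (`<pc>`) is skipped in this sketch, so the token count it yields covers word forms only; a different counting convention would need a different query.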
 written corpus 
 

Joint Copyright © 2008 ELRA & ELDA
Universal Catalogue 1.0.4