Catalog Reference: ELRA-U-W0292
SYN2005 corpus
The SYN2005 corpus is a synchronic representative corpus of contemporary written Czech. It contains 100 million words (tokens), lemmatised and part-of-speech tagged.
This corpus matches its predecessor, the SYN2000 corpus (see U-W0293), in size and annotation. However, it contains only recent texts (from 2000 to 2004) and has a different genre distribution: 40% fiction, 27% technical literature, 33% journalism.
The SYN2000 corpus is also a synchronic representative corpus of contemporary written Czech. It contains 100 million words (tokens), lemmatised and part-of-speech tagged.
SYN2000 contains texts written between 1990 and 1999, along with some important works of 20th-century Czech literature. It was designed to cover many different genres. Distribution: 15% fiction, 25% technical literature, 60% journalism.
Both corpora are part of the Czech National Corpus (CNC).
Contents
Resource type: Written corpus
Number of languages: Monolingual
Language(s): Czech
Number of tokens: 100 million words
Annotation granularity: Word
Annotation level: Morphological
Lexical unit information: Single-word lemma
Friday 01 November, 2024
Joint Copyright © 2008 ELRA & ELDA
Universal Catalogue 1.0.4