Universal Catalogue  
Catalog Reference: ELRA-U-W0292
SYN2005 corpus
The SYN2005 corpus is a synchronic representative corpus of contemporary written Czech. It contains 100 million words (tokens), lemmatised and part-of-speech tagged.

This corpus is similar to its predecessor, the SYN2000 corpus (see U-W0293), but contains only recent texts (from 2000 to 2004) and has a different distribution: 40% fiction, 27% technical literature, 33% journalism.


The SYN2000 corpus is a synchronic representative corpus of contemporary written Czech. It contains 100 million words (tokens), lemmatised and part-of-speech tagged.

This corpus contains texts written between 1990 and 1999, as well as some important works of 20th-century Czech literature. It was intended to cover many different genres. Distribution: 15% fiction, 25% technical literature, 60% journalism.

This corpus is a part of the CNC (Czech National Corpus).
Contents: written corpus

Joint Copyright © 2008 ELRA & ELDA