Universal Catalogue  
Catalog Reference : ELRA-U-W 0102
Multext-East Comparable Corpus
This multilingual comparable corpus contains a fiction part and a news part. The data are comparable across languages in the number and size of texts. The corpus is divided into twelve parts of 100,000 words each.
Languages: Romanian, Slovene, Czech, Bulgarian, Estonian, Hungarian.

It is a part of a multilingual dataset containing multiple resources for Central and Eastern European languages:
- MULTEXT-East morphosyntactic specifications,
- MULTEXT-East morphosyntactic lexicons,
- MULTEXT-East morphosyntactically annotated "1984" corpus,
- MULTEXT-East "1984" parallel corpus,
- MULTEXT-East parallel speech corpus (from EUROM-1 speech corpus),
- and associated documentation.
The central component of the MULTEXT-East corpus is the novel "1984" by G. Orwell.

The dataset is compliant with the EAGLES and TEI P4 recommendations.
It is a valuable resource for language engineering research and development on Central and Eastern European languages.
Identification
Period of coverage :
Version : v3, 2004
Version history : v1: 1998 ('East meets West' CD-ROM); v2: 2002
Production
Project : TELRI, CONCEDE, Multext-East projects
Creation date : 2004
Applications
Application area : Research
Contents : written corpus

Joint Copyright © 2008 ELRA & ELDA