Catalog Reference : ELRA-U-W 0101
Multext-East 1984 Parallel Corpus
This multilingual parallel corpus consists of the novel "1984" (G. Orwell) and contains approximately 100,000 words per language (English, Romanian, Slovene, Czech, Bulgarian, Estonian, Hungarian, Latvian, Lithuanian, Serbian, Russian).
Texts are tokenised and annotated with part-of-speech (POS) and morpho-syntactic information. The corpus is marked up in SGML.
The translations were aligned automatically and then validated by hand. The corpus conforms to the CES standard.
It is a part of a multilingual dataset containing multiple resources for Central and Eastern European languages:
- MULTEXT-East morphosyntactic specifications,
- MULTEXT-East morphosyntactic lexicons,
- MULTEXT-East morphosyntactically annotated "1984" corpus,
- MULTEXT-East comparable corpus,
- MULTEXT-East parallel speech corpus (from EUROM-1 speech corpus),
- and associated documentation.
The central component of the MULTEXT-East corpus is the novel "1984" by G. Orwell.
The dataset is compliant with the EAGLES and TEI P4 recommendations.
It is a valuable resource for language engineering research and development on Central and Eastern European languages.
Identification
Period of coverage :
Version :
v3, 2004
Version history :
v1: 1998 ('East meets West' CD-ROM); v2: 2002
Production
Project :
TELRI, CONCEDE, Multext-East Projects
Creation date :
2004
Applications
Application area :
Research
Contents
written corpus
Number of languages : Multilingual
Language(s) :
English (United Kingdom)-Romanian ; English-Slovene ; English-Czech ; English-Bulgarian ; English-Estonian ; English-Hungarian ; English-Latvian ; English-Lithuanian ; English-Serbian ; English-Russian
Alignment :
Sentence
Annotation Coverage : Full
Annotation Granularity : Word
Annotation level : Morphological
Annotation Scheme : TEI
Annotation language : SGML
Joint Copyright © 2008 ELRA & ELDA