Catalog Reference : ELRA-U-W 0103
Multext-East POS tagged 1984
This multilingual resource is a morpho-syntactically annotated version of the novel "1984" (G. Orwell) in eight languages. Each word is marked up with its context-disambiguated lemma and morpho-syntactic description.
Languages: English, Romanian, Slovene, Czech, Bulgarian, Estonian, Hungarian, Serbian.
It is part of a multilingual dataset containing several resources for Central and Eastern European languages:
- MULTEXT-East morphosyntactic specifications,
- MULTEXT-East morphosyntactic lexicons,
- MULTEXT-East "1984" parallel corpus,
- MULTEXT-East comparable corpus,
- MULTEXT-East parallel speech corpus (from EUROM-1 speech corpus),
- and associated documentation.
The central component of the MULTEXT-East corpus is the novel "1984" by G. Orwell.
The dataset is compliant with the EAGLES and TEI P4 recommendations.
It is a valuable resource for language engineering research and development on Central and Eastern European languages.
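As an illustration of the word-level annotation described above, the following minimal Python sketch reads the word form, lemma, and morpho-syntactic description from a small TEI-style fragment. The element and attribute names (`w`, `lemma`, `ana`), the sample sentence, and the tag values are assumptions made for this example. They are not taken from this catalogue record and should be checked against the documentation distributed with the resource.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the element name <w>, the attributes "lemma" and "ana",
# and the tag values are illustrative assumptions, not copied from the corpus.
SAMPLE = """
<s id="s1">
  <w lemma="it" ana="Pp3ns">It</w>
  <w lemma="be" ana="Vmis3s">was</w>
  <w lemma="a" ana="Di">a</w>
  <w lemma="bright" ana="Af">bright</w>
  <w lemma="day" ana="Ncns">day</w>
  <c>.</c>
</s>
"""

def read_tokens(xml_text):
    """Return (form, lemma, msd) triples for every <w> element in the fragment."""
    root = ET.fromstring(xml_text.strip())
    return [(w.text, w.get("lemma"), w.get("ana")) for w in root.iter("w")]

tokens = read_tokens(SAMPLE)
for form, lemma, msd in tokens:
    print(form, lemma, msd)
```

Punctuation is held in a separate element in this sketch, so only word tokens are returned. The actual corpus files may organise this differently.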
Identification
Period of coverage:
Version: v3, 2004
Version history: v1: 1998 ('East meets West' CDROM); v2: 2002
Production
Project: TELRI, CONCEDE, Multext-East Projects
Creation date: 2004
Applications
Application area: Research
Contents
Written corpus
Number of languages: Multilingual
Language(s): English (United Kingdom); Romanian (Romania); Slovene (Slovenia); Czech (Czech Republic); Bulgarian (Bulgaria); Estonian (Estonia); Hungarian (Hungary); Serbian
Annotation coverage: Full
Annotation granularity: Word
Annotation level: Morphological
Annotation scheme: TEI
Joint Copyright © 2008 ELRA & ELDA