ELRA - ELRA-U-M 0008 : Multext-East morpho-syntactic lexicons

Catalog Reference : ELRA-U-M 0008

Multext-East morpho-syntactic lexicons

These morpho-syntactic lexicons are based on the Multext-East corpora. Each lexical entry consists of three fields: the word-form, the lemma and the morpho-syntactic description. Each language has at least 15,000 lexical entries (41,000 for Romanian).
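
The three-field entry structure described above can be illustrated with a minimal Python sketch. It assumes one entry per line with tab-separated fields; the separator, the `LexEntry` and `parse_entry` names, and the sample line are illustrative assumptions and are not taken from the distributed files.

```python
from typing import NamedTuple


class LexEntry(NamedTuple):
    """One lexical entry: word-form, lemma, morpho-syntactic description."""
    word_form: str
    lemma: str
    msd: str


def parse_entry(line: str, sep: str = "\t") -> LexEntry:
    """Split one lexicon line into its three fields.

    The tab separator is an assumption; pass another `sep` if the
    files you have use a different delimiter.
    """
    fields = line.rstrip("\n").split(sep)
    if len(fields) != 3:
        raise ValueError(f"expected 3 fields, got {len(fields)}: {line!r}")
    word_form, lemma, msd = fields
    return LexEntry(word_form, lemma, msd)


# Hypothetical sample line, written for illustration only.
sample = "walks\twalk\tVmip3s\n"
entry = parse_entry(sample)
print(entry.word_form, entry.lemma, entry.msd)
```

Keeping the entry as a named tuple makes it easy to build lookups afterwards, for example a dictionary from word-form to the list of its (lemma, description) readings.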

Languages: English, Romanian, Slovene, Czech, Bulgarian, Estonian, Hungarian.

The lexicons are part of a multilingual dataset containing multiple resources for Central and Eastern European languages:
- MULTEXT-East morphosyntactic specifications,
- MULTEXT-East "1984" parallel corpus,
- MULTEXT-East morphosyntactically annotated "1984" corpus,
- MULTEXT-East comparable corpus,
- MULTEXT-East parallel speech corpus (from EUROM-1 speech corpus),
- and associated documentation.
The central component of the MULTEXT-East corpus is the novel "1984" by G. Orwell.

The dataset is compliant with the EAGLES and TEI P4 recommendations.
It is a valuable resource for language engineering research and development on Central and Eastern European languages.

Identification

Period of coverage :

Version : v3, 2004
Version history : v1: 1998 ('East meets West' CD-ROM); v2: 2002

Production

Project : TELRI, CONCEDE, Multext-East Projects

Creation date : 2004

Applications


Application area : Research

Contents

written lexicon
Number of languages : Multilingual
Language(s) : English (United Kingdom) ; Romanian (Romania) ; Slovene (Slovenia) ; Czech (Czech Republic) ; Bulgarian (Bulgaria) ; Estonian (Estonia) ; Hungarian (Hungary)