ELRA - ELRA-U-W 0120 : The MultiSemCor Corpus


Catalog Reference : ELRA-U-W 0120

The MultiSemCor Corpus

MultiSemCor consists of 116 English texts and their 116 Italian translations, about 500,000 tokens in total. It is based on the English SemCor corpus. The texts are aligned at the word level and carry semantic annotations (word senses as used in MultiWordNet).

The overall objective of the project is to create a semantically annotated corpus by exploiting information from an already annotated parallel corpus.
MultiSemCor was designed for training and testing word sense disambiguation systems and for enriching MultiWordNet. It is also a resource for lexicography, translation studies, language teaching and multilingual information browsing.
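
The idea of reusing annotations from an already annotated parallel text can be illustrated with a minimal sketch. It assumes word-level alignments are given as pairs of token indices and copies each sense label from an English token to its aligned Italian token. The `Token` class, the `project_senses` function, and the sense labels are hypothetical names chosen for illustration; they do not reflect the actual MultiSemCor file format or the project's own tools.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Token:
    form: str
    sense: Optional[str] = None  # hypothetical sense label, not a real MultiWordNet ID


def project_senses(
    source: List[Token],
    target: List[Token],
    alignment: List[Tuple[int, int]],
) -> List[Token]:
    """Copy sense labels from annotated source tokens to aligned target tokens."""
    # Work on copies so the input target tokens are left unchanged.
    projected = [Token(t.form, t.sense) for t in target]
    for src_idx, tgt_idx in alignment:
        sense = source[src_idx].sense
        if sense is not None:
            projected[tgt_idx].sense = sense
    return projected


# Toy example: an annotated English sentence and its unannotated Italian translation.
english = [Token("the"), Token("dog", "sense:dog"), Token("sleeps", "sense:sleep")]
italian = [Token("il"), Token("cane"), Token("dorme")]
alignment = [(0, 0), (1, 1), (2, 2)]  # (English index, Italian index)

italian_annotated = project_senses(english, italian, alignment)
```

Tokens without an aligned, annotated counterpart (here the article "il") simply stay unlabelled in this sketch.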

Applications

Application Area : Research

Contents


written corpus
Number of languages : Bilingual
Language(s) : English, Italian
Alignment : Word
Number of tokens : 250,000
Annotation Coverage : Full
Annotation Granularity : Word
Annotation level : Semantic