Catalog Reference : ELRA-U-W 0120
The MultiSemCor Corpus
MultiSemCor is composed of 116 English texts and their 116 Italian translations, for a total of about 500,000 tokens. It is based on the English SemCor corpus. The texts are aligned at the word level and carry semantic annotations (the senses used in MultiWordNet).
The overall objective of the project is to create a semantically annotated corpus by exploiting information from an already annotated parallel corpus.
MultiSemCor was designed to train and test word sense disambiguation systems and to enrich MultiWordNet. It is also a resource for lexicography, translation studies, language teaching and multilingual information browsing.
Applications
Application area: Research
Contents
Written corpus
Number of languages: Bilingual
Languages: English, Italian
Alignment: Word
Number of tokens: 250,000
Annotation coverage: Full
Annotation granularity: Word
Annotation level: Semantic
Saturday 23 November, 2024
Joint Copyright © 2008 ELRA & ELDA