Catalog Reference : ELRA-U-W 0245
INTERA Multilingual Corpus
The INTERA corpus contains 12 million written words in various domains: law, health, education, tourism, environment, politics, finance.
It is a comparable corpus in which texts are aligned at sentence level (TMX standard), annotated at sentence level, morphologically tagged and lemmatized (XCES).
Language pairs: Bulgarian - English (2 million words), Greek - English (4 million words), Serbian - English (2 million words) and Slovene - English (4 million words).
Production
Project : INTERA
Applications
Application Area : Research
Contents
written corpus
Number of languages : Multilingual
Language(s) : Bulgarian-English ; Serbian-English ; Slovene-English ; Greek-English
Alignment : Sentence
Annotation Coverage : Full
Annotation Granularity : Word
Annotation level : Morphological
Annotation Mode : Automatic
Annotation language : XML
Friday 01 November, 2024
Joint Copyright © 2008 ELRA & ELDA