ELRA - ELRA-U-W0384 : Intercorp

Catalog Reference : ELRA-U-W0384

Intercorp

This is a multilingual corpus containing 44 million words. It consists of fiction texts, semi-automatically aligned between the Czech version and a version in one of the following languages: English, French, German, Russian, and Spanish.

It is partially morphosyntactically annotated.

Work is still in progress to extend the database to further languages.

Identification

Period of coverage : 2000-2008

Version :
Version history :

Production

Project : Intercorp project

Applications


Application area : Research

Contents

written corpus
Number of languages : Multilingual
Language(s) : Czech >>>> English ; Czech >>>> French ; Czech >>>> German ; Czech >>>> Russian ; Czech >>>> Spanish
Alignment : Parallel
Number of tokens : 44 million words
Annotation Coverage : Partial
Annotation level : Morphological