ELRA - ELRA-U-W 0031 : Mixed Corpus of Estonian

Catalog Reference : ELRA-U-W 0031

Mixed Corpus of Estonian

This is an 80-million-word corpus of Estonian, still under construction. It gathers texts of various genres (newspapers, magazines, internet chat, legal documents, fiction, etc.) and has been designed to support the Estonian language and culture. It can be used in computational linguistics as well as in theoretical linguistics, and it is regarded as the reference corpus of Estonian.

The Mixed Corpus of Estonian (MCE) also contains a balanced subcorpus called 'The Balanced Corpus', which comprises 15 million words drawn in equal thirds from fiction, journalism and scientific writing.

The target size is 200 million words.

Identification

Period of coverage : from 1995 onwards

Version :
Version history :

Applications

Possible applications : Discourse analysis
Application area : Research

Technical Information

File format : Plain text

Contents

Written corpus
Number of languages : Monolingual
Language(s) : Estonian (Estonia)