Catalog Reference: ELRA-U-W 0031
Mixed Corpus of Estonian
This is an 80-million-word corpus of Estonian (still under construction). It gathers texts of various genres (newspapers, magazines, internet chat, legal documents, fiction, etc.) and was designed to support the Estonian language and culture. It can be used in computational as well as theoretical linguistics, and it is considered the reference corpus of Estonian.
The MCE also contains a balanced subcorpus, 'The Balanced Corpus', of 15 million words drawn in equal parts from fiction (1/3), journalistic writing (1/3) and scientific writing (1/3).
The target size is 200,000 million words.
Identification
Period of coverage: from 1995 onwards
Version:
Version history:
Applications
Possible applications: Discourse analysis
Application area: Research
Technical Information
File format: Plain text
Contents
Written corpus
Number of languages: Monolingual
Language(s): Estonian (Estonia)
Saturday 23 November, 2024
Joint Copyright © 2008 ELRA & ELDA
Universal Catalogue 1.0.4