Catalog Reference : ELRA-U-W0311
Russian National Corpus
The Russian National Corpus (RNC) is a collection of written, spoken and multimodal corpora, representing about 300 million tokens.
It covers a wide range of sources from the 18th to the early 21st century: original works of fiction (prose, drama and poetry) and other sources of written and spoken language (memoirs, essays, journalistic texts, scientific and popular science literature, public speeches, private conversations, film dialogue, letters, diaries, etc.).
It contains the following subcorpora:
- the Main corpus, which contains prose texts totalling about 160 million tokens,
- the Deeply Annotated Syntactic corpus, which is annotated for morphological and syntactic structure,
- the Poetry corpus, which carries poetry-specific annotation (about 5 million tokens),
- the Paper corpus (or Corpus of the Contemporary Russian Press), which includes around 100 million tokens of newspaper texts and news reports from the Web (2000-2008),
- the Disambiguated corpus, which contains texts in which grammatical homonymy has been resolved,
- the Accentological corpus, which contains poetry and prose from the 18th to the 21st centuries, annotated for actual (rather than normative) Russian stress placement,
- the Russian-English and Russian-German parallel corpora, which are sentence-aligned (9 million tokens),
- the Corpus of Spoken Russian, which contains transcripts of public and spontaneous spoken Russian, as well as transcripts of Russian films made between 1930 and 2007,
- the Multimodal Russian Corpus (MURCO), a collection of film clips aligned with their corresponding transcripts (still in progress).
Creation date :
Application Area :
Written corpus
Number of languages : Monolingual
Language(s) : Russian
Annotation Coverage : Partial
Annotation Granularity : Word
Annotation level : Morphological
Lexical Unit Information : Single word lemma
Written corpus
Number of languages : Bilingual
Language(s) : Russian >>>> English ; Russian >>>> German
Alignment : Sentence
Number of languages :
Language(s) :
Sunday 23 February, 2025
Joint Copyright © 2008
Universal Catalogue 1.0.4