Catalog Reference : ELRA-U-W0311
Russian National Corpus
The Russian National Corpus (RNC) is a collection of written, spoken and multimodal corpora, representing about 300 million tokens.
It covers a wide range of sources from the 18th to the early 21st century: original works of fiction (prose, drama and poetry) and other written and spoken sources (memoirs, essays, journalistic texts, scientific and popular science literature, public speeches, private conversations, film dialogue, letters, diaries, etc.).
It contains the following subcorpora:
- the Main corpus, which contains prose texts, representing about 160 million tokens,
- the Deeply Annotated Syntactic corpus, which is annotated for morphological and syntactic structure,
- the Poetry corpus, which carries poetry-specific tags (about 5 million tokens),
- the Paper corpus (or Corpus of the contemporary Russian press), which includes around 100 million tokens and consists of newspaper texts and news reports from the Web (2000-2008),
- the Disambiguated corpus, which contains texts with disambiguated grammatical homonyms,
- the Accentological corpus, which contains poetry and prose texts from the 18th to the 21st century, annotated with the actual (rather than normative) Russian accentuation,
- the Parallel Russo-English and Russo-German corpora, which are sentence-aligned (9 million tokens),
- the Corpus of Spoken Russian, which contains transcripts of public and spontaneous spoken Russian as well as transcripts of Russian films from 1930 to 2007,
- the Multimodal Russian Corpus (MURCO), a collection of film clips aligned with their corresponding transcripts (still in progress).
Production
Creation date : 2004
Applications
Application area : Research
Contents
Written corpus #18626
Number of languages : Monolingual
Language(s) : Russian
Annotation Coverage : Partial
Annotation Granularity : Word
Annotation level : Morphological
Lexical Unit Information : Single word lemma
Written corpus #28626
Number of languages : Bilingual
Language(s) : Russian >>>> English ; Russian >>>> German
Alignment : Sentence
Video1 #28626
Number of languages :
Language(s) :
Friday 01 November, 2024
Joint Copyright © 2008 ELRA & ELDA