Universal Catalogue  
Catalog Reference: ELRA-U-W0311
Russian National Corpus
The Russian National Corpus (RNC) is a collection of written, spoken and multimodal corpora, representing about 300 million tokens.

It covers a wide range of sources from the 18th to the early 21st century: original works of fiction (prose, drama and poetry) and other sources of written and spoken language (memoirs, essays, journalistic texts, scientific and popular scientific literature, public speeches, private talks, movie speech, letters, diaries, etc.).

It contains the following subcorpora:
- the Main corpus, which contains prose texts, representing about 160 million tokens,
- the Deeply Annotated Syntactic corpus, which is annotated for morphological and syntactic structure,
- the Poetry corpus, which provides poetry-specific tags (about 5 million tokens),
- the Paper corpus (or Corpus of the contemporary Russian press), which includes around 100 million tokens and consists of newspaper texts and news reports from the Web (2000-2008),
- the Disambiguated corpus, which contains texts with disambiguated grammatical homonyms,
- the Accentological corpus, which contains poetry and prose texts from the 18th to the 21st century, annotated for actual (rather than normative) Russian accentuation,
- the Parallel Russian-English and Russian-German corpora, which are sentence-aligned (9 million tokens),
- the Corpus of Spoken Russian, which contains transcripts of public and spontaneous spoken Russian in addition to transcripts of Russian movies from 1930 to 2007,
- the Multimodal Russian Corpus (MURCO), a collection of clips extracted from movies and aligned with the corresponding transcripts (still in progress).
Production
Creation date: 2004
Applications
Application area: Research
Contents
 written corpus #18626
 written corpus #28626
 Video1 #28626
 

Joint Copyright © 2008 ELRA & ELDA