Universal Catalogue  
Catalog Reference : U-S0332
LC-STAR Russian lexicon
The LC-STAR Russian lexicon was created within the scope of the LC-STAR project (IST 2001-32216) which was sponsored by the European Commission.

The lexicon comprises about 128,000 words, distributed over three categories:

- a set of 77,154 common word entries, extracted from a corpus of more than 20 million words distributed over 6 domains (sports/games, news, finance, culture/entertainment, consumer information, personal communications), with the aim of reaching at least 95% self coverage in each domain. In addition to the word lists extracted from the corpus, a list of closed-set (function) word classes is included in the final word list.

- a set of 51,074 proper names (including person names, family names, cities, streets, companies and brand names) divided into 3 domains. Multi-word names such as New_York are kept together in all three domains and count as one entry. The 3 domains consist of first and last names (19,740 different entries), place names (19,306 different entries), and organisations (13,194 different entries).

- and a list of 12,012 special application words translated from English terms defined by the LC-STAR consortium. This list contains numbers, letters, abbreviations and specific vocabulary for voice-controlled applications (information retrieval, control of consumer devices, etc.).
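
The 95% self coverage target for the common word entries can be illustrated with a minimal sketch. It assumes that self coverage means the share of running words in a corpus that are found in the word list extracted from that same corpus; the function names and the frequency-ranked selection are illustrative assumptions, not the LC-STAR consortium's actual procedure.

```python
from collections import Counter


def self_coverage(corpus_tokens, lexicon):
    """Share of running words in the corpus that appear in the lexicon.

    Assumed definition of "self coverage"; the catalogue entry does not
    spell out the formula.
    """
    counts = Counter(corpus_tokens)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    covered = sum(n for word, n in counts.items() if word in lexicon)
    return covered / total


def words_for_target(corpus_tokens, target=0.95):
    """Add words in descending frequency order until the target is met.

    Hypothetical helper: one simple way to build a word list that
    reaches a given self coverage on its own source corpus.
    """
    counts = Counter(corpus_tokens)
    total = sum(counts.values())
    chosen, covered = [], 0
    for word, n in counts.most_common():
        if total and covered / total >= target:
            break
        chosen.append(word)
        covered += n
    return chosen
```

Applied per domain, a check of this kind would be run on each of the 6 domain sub-corpora separately, since the target is stated for each domain.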

The lexicon is provided in XML format and includes phonetic transcriptions in SAMPA.
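
As a rough sketch of how an XML lexicon with SAMPA transcriptions might be read, the example below parses a small inline sample. The element and attribute names (`lexicon`, `entry`, `orthography`, `phonetic`) and the two sample entries are hypothetical placeholders chosen for illustration; the actual LC-STAR schema is not described in this catalogue entry and may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout for illustration only; not the real LC-STAR DTD.
SAMPLE = """<lexicon>
  <entry orthography="да">
    <phonetic>d a</phonetic>
  </entry>
  <entry orthography="дом">
    <phonetic>d o m</phonetic>
  </entry>
</lexicon>"""


def load_transcriptions(xml_text):
    """Map each orthographic form to its list of SAMPA symbols.

    Assumes space-separated SAMPA symbols inside a <phonetic> child.
    """
    root = ET.fromstring(xml_text)
    return {
        entry.get("orthography"): entry.findtext("phonetic", default="").split()
        for entry in root.iter("entry")
    }


if __name__ == "__main__":
    for word, phones in load_transcriptions(SAMPLE).items():
        print(word, phones)
```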

ISLRN : 092-978-177-962-0
Production
Project : LC-STAR (IST 2001-32216)
Contents : speech lexicon

Joint Copyright © 2008 ELRA & ELDA
Universal Catalogue 1.0.4