Catalog Reference: ELRA-U-W0364
Nova beseda
Nova beseda is a large collection of 4,158 Slovenian texts from various categories: newspapers, magazines, formal speech, fiction, non-fiction, and scientific and technical texts. It contains about 162 million words and is marked up at the sentence level.
The corpus consists of 6 main parts:
- 2,310 texts collected from the Delo daily newspaper between 1998 and 2005 (120 million words),
- 711 texts of formal speech from Slovenian National Assembly session transcripts, between 1996 and 2004 (20 million words),
- 778 texts of fiction in Slovenian, including the complete works of the famous writers Drago Jančar, Ciril Kosmač and Ivan Cankar (12 million words),
- 78 texts from the Monitor computer magazine, between 1999 and 2004, and the Viva healthy-living magazine (6 million words),
- 251 texts of non-fiction in Slovenian (2 million words),
- 26 scientific and technical publications (2 million words).
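The breakdown above can be tallied with a short script. This is only an illustrative sketch: the figures are copied from the list, the word counts are the rounded values given there, and the names `PARTS`, `totals` and `word_shares` are hypothetical helpers, not part of any tool distributed with the corpus.

```python
# Part sizes as listed in the catalogue entry:
# (label, number of texts, approximate millions of words).
PARTS = [
    ("Delo daily newspaper", 2310, 120),
    ("National Assembly transcripts", 711, 20),
    ("Fiction", 778, 12),
    ("Monitor and Viva magazines", 78, 6),
    ("Non-fiction", 251, 2),
    ("Scientific and technical publications", 26, 2),
]


def totals(parts):
    """Return (total texts, total millions of words) over all parts."""
    return sum(p[1] for p in parts), sum(p[2] for p in parts)


def word_shares(parts):
    """Return each part's share of the total word count, in percent."""
    _, total_words = totals(parts)
    return {label: 100 * words / total_words for label, _, words in parts}


if __name__ == "__main__":
    texts, words = totals(PARTS)
    print(f"{texts} texts, about {words} million words")
    for label, share in word_shares(PARTS).items():
        print(f"{label}: {share:.1f}%")
```

The rounded word counts add up to the stated 162 million, with the Delo newspaper part accounting for roughly three quarters of the corpus. The per-part text counts as listed add up to 4,154, four fewer than the overall figure of 4,158; the entry does not explain the difference.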
Before 2000, the corpus was called CORTES (CORpus of TExts in Slovenian).
Production
Creation date: 1999-2005
Applications
Application area: Education, Research
Contents
written corpus
Number of languages: Monolingual
Language(s): Slovenian
Number of tokens: 162 million words
Joint Copyright © 2008 ELRA & ELDA