Catalog Reference: ELRA-WC338
Brown Corpus of Bulgarian
The corpus follows the design of the Brown University Corpus and comprises 1,000,805 words extracted mainly from electronic texts. The compilers aimed to include only original Bulgarian texts. However, exceptions had to be made for romance and western excerpts, which were taken from foreign-language sources translated into Bulgarian, because original Bulgarian texts in these genres were lacking.
The BCB corpus is divided into 500 text units of approximately 2,000 words each. Most texts contain more than 2,000 words, and only a small number contain fewer. The texts were sampled from 15 text categories. The number of texts in each category varies:
Press – reportage: 44;
Press – editorial: 27;
Press – reviews: 17;
Religion: 17;
Skills and hobbies: 36;
Popular lore: 48;
Belles-lettres: 75;
Miscellaneous – government and house organs: 30;
Learned: 80;
Fiction – general: 29;
Fiction – mystery: 24;
Fiction – science: 6;
Fiction – adventure: 29;
Fiction – romance: 29;
Humor: 9.
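The category counts can be checked against the stated totals. The following is a minimal sketch in Python, assuming the figures exactly as listed above; the names `CATEGORY_COUNTS` and `TOTAL_WORDS` are illustrative and not part of the catalogue.

```python
# Number of texts per category, copied from the list above.
CATEGORY_COUNTS = {
    "Press - reportage": 44,
    "Press - editorial": 27,
    "Press - reviews": 17,
    "Religion": 17,
    "Skills and hobbies": 36,
    "Popular lore": 48,
    "Belles-lettres": 75,
    "Miscellaneous - government and house organs": 30,
    "Learned": 80,
    "Fiction - general": 29,
    "Fiction - mystery": 24,
    "Fiction - science": 6,
    "Fiction - adventure": 29,
    "Fiction - romance": 29,
    "Humor": 9,
}

# Corpus size in words, as stated in the description.
TOTAL_WORDS = 1_000_805

# Sum the per-category counts and derive the mean text length.
total_texts = sum(CATEGORY_COUNTS.values())
mean_words_per_text = TOTAL_WORDS / total_texts

print(f"categories: {len(CATEGORY_COUNTS)}")
print(f"texts: {total_texts}")
print(f"mean words per text: {mean_words_per_text:.2f}")
```

The counts cover 15 categories and sum to 500 texts, which gives a mean of about 2,001.6 words per text, consistent with the description of units of approximately 2,000 words.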
Extracts from the BCB have been tagged and semantically disambiguated to form, respectively, the Tagged Corpus of Bulgarian (U-W 0128) and the Semantic Corpus of Bulgarian (U-W 0129). A version of the Bulgarian Brown Corpus with the full-length texts (nearly 5 million words) is also available.
Identification
Period of coverage: 1990–2005
Version :
Version history :
Contents
written corpus
Number of languages: Monolingual
Language(s): Bulgarian
Number of tokens: one million words
Friday 01 November, 2024
Joint Copyright © 2008 ELRA & ELDA
Universal Catalogue 1.0.4