Catalog Reference: ELRA-U-W0359
Bangla News Corpus
This is a corpus of news texts in Bangla (Bengali). It is also known as the Prothom-Alo corpus because its texts were collected from the electronic edition of Prothom-Alo, the most widely read newspaper in Bangladesh. The texts are encoded in Unicode.
The corpus contains 18,100,378 word tokens and 384,048 distinct word types.
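To illustrate the difference between the two figures: a word token is each occurrence of a word in the running text, while a word type is a distinct word form, counted once however often it occurs. The sketch below counts both for a small Bangla sample. It assumes simple whitespace tokenization; the catalogue entry does not say how the corpus was tokenized, so the helper `token_type_counts` is hypothetical and is not the method used to produce the figures above.

```python
def token_type_counts(text: str) -> tuple[int, int]:
    """Return (number of word tokens, number of distinct word types).

    Assumption: tokens are whitespace-separated strings. Punctuation such
    as the Bangla danda (।) would stay attached to words under this rule.
    """
    tokens = text.split()          # every occurrence counts as a token
    types = set(tokens)            # each distinct form counts once as a type
    return len(tokens), len(types)


sample = "আমি ভাত খাই আমি জল খাই"
print(token_type_counts(sample))  # (6, 4)

# Type/token ratio implied by the corpus figures quoted above.
print(f"{384048 / 18100378:.4f}")  # 0.0212
```

Under this assumption the sample has 6 tokens but only 4 types, because "আমি" and "খাই" each occur twice. For the full corpus, about 2 in every 100 tokens introduce a new type.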
Production
Project: Pan Localization
Contents
Written corpus
Number of languages: Monolingual
Language(s): Bengali
Character set: Unicode
Document source: Internet
Friday 01 November, 2024
Joint Copyright © 2008 ELRA & ELDA