ELRA - ELRA-U-S0231 : CRBLP speech corpora


Catalog Reference : ELRA-U-S0231

CRBLP speech corpora

The CRBLP speech corpora consist of three speech corpora in Bangla (Bengali): a read speech corpus, a diphone corpus, and a speech corpus for acoustic analysis.

The read speech corpus contains recordings of a professional speaker's voice. Audio and text have been time-aligned and labeled at the sentence level. The corpus comprises around 10,000 sentences and 18,000 unique tokens.

The diphone corpus contains recordings of 4,355 sentences formed by combining nonsense words with 4,355 diphones (combinations of two phones), in order to cover all the phones of Bangla.

The third corpus was collected for acoustic analysis. It consists of recordings of possible phone combinations collected from various texts, and is intended to help determine the number of phonemes in Bangla.

Production

Project : Pan Localization

Creation date : 2008

Applications

Applications possible : Speech recognition

Contents

speech corpus
Language(s) : Bengali
Source Channel : Microphone
Annotation Granularity : Sentence