Universal Catalogue  
Catalog Reference: ELRA-U-S0269
Mass
This is a Malay speech corpus. It contains 70 hours of read speech recorded by 90 speakers and 10 hours of broadcast news from local TV stations in Malaysia.

Read speech was recorded by Malay, Indian and Chinese speakers (female and male) through a headset microphone at a sampling rate of 22 kHz. The speakers read sentences extracted from a text corpus collected from local news websites. Each speaker read about 5,000 words. The target is to record a total of 140 hours of speech.

The broadcast news part was recorded daily in 30-minute slots. Audio files were then segmented into 5-minute segments and stored as 16 kHz, 16-bit PCM WAV files. This part is manually transcribed and segmented into speech utterances. The aim is to collect a total of 15 hours of broadcast news.
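
For readers who want to reproduce a similar layout on their own recordings, the fixed-length segmentation step described above could be sketched as follows. This is a minimal illustration using Python's standard `wave` module, not the tooling used to build the corpus. The function name `split_wav`, the output file naming, and the assumption that the input is already an uncompressed PCM WAV file are all hypothetical. The manual transcription and utterance-level segmentation are not covered.

```python
import wave
from pathlib import Path

SEGMENT_SECONDS = 5 * 60  # 5-minute segments, as in the description above


def split_wav(src_path, out_dir, segment_seconds=SEGMENT_SECONDS):
    """Split one PCM WAV recording into consecutive fixed-length segments.

    Hypothetical helper: returns the list of segment paths written.
    The last segment is shorter when the recording length is not a
    multiple of segment_seconds.
    """
    src_path = Path(src_path)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with wave.open(str(src_path), "rb") as src:
        params = src.getparams()
        frames_per_segment = src.getframerate() * segment_seconds
        index = 0
        while True:
            frames = src.readframes(frames_per_segment)
            if not frames:
                break
            # Assumed naming scheme: <original name>_<two-digit index>.wav
            seg_path = out_dir / f"{src_path.stem}_{index:02d}.wav"
            with wave.open(str(seg_path), "wb") as dst:
                # Copy channels, sample width and rate from the source;
                # the frame count in the header is corrected on close.
                dst.setparams(params)
                dst.writeframes(frames)
            written.append(seg_path)
            index += 1
    return written
```

Under these assumptions, a 30-minute recording passed to `split_wav` yields six 5-minute files with the same sampling rate and sample width as the input; resampling to 16 kHz would need a separate step.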
Applications
Existing applications: Automatic speech recognition
Contents
 speech corpus #18824
 speech corpus #28824
 

Joint Copyright © 2008 ELRA & ELDA
Universal Catalogue 1.0.4