Catalog Reference : ELRA-U-S0269
Mass
This is a Malay speech corpus. It contains 70 hours of read speech recorded by 90 speakers and 10 hours of broadcast news from local TV stations in Malaysia.
The read speech was recorded by Malay, Indian and Chinese speakers (female and male) through a headset microphone at a sampling rate of 22 kHz. Speakers read sentences extracted from a text corpus collected from local news websites, and each speaker read about 5,000 words. The target is to record a total of 140 hours of speech.
The broadcast news part was recorded daily in 30-minute slots. The audio files were then split into 5-minute segments and stored as 16 kHz, 16-bit PCM WAV files. This part is manually transcribed and segmented into speech utterances. The aim is to collect a total of 15 hours of broadcast news.
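The segmentation step described above (30-minute recordings cut into 5-minute WAV segments) can be sketched with Python's standard `wave` module. This is a minimal illustration under stated assumptions, not the procedure actually used to build the corpus: the function name `split_wav`, the output naming scheme and the directory layout are all hypothetical.

```python
import os
import wave


def split_wav(src_path, out_dir, segment_seconds=300):
    """Split a PCM WAV file into consecutive segments of segment_seconds.

    Hypothetical helper: the last segment holds whatever audio remains
    and may be shorter. Returns the list of written file paths.
    """
    os.makedirs(out_dir, exist_ok=True)
    base = os.path.splitext(os.path.basename(src_path))[0]
    paths = []
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        # Number of frames (samples per channel) in one segment.
        frames_per_segment = int(params.framerate * segment_seconds)
        index = 0
        while True:
            frames = src.readframes(frames_per_segment)
            if not frames:  # empty bytes: end of the source file
                break
            out_path = os.path.join(out_dir, f"{base}_{index:03d}.wav")
            with wave.open(out_path, "wb") as dst:
                # Copy channel count, sample width and rate from the source;
                # the frame count in the header is corrected when dst closes.
                dst.setparams(params)
                dst.writeframes(frames)
            paths.append(out_path)
            index += 1
    return paths
```

For a recording of exactly 30 minutes, a call such as `split_wav("slot.wav", "segments")` (hypothetical file names) would write six 5-minute files.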
Applications
Existing applications :
Automatic speech recognition
Contents
speech corpus
#18824
Language(s) : Malay
Duration : 90 hours
Source Channel : Microphone
Recording Environment : Soundproof room
Speech Content : Continuous sentences
speech corpus
#28824
Language(s) : Malay
Duration : 10 hours
Quantisation : 16-bit
Source Channel : Television
Transcription Entries : Orthographic
Annotation language : XML
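As a rough guide to storage, the raw size of PCM audio is duration (s) × sample rate × bytes per sample × channels. The sketch below applies this to the 10-hour broadcast part at 16 kHz and 16 bits. It assumes mono audio, since the channel count is not stated in this entry, and counts the PCM payload only, ignoring WAV headers; `pcm_bytes` is a hypothetical helper.

```python
def pcm_bytes(hours, sample_rate_hz, bits_per_sample, channels=1):
    """Raw PCM payload size in bytes (WAV headers excluded)."""
    seconds = hours * 3600
    return seconds * sample_rate_hz * (bits_per_sample // 8) * channels


# 10-hour broadcast part, 16 kHz, 16-bit; mono is an assumption.
broadcast = pcm_bytes(10, 16_000, 16)
print(broadcast)  # prints 1152000000 (about 1.15 GB)
```

By the same formula, one 5-minute segment at 16 kHz and 16 bits comes to 300 × 16,000 × 2 = 9,600,000 bytes.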
Saturday 23 November, 2024
Joint Copyright © 2008 ELRA & ELDA