Catalog Reference: ELRA-U-S 0071
LVCSR Speech corpus for Indonesian
This is a speech database containing 84,000 read sentences. Each speaker uttered 210 sentences from a text corpus covering two domains:
- 3186 sentences from the news domain (Kompas and Tempo).
- 2500 sentences for application domains (telecommunication service, tele-home security, billing information services, reservation services, etc.).
Each speaker was asked to read a set of sentences. Nearly 500 speakers, covering three age groups, both genders (50% male, 50% female) and four major Western Indonesian accents (Javanese, Sundanese, Batak, Standard Indonesian), were recorded by telephone or microphone in two sets:
- Daily News Task: 110 sentences/speaker, 44,000 utterances, 43 hours of speech.
- Telephone Applications Task: 100 sentences/speaker, 40,000 utterances, 36 hours of speech.
All recordings were made in a soundproof room on two channels: one for clean speech (16 kHz) and one for telephone speech (8 kHz).
This LVCSR database will support the development of automatic speech recognition systems for Indonesian.
The database also includes a pronunciation dictionary derived from the Daily News and Telephone Applications Tasks. The dictionary contains 40k words: 30k Indonesian words, 8k place and person names, and 2k foreign words.
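The figures quoted above can be cross-checked with a little arithmetic. The sketch below is only an illustration: the names `TASKS`, `DICTIONARY` and `summarize` are hypothetical and are not part of the corpus distribution, and the derived values (implied speaker sessions, mean utterance length) are computed here from the quoted numbers rather than stated in the listing.

```python
# Figures copied from the catalogue description; everything else is derived.
TASKS = {
    "Daily News": {"sentences_per_speaker": 110, "utterances": 44_000, "hours": 43},
    "Telephone Applications": {"sentences_per_speaker": 100, "utterances": 40_000, "hours": 36},
}
DICTIONARY = {
    "Indonesian words": 30_000,
    "place and person names": 8_000,
    "foreign words": 2_000,
}


def summarize(tasks, dictionary):
    """Derive totals and per-task averages from the quoted figures."""
    summary = {
        "sentences_per_speaker": sum(t["sentences_per_speaker"] for t in tasks.values()),
        "total_utterances": sum(t["utterances"] for t in tasks.values()),
        "total_hours": sum(t["hours"] for t in tasks.values()),
        "dictionary_entries": sum(dictionary.values()),
        "per_task": {},
    }
    for name, t in tasks.items():
        summary["per_task"][name] = {
            # Utterances divided by sentences per speaker: an implied count,
            # not a figure given in the listing.
            "implied_speaker_sessions": t["utterances"] / t["sentences_per_speaker"],
            # Average utterance length in seconds, derived from the hours quoted.
            "mean_seconds_per_utterance": t["hours"] * 3600 / t["utterances"],
        }
    return summary


if __name__ == "__main__":
    result = summarize(TASKS, DICTIONARY)
    for key, value in result.items():
        print(key, value)
```

The totals agree with the description: 210 sentences per speaker, 84,000 utterances and 40k dictionary entries. The utterance counts imply 400 full speaker sessions per task, which is lower than the "nearly 500 speakers" quoted; the listing does not explain the difference.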
Applications
Possible applications: Speech recognition, Automatic speech recognition
Application area: Research
Contents
speech corpus
Language(s): Indonesian
Source Channel: Microphone, Telephone
Speech Acquisition Mode: Acoustic
Friday 22 November, 2024
Joint Copyright © 2008 ELRA & ELDA
Universal Catalogue 1.0.4