Universal Catalogue  
Catalog Reference: ELRA-U-S 0071
LVCSR Speech corpus for Indonesian
This is a speech database containing 84,000 read sentences. Each speaker uttered 210 sentences from a text corpus covering two domains:

- 3186 sentences from the news domain (Kompas and Tempo).
- 2500 sentences for application domains (telecommunication service, tele-home security, billing information services, reservation services, etc.).

Nearly 500 speakers were recorded, by telephone or microphone, each reading a set of sentences. The speakers cover three age groups, both genders (50% male, 50% female) and four major Western Indonesian accents (Javanese, Sundanese, Batak, Standard Indonesian). The recordings form two sets:
- Daily News Task: 110 sentences/speaker, 44,000 utterances, 43 hours of speech,
- Telephone Applications: 100 sentences/speaker, 40,000 utterances, 36 hours of speech.
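
The figures quoted for the two sets can be cross-checked with a little arithmetic. The Python sketch below is only an illustration based on the numbers in this entry. The names `TASKS` and `summarize` are invented for the example, and the calculation assumes that every utterance counted comes from a speaker who read the full set of sentences for that task, which the entry does not state.

```python
# Figures as quoted in this catalogue entry.
TASKS = {
    "Daily News Task": {
        "sentences_per_speaker": 110,
        "utterances": 44_000,
        "hours": 43,
    },
    "Telephone Applications": {
        "sentences_per_speaker": 100,
        "utterances": 40_000,
        "hours": 36,
    },
}


def summarize(task):
    """Return (implied speaker count, mean utterance length in seconds).

    Assumption: each counted speaker read the full per-task sentence set.
    """
    implied_speakers = task["utterances"] / task["sentences_per_speaker"]
    mean_seconds = task["hours"] * 3600 / task["utterances"]
    return implied_speakers, mean_seconds


summary = {name: summarize(task) for name, task in TASKS.items()}
total_utterances = sum(task["utterances"] for task in TASKS.values())
total_hours = sum(task["hours"] for task in TASKS.values())

for name, (speakers, seconds) in summary.items():
    print(f"{name}: {speakers:.0f} implied speakers, {seconds:.2f} s per utterance")
print(f"Total: {total_utterances} utterances, {total_hours} hours")
```

Under that assumption, both sets imply the same number of full-set speakers, and the two utterance counts add up to the 84,000 read sentences given in the description.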

All recordings were carried out in a sound-proof room on two channels: clean speech (16 kHz) and telephone speech (8 kHz).

This LVCSR database will support the development of automatic speech recognition systems for Indonesian.

The database also includes a pronunciation dictionary derived from the Daily News and Telephone Application Tasks. The dictionary contains 40k words: 30k Indonesian words, 8k place and person names, and 2k foreign words.
Applications
Possible applications: Automatic speech recognition
Application area: Research
Joint Copyright © 2008 ELRA & ELDA