Universal Catalogue  
Catalog Reference : ELRA-U-S0283
Large Vocabulary Thai Continuous Speech Broadcast News corpus
This is a corpus of Thai television broadcast news. It contains 60 hours of audio recordings from 36 speakers (21 female and 15 male).

The corpus was transcribed and annotated for news topics (18 different topics), acoustic conditions (background and speaker noises, microphone or telephone channel, ...), overlapping speech, and named entities (almost 9,000 unique named entities tagged). The Lotus-BN corpus has a rich vocabulary of approximately 26,000 words. A phonetic lexicon was also extracted from these transcriptions.

Work is still in progress to enlarge the database, with a target size of 100 hours of speech.
Production
Creation date : 2008
Applications
Possible applications : Speech recognition
Contents
 speech corpus 
 

Joint Copyright © 2008 ELRA & ELDA