Catalog Reference : ELRA-U-S0283
Large Vocabulary Thai Continuous Speech Broadcast News corpus
This is a corpus of Thai television broadcast news. It contains 60 hours of audio recordings from 36 speakers (21 female and 15 male).
The corpus was transcribed and annotated for news topics (18 different topics), acoustic conditions (background and speaker noise, microphone or telephone channel, ...), overlapping speech, and named entities (almost 9,000 unique named entities tagged). The Lotus-BN corpus has a rich vocabulary of approximately 26,000 words. A phonetic lexicon was also extracted from these transcriptions.
Work is still in progress to enlarge the database; the target size is 100 hours of speech.
Production
Creation date : 2008
Applications
Possible applications : Speech recognition
Contents
speech corpus
Language(s) : Thai
Source Channel : Microphone, Telephone
Sound Type Annotation : Background noise, Mispronunciation
Transcription Entries : Orthographic
Transcription Segmentation : Breath group
Friday 22 November, 2024
Joint Copyright © 2008 ELRA & ELDA