Catalog Reference : ELRA-U-S 0035
Mandarin Chinese Broadcast News Corpus
The MATBN corpus contains 198 one-hour news shows (for a total of approximately 2.3 million Chinese characters). They were collected between 2001 and 2003 from the Public Television Service Foundation (Taiwan).
The shows were first recorded in stereo at a 44.1 kHz sampling rate and 16-bit resolution with a DAT recorder in the TV broadcasting studio. The DAT recordings were then converted into a single Microsoft Windows wave file, and the signal was down-sampled to 16 kHz at 16-bit resolution.
The corpus has been manually segmented, labeled and transcribed with the Transcriber tool (DGA & LDC). The transcription is aligned to the speech signal.
SGML tagging annotation: acoustic conditions, background conditions, story boundaries, speaker turn boundaries, audible acoustic events (hesitations, repetitions, vocal non-speech events, external noises).
The MATBN is the result of a joint project sponsored by the National Science Council of Taiwan.
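As a rough illustration of the recording parameters above, the following sketch computes the 44.1 kHz to 16 kHz resampling ratio and an approximate uncompressed size for the 16 kHz, 16-bit audio. The function names are hypothetical, and the single-channel (mono) figure is an assumption: the entry does not state how many channels the down-sampled files contain.

```python
from fractions import Fraction

# Values taken from the catalogue entry.
SOURCE_RATE_HZ = 44_100   # DAT recording rate
TARGET_RATE_HZ = 16_000   # rate after down-sampling
BITS_PER_SAMPLE = 16
DURATION_HOURS = 198      # 198 one-hour news shows


def resampling_ratio(source_hz: int, target_hz: int) -> Fraction:
    """Return target/source as a reduced fraction (up/down factors)."""
    return Fraction(target_hz, source_hz)


def pcm_size_bytes(hours: int, rate_hz: int, bits: int, channels: int) -> int:
    """Size of uncompressed PCM audio, ignoring file headers."""
    seconds = hours * 3600
    return seconds * rate_hz * (bits // 8) * channels


ratio = resampling_ratio(SOURCE_RATE_HZ, TARGET_RATE_HZ)
# ASSUMPTION: one channel; the entry does not give the final channel count.
mono_bytes = pcm_size_bytes(DURATION_HOURS, TARGET_RATE_HZ, BITS_PER_SAMPLE, channels=1)

print(f"resampling ratio: {ratio.numerator}/{ratio.denominator}")
print(f"approx. size (mono assumed): {mono_bytes / 1e9:.1f} GB")
```

Under the mono assumption this gives a ratio of 160/441 and roughly 22.8 GB of raw audio; a two-channel distribution would double the size.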
Applications
Possible applications : Speech recognition, Speech synthesis
Application area : Research, Other
Contents
speech corpus
Language(s) : Mandarin Chinese
Duration : 198 hours
Quantisation : 16 bit
Source Channel : Television
Speech Acquisition Mode : Acoustic
Sound Type Annotation : Adverts, Articulatory noise, Background noise, Mispronunciation, Music, Speaker noise
Transcription Entries : Orthographic
Transcription Segmentation : Episode, Speaker turn
Annotation language : SGML
Saturday 23 November, 2024
Joint Copyright © 2008 ELRA & ELDA