Universal Catalogue  
Catalog Reference : ELRA-U-S 0035
Mandarin Chinese Broadcast News Corpus
The MATBN corpus contains 198 one-hour news shows, totalling approximately 2.3 million Chinese characters. The shows were collected between 2001 and 2003 from the Public Television Service Foundation (Taiwan).

The shows were first recorded in stereo at a 44.1 kHz sampling rate with 16-bit resolution, using a DAT recorder in the TV broadcasting studio. The DAT recordings were then converted into a single Microsoft Windows wave file, and the signal was down-sampled to 16 kHz with 16-bit resolution.
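As a rough illustration of what the sampling rates above imply, the following Python sketch computes the rational resampling ratio between 44.1 kHz and 16 kHz and estimates the size of the uncompressed audio. It is not part of the corpus documentation. The function names are hypothetical, and the single-channel figure is an assumption, since the entry does not state how many channels the down-sampled signal has.

```python
from fractions import Fraction

SOURCE_RATE_HZ = 44_100   # DAT recording rate given in the description
TARGET_RATE_HZ = 16_000   # rate after down-sampling
BYTES_PER_SAMPLE = 2      # 16-bit resolution


def resample_ratio(source_hz: int, target_hz: int) -> Fraction:
    """Return target/source as a reduced fraction (up/down factors)."""
    return Fraction(target_hz, source_hz)


def pcm_bytes(hours: float, rate_hz: int, channels: int) -> int:
    """Size of uncompressed PCM audio in bytes, ignoring file headers."""
    return int(hours * 3600 * rate_hz * BYTES_PER_SAMPLE * channels)


ratio = resample_ratio(SOURCE_RATE_HZ, TARGET_RATE_HZ)
print(f"up/down factors: {ratio.numerator}/{ratio.denominator}")

# Assumption: one channel after down-sampling; the entry does not say.
per_hour = pcm_bytes(1, TARGET_RATE_HZ, channels=1)
print(f"one hour at 16 kHz: {per_hour / 1e6:.1f} MB")
print(f"198 hours at 16 kHz: {198 * per_hour / 1e9:.1f} GB")
```

Under the single-channel assumption, the estimate scales linearly: a two-channel signal would double both size figures.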

The corpus has been segmented, labeled and transcribed manually (with DGA&LDC Transcriber). The transcription is aligned to the speech signal.

The SGML annotation tags cover acoustic conditions, background conditions, story boundaries, speaker turn boundaries, and audible acoustic events (hesitations, repetitions, vocal non-speech events, external noises).

The MATBN is the result of a joint project sponsored by the National Science Council of Taiwan.
Applications
Possible applications: Speech recognition, Speech synthesis
Application area: Research, Other
Contents: speech corpus
 

Joint Copyright © 2008 ELRA & ELDA