ELRA - ELRA-U-S 0120 : Chinese Telephony Conversational Corpus for Speech Processing


Catalog Reference : ELRA-U-S 0120

Chinese Telephony Conversational Corpus for Speech Processing

The corpus consists of 1206 ten-minute natural Mandarin conversations between strangers or friends, for a total of about 200 hours. Each conversation focuses on a single topic, and there are 40 topics in total.

The data were recorded over public telephone networks (landline and cellular channels). They were transcribed manually in standard Chinese characters (GBK encoding), annotated with specific markup for spontaneous speech, and time-aligned.
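Because the transcriptions are GBK-encoded, tools that expect UTF-8 will need the text converted first. The following is a minimal sketch of that conversion using Python's built-in `gbk` codec. The function names and the sample string are hypothetical illustrations; the corpus's actual file layout and markup are not described here, so the sketch works on raw bytes only.

```python
def decode_gbk(raw: bytes) -> str:
    """Decode GBK-encoded bytes into a Python string.

    Raises UnicodeDecodeError if the bytes are not valid GBK.
    """
    return raw.decode("gbk")


def transcode_to_utf8(raw: bytes) -> bytes:
    """Re-encode GBK bytes as UTF-8 for downstream tools."""
    return decode_gbk(raw).encode("utf-8")


# Hypothetical sample standing in for one transcript line (not corpus data).
sample_gbk = "你好，世界".encode("gbk")
sample_text = decode_gbk(sample_gbk)
sample_utf8 = transcode_to_utf8(sample_gbk)
```

In practice the raw bytes would come from reading a transcript file in binary mode, and the UTF-8 result would be written to a new file so the original stays untouched.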

The corpus is intended as a resource for conversational and spontaneous Mandarin speech recognition.

EARSCTS stands for Effective, Affordable, Reusable Speech-to-text Chinese telephony speech corpus.

Applications

Possible Applications : Speech recognition
Application Area : Research

Contents

Speech corpus
Language(s) : Mandarin Chinese
Duration : 200 h
Source Channel : Telephone
Speech Acquisition Mode : Acoustic
Transcription Entries : Orthographic