ELRA - ELRA-U-S0258 : Regional Accented Speech Corpus

Catalog Reference : ELRA-U-S0258

Regional Accented Speech Corpus

The Regional Accented Speech Corpus contains spontaneous and read speech from 800 Chinese speakers with 4 different regional accents (Chongqing, Shanghai, Guangzhou, Xiamen). For each dialect, 200 speakers were recorded, balanced for age, gender and educational background.

For the spontaneous speech part, each speaker talked for 4 to 5 minutes on a chosen topic and answered 15 elicited questions. Speakers were then asked to read 23 common sentences, 15 dialectal words and 110 phonetically balanced sentences. The recordings include the acoustic environment and background noise. The audio was sampled at 16 kHz with 16-bit quantisation.
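As a rough illustration of the audio format stated above, the following Python sketch checks whether a WAV file is 16 kHz, 16-bit and estimates the size of uncompressed audio. It rests on assumptions not confirmed by this entry: that the recordings are distributed as PCM WAV files and that they are single-channel. The function names `matches_corpus_format` and `raw_pcm_bytes` are hypothetical helpers, not part of any corpus tooling.

```python
import wave

# Format stated in the corpus description.
EXPECTED_RATE_HZ = 16_000
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16 bit


def matches_corpus_format(wav_file) -> bool:
    """Return True if a WAV file is sampled at 16 kHz with 16-bit samples.

    `wav_file` may be a path string or a binary file-like object.
    Assumption: the corpus audio is stored as PCM WAV.
    """
    with wave.open(wav_file, "rb") as wav:
        return (
            wav.getframerate() == EXPECTED_RATE_HZ
            and wav.getsampwidth() == EXPECTED_SAMPLE_WIDTH_BYTES
        )


def raw_pcm_bytes(duration_s: float, channels: int = 1) -> int:
    """Size in bytes of uncompressed audio of the given duration.

    Assumption: mono (channels=1) unless stated otherwise; the catalogue
    entry does not give the channel count.
    """
    return int(
        duration_s * EXPECTED_RATE_HZ * EXPECTED_SAMPLE_WIDTH_BYTES * channels
    )
```

Under these assumptions, `raw_pcm_bytes(4 * 60)` and `raw_pcm_bytes(5 * 60)` give the approximate raw size of one speaker's 4 to 5 minute spontaneous monologue, excluding any file header.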

Transcriptions are provided in Chinese characters for the spontaneous part and a phonetic transcription is available for the read part.

This corpus was built in the framework of the RASC863 project, whose aim is to create a large speech database covering the 10 representative regional accents in order to train ASR systems. Recordings were also conducted in the 6 other dialects: Taiyuan, Changsha, Nanchang, Wenzhou, Luoyang and Nanjing.

Production

Project : RASC 863 project

Applications

Possible applications : Automatic speech recognition

Contents

speech corpus
Language(s) : Chinese (China)
Quantisation : 16 bit
Source Channel : Microphone
Speech Acquisition Interface : Sound card
Transcription Entries : Orthographic, Phonetic