ELRA - ELRA-U-S 0050 : Chuan-dialectal Chinese Corpus

Catalog Reference : ELRA-U-S 0050

Chuan-dialectal Chinese Corpus

This is a corpus of Chuan dialectal Chinese (Chengdu-centered).

Sampling rate: 22,050 Hz
Three channels: two standard microphones and one USB microphone (Sony C-38B, Sennheiser e835s, Logitech LPAC-5000).
Number of speakers: 36 (19 male, 17 female).
Age: 19-27.
Data: 200 long sentences, 10 digits, 26 English letters, 200 phrases.
Transcription: character, syllable.

The processing of dialects is a general issue in Chinese language processing, since China can be divided into 8 major dialectal regions (in addition to Mandarin), each of which can be further divided into many sub-categories.

Data have also been collected for Wu and Min dialects. Data collection is planned for further dialects (Xiang: Changsha-centered, Yue: Guangzhou-centered, Jin: Taiyuan-centered, etc.).

Applications

Possible Applications : Speech recognition
Application Area : Research

Contents

Speech corpus
Language(s) : Chinese
Source Channel : Microphone
Speech Acquisition Mode : Acoustic
Transcription Entries : Orthographic