ELRA - ELRA-U-S0268 : Chinese Speech Corpus 4

Catalog Reference : ELRA-U-S0268

Chinese Speech Corpus 4

This corpus consists of speech read by 100 native Mandarin speakers (50 males, 50 females). The speakers read Chinese names, command words for cell phones, and 11-digit telephone numbers.

It was recorded with a Sennheiser E835S microphone at a sampling rate of 16000 Hz and stored as 16-bit Windows WAVE files.
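Before using the audio, a user may want to confirm that each file matches the stated format. The sketch below uses Python's standard `wave` module to check the sample rate (16000 Hz), sample width (2 bytes, i.e. 16-bit) and compression type (uncompressed PCM). It is a minimal sketch under assumptions. The function names `check_format` and `make_silent_wav` are hypothetical helpers, not part of the corpus distribution. The channel count is not stated in this entry, so the check does not enforce it.

```python
import io
import wave

EXPECTED_RATE = 16000   # Hz, as stated in the catalogue entry
EXPECTED_SAMPWIDTH = 2  # bytes per sample, i.e. 16-bit PCM


def check_format(source):
    """Return a list of format mismatches; an empty list means the file matches.

    `source` may be a file path or a binary file-like object (hypothetical helper).
    """
    problems = []
    with wave.open(source, "rb") as w:
        if w.getframerate() != EXPECTED_RATE:
            problems.append(f"sample rate {w.getframerate()} Hz, expected {EXPECTED_RATE} Hz")
        if w.getsampwidth() != EXPECTED_SAMPWIDTH:
            problems.append(f"sample width {w.getsampwidth()} bytes, expected {EXPECTED_SAMPWIDTH}")
        if w.getcomptype() != "NONE":
            problems.append(f"compression {w.getcomptype()!r}, expected uncompressed PCM")
    return problems


def make_silent_wav(rate, sampwidth, seconds=0.1):
    """Build an in-memory mono WAV file of silence, for trying out the check."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(sampwidth)
        w.setframerate(rate)
        w.writeframes(b"\x00" * int(rate * seconds) * sampwidth)
    buf.seek(0)  # rewind so the buffer can be read back
    return buf


if __name__ == "__main__":
    print(check_format(make_silent_wav(16000, 2)))  # matches the stated format
    print(check_format(make_silent_wav(8000, 2)))   # wrong sample rate
```

For real data, pass a path such as `check_format("speaker001/token001.wav")`. That path is purely illustrative, because the corpus directory layout is not described here.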

Each speaker uttered a set of 250 tokens. Each set contains:
- Chinese names:
  - 150 items for training data (50 items selected according to the distribution of Chinese names and 100 syllable-balanced items)
  - 120 items for testing data (80 items selected according to the distribution of Chinese names and 40 syllable-balanced items)
- 57 command words (36 crucial words and 21 non-crucial words)
- 11-digit telephone numbers: Chinese telephone numbers, including cell phone numbers, generated by random sampling.

Production

Creation date : 2004

Applications

Existing applications : Speech recognition

Contents

speech corpus
Language(s) : Chinese
Quantisation : 16-bit
Source Channel : Microphone
Recording Environment : Office