ELRA - ELRA-U-S0267 : Chinese Speech Corpus 2

Catalog Reference : ELRA-U-S0267

Chinese Speech Corpus 2

This corpus consists of read speech from 300 native Mandarin speakers (150 males, 150 females), each reading a set of 150 words and 100 sentences. All are native speakers of the Beijing (Pekinese) dialect.

It was recorded with a Sennheiser E835S microphone at a sampling rate of 16000 Hz and stored in 16-bit Windows wave format.
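As a minimal sketch, the stated format could be checked on a recording with Python's standard `wave` module. The function name `matches_corpus_format` and the idea of validating one file at a time are illustrative assumptions, not part of the corpus distribution. The channel count is not stated in this entry, so it is not checked.

```python
import wave

EXPECTED_RATE_HZ = 16000   # sampling rate stated in the catalogue entry
EXPECTED_SAMPLE_WIDTH = 2  # bytes per sample, i.e. 16-bit quantisation


def matches_corpus_format(path):
    """Return True if the WAV file at `path` is 16000 Hz with 16-bit samples.

    Hypothetical helper: the corpus itself does not ship such a function.
    """
    with wave.open(path, "rb") as wav:
        return (
            wav.getframerate() == EXPECTED_RATE_HZ
            and wav.getsampwidth() == EXPECTED_SAMPLE_WIDTH
        )
```

Calling `matches_corpus_format("some_file.wav")` on a file with a different sampling rate or sample width would return `False`; the file name here is a placeholder.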

Texts for prompts are extracted from:
- People's Daily (Ren Min Ri Bao) 1993/1994/1996/1997
- Economic Daily (Jing Ji Ri Bao) 1992/1994
- Market (Shi Chang Bao) 1994
- Xinhua News (Xinhuashe Wengao) 1994/1995/1996

Words and sentences were selected from a pool of 685,982 sentences, each 10-20 characters long, sampled from the population corpus. The selection also took into account the frequencies of Di-IFs in Chinese syllables (IF: Chinese initial or Chinese final/rhyme).
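The length criterion alone can be illustrated with a short sketch. It assumes that the 10-20 range is inclusive and that every character, punctuation included, counts toward the length; the entry does not specify either point. The function name `select_by_length` is hypothetical, and the Di-IF frequency criterion is not modelled here.

```python
MIN_CHARS = 10  # lower bound from the catalogue entry (assumed inclusive)
MAX_CHARS = 20  # upper bound from the catalogue entry (assumed inclusive)


def select_by_length(sentences, min_chars=MIN_CHARS, max_chars=MAX_CHARS):
    """Keep sentences whose character count lies in [min_chars, max_chars].

    Length is counted in Unicode code points, so each Chinese character
    and each punctuation mark counts as one (an assumption).
    """
    return [s for s in sentences if min_chars <= len(s) <= max_chars]
```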

Prompts consist of 150 sets of words (a total of 22,500 words) and 150 sets of sentences (a total of 14,872 sentences).

Production

Creation date : 2003

Applications

Applications existing : Speech recognition

Contents

speech corpus
Language(s) : Chinese (China)
Quantisation : 16-bit
Source Channel : Microphone
Recording Environment : Office