Universal Catalogue  
Catalog Reference : U-S0330
Mandarin-5000 database
The MANDARIN-5000 database contains recordings of 4,752 speakers (2,383 males, 2,369 females) of Mandarin as a first or second language (3,222 native speakers), recorded over the fixed and mobile telephone networks in all provinces of mainland China, as well as Hong Kong. By network and handset, the speakers break down as follows: fixed network, cordless handset: 513 speakers; fixed network, plain old telephone (POT): 3,558 speakers; mobile network: 491 speakers; undetermined (cordless or mobile): 190 speakers. The database design closely follows the SpeechDat(II) conventions, in particular with respect to the content of the database. The database consists of 1 CD containing all documentation files, including the phonetic lexicon, and 3 DVD-Rs containing the data, i.e. speech files and the corresponding transcription files.

Speech samples are stored as uncompressed sequences of 8-bit, 8 kHz A-law samples. Each prompted utterance is stored in a separate file, and each signal file is accompanied by a transcription file, encoded in GB-2312 and ASCII, which contains the orthographic representation (i.e. Chinese characters) and a phonemic transcription in Pinyin with tones and word boundaries.
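
As an illustration of the storage format described above, the following Python sketch expands 8-bit A-law bytes to 16-bit linear samples using the standard ITU-T G.711 expansion, and decodes GB-2312 text. It assumes the signal bytes are raw, headerless A-law data, which this entry does not state explicitly; the function names are hypothetical and are not part of the database distribution.

```python
"""Sketch: expand 8-bit A-law samples (ITU-T G.711) and decode GB-2312 text."""

SAMPLE_RATE_HZ = 8000  # sampling rate stated in the catalogue entry


def alaw_to_linear(code: int) -> int:
    """Expand one A-law byte to a signed 16-bit linear sample."""
    code ^= 0x55                     # undo the even-bit inversion used by G.711
    magnitude = (code & 0x0F) << 4   # 4-bit mantissa
    segment = (code & 0x70) >> 4     # 3-bit segment number (exponent)
    if segment == 0:
        magnitude += 8
    else:
        magnitude += 0x108
        if segment > 1:
            magnitude <<= segment - 1
    # After the XOR, a set top bit marks a positive sample.
    return magnitude if code & 0x80 else -magnitude


def decode_alaw(raw: bytes) -> list[int]:
    """Expand a whole buffer (assumed to be headerless A-law data)."""
    return [alaw_to_linear(b) for b in raw]


def duration_seconds(raw: bytes) -> float:
    """One byte per sample at 8 kHz, so 8,000 bytes make one second."""
    return len(raw) / SAMPLE_RATE_HZ


def decode_transcription(raw: bytes) -> str:
    """Decode transcription bytes; ASCII is a subset of GB-2312."""
    return raw.decode("gb2312")
```

Because each sample occupies one byte, the length of an utterance in seconds is simply the number of signal bytes divided by 8,000.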

Each speaker uttered the following 54 items:
- 6 isolated application words (25 fixed, 5 free)
- 1 additional application command with a parameter (e.g. name dialling)
- 1 sequence of 10 isolated digits (balanced)
- 6 digit strings (in total balanced for digits, letters, dashes and their transitions)
- 3 dates, 1 of them spontaneous
- 2 word spotting phrases using an application word
- 2 handset information ('mobile phone' / 'cordless phone')
- 2 isolated digits
- 2 spelled words (letter sequences)
- 1 currency amount
- 1 natural plain number (balanced for words and transitions)
- 1 natural number with measure word
- 8 names (persons, spelling, cities, companies), 3 of them spontaneous
- 1 spontaneous train schedule request (origin, destination, date, time)
- 1 spontaneous correction
- 1 spontaneous answer to question for time
- 1 spontaneous answer to question for time or day
- 4 spontaneous answers to questions, including fuzzy yes/no
- 8 phonetically rich sentences (read newspaper text) for training or, alternatively, 8 sentences dictated from a newspaper article for testing
- 1 time of day (spontaneous)
- 1 time phrase (read)

The following age distribution has been obtained: 239 speakers are under 16, 2,391 are between 16 and 30, 1,449 are between 31 and 45, 601 are between 46 and 60, and 32 speakers are over 60. (The age of 40 speakers was not determined.)
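
The speaker breakdowns quoted in this entry (by gender, by network and handset, and by age) can be cross-checked against the stated total of 4,752 speakers. The short Python sketch below does this with the figures as given in the entry; the dictionary layout and function names are illustrative assumptions only.

```python
"""Sketch: cross-check the speaker breakdowns against the stated total."""

TOTAL_SPEAKERS = 4752

# All figures are copied from the catalogue entry.
BREAKDOWNS = {
    "gender": {"male": 2383, "female": 2369},
    "network": {
        "fixed, cordless": 513,
        "fixed, POT": 3558,
        "mobile": 491,
        "undetermined": 190,
    },
    "age": {
        "under 16": 239,
        "16-30": 2391,
        "31-45": 1449,
        "46-60": 601,
        "over 60": 32,
        "undetermined": 40,
    },
}


def totals_match(breakdowns: dict, total: int) -> dict:
    """Report, per breakdown, whether its groups sum to the total."""
    return {name: sum(groups.values()) == total
            for name, groups in breakdowns.items()}


def shares(groups: dict) -> dict:
    """Percentage share of each group, rounded to one decimal place."""
    total = sum(groups.values())
    return {name: round(100 * count / total, 1)
            for name, count in groups.items()}


if __name__ == "__main__":
    print(totals_match(BREAKDOWNS, TOTAL_SPEAKERS))
    print(shares(BREAKDOWNS["age"]))
```

All three breakdowns sum to 4,752 once the undetermined groups are included.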

A pronunciation lexicon with the orthographic representation (i.e. Chinese characters), a phonemic transcription in Pinyin with tones, and the frequency of occurrence of each entry is also included.

ISLRN : 010-586-752-168-6

Joint Copyright © 2008 ELRA & ELDA