Universal Catalogue  
Catalog Reference: ELRA-U-S0216
The Lancaster Los Angeles Spoken Chinese Corpus
The Lancaster Los Angeles Spoken Chinese Corpus (LLSCC) is a corpus of spoken Mandarin Chinese. It consists of dialogues (55%) and monologues (45%), and includes both spontaneous (57%) and scripted (43%) speech.

The transcription contains 1,002,151 words, corresponding to 73,976 sentences and 49,670 utterance units (paragraphs).

The corpus covers the following spoken registers:
- face-to-face conversation (60,806 words),
- telephone conversations between overseas Chinese and their families in China (295,026 words),
- play/movie scripts (80,446 words),
- TV talk show transcripts (118,588 words),
- formal debates between university students recorded between 1993 and 2002 (77,909 words),
- spontaneous oral narratives of native Beijing residents (102,262 words),
- edited oral narratives (267,114 words).
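
As a quick consistency check, the per-register word counts listed above can be summed and compared with the reported transcription total. The sketch below only restates figures from this entry; the variable names are illustrative and are not part of the corpus distribution.

```python
# Word counts per register, copied from the catalogue entry.
register_words = {
    "face-to-face conversation": 60_806,
    "telephone conversation": 295_026,
    "play/movie scripts": 80_446,
    "TV talk show transcripts": 118_588,
    "formal university debates": 77_909,
    "spontaneous oral narratives": 102_262,
    "edited oral narratives": 267_114,
}

# Total word count reported for the transcription.
reported_total = 1_002_151

total = sum(register_words.values())
print("sum of registers:", f"{total:,}")
print("matches reported total:", total == reported_total)

# Share of each register, largest first.
for name, words in sorted(register_words.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {words:,} words ({words / total:.1%})")
```

The seven register counts add up exactly to the 1,002,151 words reported for the transcription.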

The corpus is encoded in Unicode, structured in XML, and part-of-speech tagged.

Joint Copyright © 2008 ELRA & ELDA