ELRA - ELRA-U-S0215 : Corpus DIMEx100

Catalog Reference : ELRA-U-S0215

Corpus DIMEx100

This is a Mexican Spanish speech corpus of 6,000 recorded sentences. The sentence material was collected and selected from the Internet (sentences of 5 to 15 words). A total of 100 speakers recorded 60 sentences each: 50 sentences specific to that speaker and 10 sentences common to all speakers. The recordings were made in a sound studio with two microphones.
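
The recording design above implies a few simple totals. The short sketch below works them out, assuming that each speaker's 50 specific sentences are not shared with any other speaker (the entry does not state this explicitly). Under that assumption, the 6,000 recordings cover 5,010 distinct sentence texts. The function name `corpus_totals` is purely illustrative.

```python
# Illustrative arithmetic for the DIMEx100 recording design.
# Assumption: the 50 speaker-specific sentences are unique to each speaker.
NUM_SPEAKERS = 100
SPECIFIC_PER_SPEAKER = 50  # sentences read by a single speaker (assumed unique)
COMMON_PER_SPEAKER = 10    # sentences read by every speaker


def corpus_totals(speakers: int, specific: int, common: int) -> dict[str, int]:
    """Return the number of recorded utterances and of distinct sentence texts."""
    utterances = speakers * (specific + common)
    # Common sentences are counted once as texts, however many times they are read.
    distinct_sentences = speakers * specific + common
    return {"utterances": utterances, "distinct_sentences": distinct_sentences}


totals = corpus_totals(NUM_SPEAKERS, SPECIFIC_PER_SPEAKER, COMMON_PER_SPEAKER)
print(totals)  # {'utterances': 6000, 'distinct_sentences': 5010}
```

The 10 common sentences are each read by all 100 speakers, which gives 1,000 recordings that can be compared directly across speakers.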

A phonetic transcription is provided, time-aligned with the audio signal. It is annotated at three levels of granularity, which differ in the number of units in the phonetic alphabet: 54 units (T-54), 44 units (T-44) and 22 units (T-22).
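
The entry does not describe the file format of the time-aligned transcription. As a minimal sketch only, the code below assumes a hypothetical plain-text layout with one segment per line, written as `<start> <end> <label>` with times in seconds; the actual DIMEx100 files may use a different layout and units, so check the corpus documentation first. `Segment`, `parse_segments` and `phone_inventory` are hypothetical names, and the sample labels are made up for the demonstration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    start: float  # segment start, in seconds (assumed unit)
    end: float    # segment end, in seconds (assumed unit)
    phone: str    # phonetic unit label


def parse_segments(text: str) -> list[Segment]:
    """Parse lines of the assumed form '<start> <end> <label>'."""
    segments = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        parts = line.split()
        if len(parts) != 3:
            raise ValueError(f"line {lineno}: expected 3 fields, got {len(parts)}")
        start, end = float(parts[0]), float(parts[1])
        if end < start:
            raise ValueError(f"line {lineno}: end time precedes start time")
        segments.append(Segment(start, end, parts[2]))
    return segments


def phone_inventory(segments: list[Segment]) -> set[str]:
    """Return the set of distinct labels used in the given segments."""
    return {seg.phone for seg in segments}


# Made-up example input in the assumed format.
sample = "0.00 0.08 p\n0.08 0.15 a\n0.15 0.27 s\n"
segs = parse_segments(sample)
print([seg.phone for seg in segs])  # ['p', 'a', 's']
```

Collecting `phone_inventory` over every file transcribed at one level gives a quick consistency check: the number of distinct labels should not exceed the size of that level's alphabet (54, 44 or 22 units).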

This corpus can be used for speech recognition research and, in particular, for speaker identification and classification.

Production

Project : DIME-II project

Applications

Applications existing : Automatic speech recognition
Applications possible : Speaker identification, Automatic speech recognition

Technical Information

Distribution medium : DVD

Contents

Speech corpus
Language(s) : Spanish (Mexico)
Quantisation : 16 bit
Source Channel : Microphone
Speech Acquisition Mode : Acoustic
Recording Environment : Sound studio
Speech Content : Continuous sentences
Transcription Entries : Orthographic, Phonetic
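
The entry gives 16-bit quantisation but states neither the sampling rate nor the audio file format. Assuming the recordings are delivered as uncompressed PCM WAV files (not confirmed by the entry), Python's standard `wave` module can confirm the sample width of a file. The helpers `is_16_bit` and `make_test_wav` below are hypothetical names; the in-memory file only demonstrates the check and its 16 kHz rate is arbitrary, not the corpus rate.

```python
import io
import wave


def is_16_bit(wav_source) -> bool:
    """Return True if the WAV file (path or binary file object) uses 2-byte samples."""
    with wave.open(wav_source, "rb") as wav:
        return wav.getsampwidth() == 2


def make_test_wav(sampwidth: int) -> io.BytesIO:
    """Build a short silent mono WAV in memory with the given sample width in bytes."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(sampwidth)
        w.setframerate(16000)  # arbitrary demo rate, not taken from the corpus
        w.writeframes(b"\x00" * sampwidth * 160)
    buf.seek(0)  # rewind so the buffer can be read back
    return buf


print(is_16_bit(make_test_wav(2)))  # True
print(is_16_bit(make_test_wav(1)))  # False
```

On real data, the same function can be called with a file path, e.g. `is_16_bit("path/to/recording.wav")`, where the path is a placeholder.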