Catalog Reference : ELRA-ST79
DIHANA Corpus
This is a spontaneous-speech dialogue corpus acquired using the Wizard of Oz technique. The task consisted of retrieving information about Spanish nationwide trains by telephone. 300 different scenarios were defined. Each scenario contains an objective, a situation and the specific requirements of the trip.
In total, 225 speakers recorded 900 dialogues with 6,278 user turns and 48,243 words. The training corpus contains 720 dialogues recorded by 180 speakers and the test corpus consists of 180 dialogues recorded by 45 speakers.
Spontaneous-speech events were labeled from acoustic, lexical and syntactic points of view. 499 lexical and 545 syntactic events were annotated.
The corpus acquisition architecture is composed of an audio server, an automatic speech recognition server, a speech understanding server, a Wizard of Oz server, a dialogue manager server, an oral answer generation server, a text-to-speech conversion server and a communications management client. In addition, each speaker read 16 sentences (8 task-related sentences and 8 phonetically balanced sentences).
The entire corpus consists of 3,600 sentences in total, amounting to 10.8 hours of recorded speech.
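As a quick consistency check on the figures quoted above, the following minimal sketch derives a few per-speaker and per-turn averages from them. The constant and function names are illustrative only, and reading the 3,600-sentence total as 225 speakers times 16 read sentences is an assumption based on the arithmetic, not something the catalogue states explicitly.

```python
# Figures quoted in the catalogue description (names are illustrative).
SPEAKERS = 225
DIALOGUES = 900
USER_TURNS = 6_278
WORDS = 48_243
TRAIN_SPEAKERS = 180
TEST_SPEAKERS = 45
READ_SENTENCES_PER_SPEAKER = 16  # 8 task-related + 8 phonetically balanced
TOTAL_SENTENCES = 3_600


def derived_stats():
    """Return averages derived from the quoted corpus figures."""
    return {
        "dialogues_per_speaker": DIALOGUES / SPEAKERS,
        "user_turns_per_dialogue": USER_TURNS / DIALOGUES,
        "words_per_user_turn": WORDS / USER_TURNS,
        # Assumption: the sentence total counts the read sentences only.
        "read_sentences": SPEAKERS * READ_SENTENCES_PER_SPEAKER,
    }


if __name__ == "__main__":
    for name, value in derived_stats().items():
        print(f"{name}: {value:.2f}")
```

Under these figures each speaker recorded 4 dialogues on average, and the training and test speaker counts add up to the 225 speakers reported.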
Contents
speech corpus
Language(s): Spanish
Source Channel: Telephone
Saturday 23 November, 2024
Joint Copyright © 2008 ELRA & ELDA