Catalog Reference : ELRA-U-S 0190
SI-TAL ADAM Corpus
The SI-TAL ADAM corpus contains 450 transcribed dialogues between travel agents and clients, covering both human-human and human-machine interactions. Each dialogue is annotated at five levels of linguistic information: prosody, morphosyntax, syntax, semantics and pragmatics.
The human-human interactions (200 dialogues) are simulated telephone conversations recorded on digital tape as signed 16-bit linear PCM at 16 kHz, using two microphones (one directional and one "close-talk"). They amount to more than 7 hours of recorded speech and 58,377 words.
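As a rough sanity check on these figures, the uncompressed size of the human-human recordings can be estimated from the stated format. The sketch below is an illustration, not part of the catalogue entry. It assumes exactly 7 hours of audio (the entry says "more than 7 hours", so the result is a lower bound), one mono stream per microphone, and no file headers. The function name is hypothetical.

```python
def pcm_size_bytes(duration_s: float, sample_rate_hz: int,
                   bits_per_sample: int, channels: int = 1) -> int:
    """Size in bytes of raw (headerless) linear PCM audio."""
    bytes_per_sample = bits_per_sample // 8
    return int(duration_s * sample_rate_hz * bytes_per_sample * channels)


# Assumption: exactly 7 hours; the corpus holds "more than 7 hours".
SEVEN_HOURS_S = 7 * 3600

per_microphone = pcm_size_bytes(SEVEN_HOURS_S, 16_000, 16)
print(per_microphone)  # 806400000 bytes, roughly 0.8 GB per microphone stream
```

If both microphone streams are distributed in full, the total would be about twice this figure.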
The human-machine dialogues (250) contain 1,250 utterances recorded at 8 kHz and stored as 8-bit PCM µ-law.
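µ-law audio stores each sample as one companded byte, so it usually has to be expanded to linear PCM before it can be processed alongside 16-bit recordings. The following is a minimal sketch of the standard G.711 µ-law expansion in plain Python. It assumes the files hold raw µ-law bytes with no header, which the catalogue entry does not state, and the function names are illustrative.

```python
BIAS = 0x84  # 132, the bias added during G.711 mu-law encoding


def ulaw_byte_to_linear(code: int) -> int:
    """Expand one 8-bit mu-law code to a signed linear sample (16-bit scale)."""
    u = ~code & 0xFF              # mu-law bytes are stored bit-inverted
    sign = u & 0x80               # top bit: sign
    exponent = (u >> 4) & 0x07    # next three bits: segment number
    mantissa = u & 0x0F           # low four bits: step within the segment
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if sign else magnitude


def decode_ulaw(data: bytes) -> list[int]:
    """Decode a raw mu-law byte string into a list of linear samples."""
    return [ulaw_byte_to_linear(b) for b in data]


# 0xFF encodes silence; 0x80 and 0x00 are the positive and negative extremes.
print(decode_ulaw(b"\xff\x80\x00"))  # [0, 32124, -32124]
```

The decoded values fit in a signed 16-bit range, so they can be written out as 16-bit linear PCM at the original 8 kHz rate.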
Each dialogue has been orthographically transcribed (EAGLES), and the transcription is linked to the audio signal file. Each transcription file is also linked to five XML annotation files, one for each annotation level.
SI-TAL stands for 'Integrated System for the Automatic treatment of Language'.
ADAM stands for 'Architecture for Dialogue Annotation on Multiple Levels'.
Production
Project : SI-TAL Project
Applications
Possible applications : Speech recognition, Spoken dialogue systems
Application area : Research
Contents
speech corpus
Language(s) : Italian
Source Channel : Microphone
Transcription Entries : Orthographic
Annotation language : XML
Joint Copyright © 2008 ELRA & ELDA