ELRA - ELRA-U-S 0149: LUNA Corpus

Catalog Reference: ELRA-U-S 0149

LUNA Corpus

The LUNA corpus is a multi-domain, multilingual dialogue corpus. It is expected to contain 1,000 human-human and 8,100 human-machine dialogues in French, Italian and Polish. The dialogues are being collected in the following application domains: travel information and reservation, public transportation information, IT help desk, telecom customer care, and financial information and transactions.

Processing: segmentation (into dialogue turns), standard orthographic transcription, multi-level annotation.
Levels of annotation: syntactic, semantic and discourse information.

The corpus is being compiled within the LUNA project to support the development of a robust natural spoken language understanding toolkit for multilingual dialogue services.

The LUNA corpus is currently under development.

Production

Project: LUNA project

Applications

Possible applications: Speech recognition; Spoken dialogue systems; Automatic speech recognition
Application area: Research

Contents

Speech corpus
Language(s): French; Italian; Polish
Source channel: Telephone
Speech acquisition mode: Acoustic
Transcription entries: Orthographic
Annotation coverage: Full
Annotation granularity: Word
Annotation level: Semantic