ELRA - ELRA-U-S 0001 : LABLITA Corpus

Catalog Reference : ELRA-U-S 0001

LABLITA Corpus

This corpus contains recordings of spontaneous Italian speech, collected from 1965 onwards to support studies of Italian intonation.
It comprises two main sub-corpora.

- the LABLITA Corpus of Adult Spontaneous Spoken Italian.
Transmission channels: broadcasting, telephone, face to face communication.
Number of words: 640,514.
Length of the sessions: 152 hours.
Length of the transcribed signal: 63 hours.

- the LABLITA Collection of Longitudinal Corpora of Early Acquisition of Italian. This collection gathers two sets of studies: the Ferrara Corpus, for Italian spoken in one of its northern varieties, and the Florence Corpus, for Italian spoken in Western Tuscany.
Number of words: 210,554 (child: 57,284).
Length of the sessions: 71 hours.

Sessions are recorded in WAV files (Windows PCM, 22,050 Hz, 16-bit) and are delivered with:
- Orthographic transcription in CHAT format (with sentence segmentation and prosodic parsing).
- Metadata in CHAT and IMDI formats.
- Text-to-speech synchronization in XML files.
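
As an illustration of the audio format stated above, the following is a minimal sketch of how a recipient might check that a session file is 22,050 Hz, 16-bit PCM and read its duration. It uses only Python's standard wave module. The function name check_session_wav and the constants are hypothetical and not part of the corpus distribution, and the channel count is not checked because the record does not state it.

```python
import wave

# Values taken from the format description: 22,050 Hz, 16-bit PCM.
EXPECTED_RATE_HZ = 22050
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16 bits = 2 bytes per sample


def check_session_wav(path):
    """Return (matches_expected_format, duration_in_seconds) for a WAV file.

    Hypothetical helper: the wave module only reads uncompressed PCM,
    so a non-PCM file raises wave.Error instead of returning False.
    """
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()
        frames = wav.getnframes()
    matches = (rate == EXPECTED_RATE_HZ
               and width == EXPECTED_SAMPLE_WIDTH_BYTES)
    duration = frames / rate if rate else 0.0
    return matches, duration
```

Summing the returned durations over all session files would give a figure that can be compared with the session lengths listed above.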

A sample of the LABLITA corpora is also available to the public through the C-ORAL-ROM corpus (distributed through the ELRA catalogue, http://catalog.elra.info, under the reference S0172).

Identification

Period of coverage : from 1965 onwards

Version :
Version history :

Applications

Possible applications : Speech recognition, Speech synthesis
Application area : Research

Technical Information

File format : WAV

Contents

speech corpus #17904
Language(s) : Italian (Italy)
Duration : 62 hours
Quantisation : 16 bits
Signal Encoding : Linear PCM
Source Channel : Microphone, Radio, Telephone
Speech Acquisition Mode : Acoustic
Transcription Entries : Orthographic
Annotation level : Prosodic

speech corpus #27904
Language(s) : Italian (Italy)
Quantisation : 16 bits
Signal Encoding : Linear PCM
Source Channel : Microphone
Speech Acquisition Mode : Acoustic
Recording Environment : in the kindergarten or at home
Recording Period : from 1979 to 1990
Task : activities with toys or visual books
Transcription Entries : Orthographic
Annotation level : Prosodic