Universal Catalogue  
Catalog Reference : ELRA-U-S 0001
This corpus contains Italian spontaneous speech events collected from 1965 onwards to support studies on the intonation of Italian.
It comprises the following two main sub-corpora.

- the LABLITA Corpus of Adult Spontaneous Spoken Italian.
Transmission channels: broadcasting, telephone, face to face communication.
Number of words: 640,514.
Length of the sessions: 152 hours.
Length of the transcribed signal: 63 hours.

- the LABLITA Collection of Longitudinal Corpora of Early Acquisition of Italian. This collection gathers two sets of studies: the Ferrara Corpus, for Italian spoken in one of its northern varieties, and the Florence Corpus, for Italian spoken in Western Tuscany.
Number of words: 210,554 (child: 57,284).
Length of the sessions: 71 hours.

Sessions are recorded in wav files (Windows PCM, 22,050 Hz, 16-bit) and are delivered with:
- Orthographic transcription in CHAT format (with sentence segmentation and prosodic parsing).
- Metadata in CHAT and IMDI format.
- Text to speech synchronization in xml files.
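
As an illustration of the audio format stated above, the following minimal sketch checks whether a given wav file is 22,050 Hz, 16-bit PCM and reports its duration. It uses only Python's standard `wave` module. The function names and any file path passed to them are hypothetical and are not part of the corpus distribution; the number of channels is not stated in this entry, so it is not checked.

```python
import wave

# Values taken from the catalogue entry: 22,050 Hz sampling rate, 16-bit samples.
EXPECTED_RATE_HZ = 22050
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16 bits = 2 bytes per sample


def matches_stated_format(path):
    """Return True if the WAV file at `path` is 22,050 Hz with 16-bit samples.

    The `wave` module raises wave.Error for files it cannot parse,
    for example compressed (non-PCM) audio.
    """
    with wave.open(path, "rb") as wav:
        return (
            wav.getframerate() == EXPECTED_RATE_HZ
            and wav.getsampwidth() == EXPECTED_SAMPLE_WIDTH_BYTES
        )


def duration_seconds(path):
    """Return the length of the recording in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()
```

A check of this kind can be run over each session file after download, before the transcriptions and synchronization files are processed.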

A sample of the LABLITA corpora is also available to the public through the C-ORAL-ROM corpus (distributed through the ELRA catalogue, http://catalog.elra.info, under the reference S0172).
Period of coverage: from 1965 onwards
Version:
Version history:
Possible applications: Speech recognition, Speech synthesis
Application area: Research
Technical Information
File format: wav
Contents
 speech corpus #17904
 speech corpus #27904

Joint Copyright © 2008 ELRA & ELDA
Universal Catalogue 1.0.4