Universal Catalogue  
Catalog Reference : ELRA-U-S 0001
LABLITA Corpus
This corpus contains spontaneous Italian speech events collected from 1965 onwards to support studies of Italian intonation.
It comprises two main sub-corpora:

- the LABLITA Corpus of Adult Spontaneous Spoken Italian.
Transmission channels: broadcasting, telephone, face to face communication.
Number of words: 640,514.
Length of the sessions: 152 hours.
Length of the transcribed signal: 63 hours.

- the LABLITA Collection of Longitudinal Corpora of Early Acquisition of Italian. This collection gathers two sets of studies: the Ferrara Corpus, for Italian spoken in one of its northern varieties, and the Florence Corpus, for Italian spoken in Western Tuscany.
Number of words: 210,554 (child : 57,284).
Length of the sessions: 71 hours.

Sessions are recorded as WAV files (Windows PCM, 22,050 Hz, 16-bit) and are delivered with:
- Orthographic transcription in CHAT format (with sentence segmentation and prosodic parsing).
- Metadata in CHAT and IMDI format.
- Text to speech synchronization in xml files.
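
As a rough illustration of the audio format stated above, the following minimal sketch checks whether a WAV file is 22,050 Hz, 16-bit PCM, using Python's standard `wave` module. The function names and the idea of checking files this way are assumptions made for illustration and are not part of the corpus distribution. The channel count is not stated in this entry, so it is not checked.

```python
import wave

# Format stated in the catalogue entry: PCM, 22,050 Hz, 16-bit samples.
EXPECTED_RATE_HZ = 22050
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16 bits per sample


def matches_catalogue_format(path):
    """Return True if the WAV file at `path` is 22,050 Hz with 16-bit samples.

    The `wave` module only reads uncompressed PCM data; other encodings
    raise wave.Error when the file is opened.
    """
    with wave.open(path, "rb") as wav:
        return (
            wav.getframerate() == EXPECTED_RATE_HZ
            and wav.getsampwidth() == EXPECTED_SAMPLE_WIDTH_BYTES
        )


def duration_seconds(path):
    """Return the length of the recording in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()
```

Summing `duration_seconds` over a set of session files would give a total that could be compared against the session lengths listed in this entry.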

A sample of the LABLITA corpora is also available to the public through the C-ORAL-ROM corpus (distributed through the ELRA catalogue, http://catalog.elra.info, under the reference S0172).
Identification
Period of coverage : from 1965 onwards
Version :
Version history :
Applications
Possible applications : Speech recognition, Speech synthesis
Application area : Research
Technical Information
File format : wav
Contents
 speech corpus #17904
 speech corpus #27904
 

Joint Copyright © 2008 ELRA & ELDA
Universal Catalogue 1.0.4