ELRA - ELRA-U-S 0014 : CLIPS Corpus of Spoken Italian

Catalog Reference : ELRA-U-S 0014

CLIPS Corpus of Spoken Italian

The corpus gathers 100 hours of recorded speech divided into five sub-corpora:
- broadcast speech (radio and television),
- conversational speech (map task and other),
- read speech (lists of sentences and others),
- telephone speech (automatic and Wizard of Oz),
- speech recorded in an anechoic chamber (lists of sentences and lists of balanced sentences).

It takes into account different types of variability: regional, social, stylistic, individual.

The recordings are available in WAV format, with orthographic transcriptions in TXT files and phonetic annotations.
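As a minimal sketch of how WAV recordings with TXT transcriptions might be inspected, the Python snippet below reports the duration of each recording and whether a transcription sits next to it. The directory layout and the convention that a transcription shares its recording's file name are assumptions made for illustration; the catalogue entry does not describe how the files are organised. The function names are hypothetical.

```python
import wave
from pathlib import Path


def wav_duration_seconds(wav_path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(wav_path), "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def transcript_path_for(wav_path):
    """Assumed convention (not documented in the catalogue entry):
    the transcription has the same name as the recording, with a .txt suffix."""
    return Path(wav_path).with_suffix(".txt")


def summarize(corpus_dir):
    """Yield (file name, duration in seconds, transcription found) per recording."""
    for wav_path in sorted(Path(corpus_dir).rglob("*.wav")):
        yield (
            wav_path.name,
            wav_duration_seconds(wav_path),
            transcript_path_for(wav_path).exists(),
        )
```

Calling `list(summarize("path/to/corpus"))` on a local copy would give one tuple per recording; summing the durations is a quick way to check a download against the stated 100 hours.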

CLIPS stands for Corpora e Lessici dell'Italiano Parlato e Scritto (Corpora and Lexicons of Spoken and Written Italian). The CLIPS project started in 2000 with the objective of building corpora and lexicons for spoken and written Italian.

Production

Project : CLIPS Project

Creation date : 2000-2004

Applications

Possible applications : Speaker identification, Speaker verification, Speech recognition, Speech synthesis, Automatic speech recognition, Automatic person recognition
Application area : Research

Technical Information

File format : wav

Contents

speech corpus
Language(s) : Italian (Italy)
Source Channel : Microphone, Radio, Telephone, Television
Speech Acquisition Mode : Acoustic
Transcription Entries : Orthographic
Annotation Coverage : Full
Annotation Granularity : Word
Annotation level : Phonetic