ELRA - ELRA-U-S0274 : Catalan Corpus and Voices for TTS

Catalog Reference : ELRA-U-S0274

Catalan Corpus and Voices for TTS

This is a speech corpus in Catalan. It consists of 20 hours of speech recorded by two professional speakers (one female and one male) selected from among 8 speakers.

They produced about 10 hours each (approximately 90,000 words) and were asked to read a wide variety of texts with good phonetic and prosodic coverage (news, novels, educational books, web information, application phrases, numbers).

It was automatically segmented and manually transcribed (orthographic and phonetic transcription). A basic prosodic annotation is also provided, and pitch labelling was automatically produced.

The Catalan Corpus and Voices for TTS (also known as Festcat) has been used to build voices for the Festival TTS. This is a resource of great value for corpus-based synthesis.

Applications

Existing applications : Speech synthesis

Contents

speech corpus
Language(s) : Catalan
Duration : 20 hours
Source Channel : Microphone
Recording Environment : Recording studio
Transcription Entries : Orthographic, Phonetic
Transcription Segmentation : Breath group
Annotation level : Prosodic