ELRA - ELRA-U-S0201 : Corpus of Estonian Dialects

Catalog Reference : ELRA-U-S0201

Corpus of Estonian Dialects

The Corpus of Estonian Dialects (CED) is a speech database of interviews on a range of topics. Speakers are distributed among the nine main dialects of Estonian: the Mid, Eastern and Western dialects (North Estonian dialect group); the Võru, Mulgi, Tartu and Seto dialects (South Estonian dialect group); and the North-Eastern (Alutaguse) and Coastal dialects (North-Eastern Coastal dialect group). The dialect recordings were made on tape, mainly during the 1960s and 1970s.
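The grouping above packs three dialect groups into one sentence. The minimal Python sketch below encodes it as a lookup table and checks that the groups add up to the nine main dialects. It only illustrates the classification as stated in this description and is not the corpus's own metadata schema; the names `DIALECT_GROUPS` and `group_of` are hypothetical.

```python
# Illustrative only: the nine main Estonian dialects covered by the CED,
# grouped as in the catalogue description. These names are hypothetical
# and are not part of the corpus distribution.
DIALECT_GROUPS: dict[str, list[str]] = {
    "North Estonian": ["Mid", "Eastern", "Western"],
    "South Estonian": ["Võru", "Mulgi", "Tartu", "Seto"],
    "North-Eastern Coastal": ["North-Eastern (Alutaguse)", "Coastal"],
}


def group_of(dialect: str) -> str:
    """Return the dialect group that the given dialect belongs to."""
    for group, dialects in DIALECT_GROUPS.items():
        if dialect in dialects:
            return group
    raise KeyError(f"unknown dialect: {dialect!r}")


# The three groups together account for the nine main dialects.
assert sum(len(d) for d in DIALECT_GROUPS.values()) == 9

print(group_of("Seto"))  # South Estonian
```

A lookup like this could be used to filter speakers by dialect group, assuming the speaker metadata records a dialect label in this form; the actual field names and spellings in the CED may differ.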

The recordings come with phonetic and orthographic transcriptions that mark features of spoken language such as filled pauses, discourse particles, word repetitions, corrections, unfinished words and speaker turns.

The CED contains about 1,000,000 transcribed words and 500,000 morphologically tagged words (tagged with 26 word classes defined by inflectional morphology, syntactic behaviour and semantics), together with metadata about the speakers and the recordings.

Production

Project : Corpus of Estonian Dialects

Applications


Application Area : Research

Contents

speech corpus
Language(s) : Estonian
Source Channel : Microphone
Sound Type Annotation : Mispronunciation, Truncation
Transcription Entries : Orthographic, Phonetic, Transliteration
Lexical Unit Information : Notes
Annotation language : XML