ELRA - ELRA-U-W 0017 : Estonian Dialogue Corpus

Catalog Reference : ELRA-U-W 0017

Estonian Dialogue Corpus

The Estonian Dialogue Corpus (EDiC) is composed of different types of data:
- transcripts of spoken dialogues taken from the Corpus of Spoken Estonian (233,000 running words) with dialogue acts annotated,
- written dialogues collected by the Wizard-of-Oz method (2,500 running words collected in 2001 and 10,000 collected in 2009),
- human-computer interactions.

Processing: morphological analysis, syntactic analysis and annotation of dialogue acts based on the Conversation Analysis theory.

The project started in 2001.

Identification

Period of coverage :

Version : 2010
Version history : 2008 (previous update)

Applications

Applications possible : Spoken dialogue systems
Application Area : Research

Contents

written corpus
Number of languages : Monolingual
Language(s) : Estonian (Estonia)
Annotation Coverage : Full
Annotation Granularity : Word
Annotation level : Syntactic
Part of Speech : Nouns, Verbs, Adverbs, Adjectives, Pronouns, Determiners, Articles, Prepositions, Postpositions, Conjunctions