ELRA - ELRA-U-W 0147 : Multieight-04 Corpus

Catalog Reference : ELRA-U-W 0147

Multieight-04 Corpus

The Multieight-04 corpus is a collection of 700 questions in several European languages and their manually retrieved answers. It is a gold standard training set created for the QA@CLEF-2004 track.

The corpus is distributed in XML. Each entry is marked up with tags whose attributes and values specify the language, the type of question, the category of the answer (person, location, etc.), the answer itself, and so on.
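As an illustration of how such an XML entry might be read, here is a minimal Python sketch. The element and attribute names (`corpus`, `q`, `lang`, `type`, `category`, `question`, `answer`) and the sample content are assumptions made for this example only; this record does not document the actual Multieight-04 schema, so they would need to be adapted to the real files.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: tag names, attribute names and values are
# illustrative placeholders, NOT the corpus's documented schema.
SAMPLE = """
<corpus>
  <q id="0001" lang="EN" type="factoid" category="LOCATION">
    <question>Example question text 1?</question>
    <answer>Example answer 1</answer>
  </q>
  <q id="0002" lang="FR" type="factoid" category="PERSON">
    <question>Example question text 2?</question>
    <answer>Example answer 2</answer>
  </q>
</corpus>
"""


def load_entries(xml_text):
    """Parse the XML text and return one dict per (assumed) <q> element."""
    root = ET.fromstring(xml_text.strip())
    entries = []
    for q in root.iter("q"):
        entries.append({
            "id": q.get("id"),
            "lang": q.get("lang"),
            "type": q.get("type"),
            "category": q.get("category"),
            # findtext returns the default when the child element is absent
            "question": q.findtext("question", default="").strip(),
            "answer": q.findtext("answer", default="").strip(),
        })
    return entries


def filter_by_language(entries, lang):
    """Keep only the entries whose (assumed) lang attribute matches."""
    return [e for e in entries if e["lang"] == lang]


entries = load_entries(SAMPLE)
english = filter_by_language(entries, "EN")
```

With the real corpus, the same approach would apply once the tag and attribute names are replaced by those found in the distributed files.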

Languages: German, English, Spanish, French, Italian, Dutch and Portuguese; Bulgarian and Finnish are included as source languages only.

Identification

Period of coverage :

Version : v1.2
Version history :

Production

Project : CLEF

Creation date : 2004

Applications

Possible applications : Information retrieval
Application area : Research

Contents

Written corpus
Number of languages : Multilingual
Language(s) : French ; English ; Finnish ; Bulgarian ; Portuguese ; Spanish ; German ; Dutch ; Italian
Annotation Coverage : Full
Annotation Granularity : Sentence
Annotation Mode : Manual
Annotation language : XML