Catalog Reference : ELRA-U-W 0147
Multieight-04 Corpus
The Multieight-04 corpus is a collection of 700 questions in several European languages, together with their manually retrieved answers. It is a gold-standard training set created for the QA@CLEF-2004 track.
The corpus is distributed in XML. Each entry is marked up with tags whose attributes and values specify the language, the question type, the answer category (person, location, etc.) and the answer itself, among other fields.
Languages: German, English, Spanish, French, Italian, Dutch and Portuguese, plus Bulgarian and Finnish exclusively as source languages.
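The XML layout described above could be read along the following lines. This is only a minimal sketch: the catalogue entry does not give the actual schema, so the element names (`corpus`, `entry`, `text`, `question`, `answer`), the attribute names (`id`, `type`, `category`, `lang`) and the placeholder sample are all assumptions made for illustration, and would need to be replaced with the names used in the distributed files.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: tag names, attribute names and values are illustrative
# assumptions, NOT the real Multieight-04 schema or data.
SAMPLE = """<corpus>
  <entry id="1" type="factoid" category="PERSON">
    <text lang="EN">
      <question>Placeholder question in English?</question>
      <answer>Placeholder answer</answer>
    </text>
    <text lang="FR">
      <question>Placeholder question in French?</question>
      <answer>Placeholder answer</answer>
    </text>
  </entry>
  <entry id="2" type="definition" category="LOCATION">
    <text lang="EN">
      <question>Another placeholder question?</question>
      <answer>Another placeholder answer</answer>
    </text>
  </entry>
</corpus>"""


def load_entries(xml_text):
    """Flatten the (assumed) structure into one dict per question/language pair."""
    root = ET.fromstring(xml_text)
    rows = []
    for entry in root.iter("entry"):
        for text in entry.iter("text"):
            rows.append({
                "id": entry.get("id"),
                "type": entry.get("type"),          # type of question
                "category": entry.get("category"),  # category of the answer
                "lang": text.get("lang"),
                "question": text.findtext("question"),
                "answer": text.findtext("answer"),
            })
    return rows


rows = load_entries(SAMPLE)
# Select every question available in one language.
english = [r for r in rows if r["lang"] == "EN"]
```

Flattening to one record per question and language keeps later filtering (by language, question type or answer category) to simple list comprehensions.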
Identification
Period of coverage :
Version : v1.2
Version history :
Production
Project : CLEF
Creation date : 2004
Applications
Possible applications : Information retrieval
Application area : Research
Contents
written corpus
Number of languages : Multilingual
Language(s) : French ; English ; Finnish ; Bulgarian ; Portuguese ; Spanish ; German ; Dutch ; Italian
Annotation Coverage : Full
Annotation Granularity : Sentence
Annotation Mode : Manual
Annotation language : XML
Joint Copyright © 2008 ELRA & ELDA