Catalog Reference : ELRA-U-W 0148
DISEQuA Corpus
The DISEQuA corpus consists of 450 questions formulated in four languages: Dutch, Italian, Spanish and English. The answers were manually retrieved from three document collections, none of them in English: La Stampa and SDA newspaper/wire articles (1994) for Italian, EFE (1994) for Spanish, and Algemeen Dagblad and NRC Handelsblad (1994 and 1995) for Dutch.
The corpus is in XML. Each entry is marked up with tags whose attributes and values specify the language, the question type, the category of the answer (person, location, etc.), the answer itself, and so on.
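To illustrate how entries of this kind might be read programmatically, here is a minimal Python sketch using the standard-library `xml.etree.ElementTree`. It assumes a HYPOTHETICAL layout: the element names (`qa_set`, `qa`, `language`, `question`, `answer`) and attribute names (`id`, `val`, `type`, `category`) are illustrative placeholders, not the documented DISEQuA schema, and the bracketed sample text is invented rather than taken from the corpus. The English block carries no answer, mirroring the absence of an English document collection.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# HYPOTHETICAL layout: element and attribute names are illustrative
# assumptions, not the documented DISEQuA schema. Sample text is invented.
SAMPLE = """<qa_set>
  <qa id="0001">
    <language val="ITA">
      <question type="FACTOID" category="PERSON">[question text in Italian]</question>
      <answer>[answer string]</answer>
    </language>
    <language val="ENG">
      <question type="FACTOID" category="PERSON">[question text in English]</question>
    </language>
  </qa>
</qa_set>"""


@dataclass
class Entry:
    qa_id: str
    language: str
    question_type: str
    answer_category: str
    question: str
    answers: list[str]


def parse_entries(xml_text: str) -> list[Entry]:
    """Flatten each <qa> item into one Entry per language block."""
    root = ET.fromstring(xml_text)
    entries = []
    for qa in root.iter("qa"):
        for lang in qa.findall("language"):
            q = lang.find("question")
            entries.append(
                Entry(
                    qa_id=qa.get("id"),
                    language=lang.get("val"),
                    question_type=q.get("type"),
                    answer_category=q.get("category"),
                    question=(q.text or "").strip(),
                    # A language block may hold zero or more answers.
                    answers=[(a.text or "").strip() for a in lang.findall("answer")],
                )
            )
    return entries


if __name__ == "__main__":
    for e in parse_entries(SAMPLE):
        print(e.qa_id, e.language, e.answer_category, e.answers)
```

Keeping one record per (question, language) pair makes it straightforward to join a question in one language with the answers recorded for another.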
This question/answer set can be used to test or train cross-language QA systems in twelve different language combinations.
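One reading consistent with the figures above is that the twelve combinations come from pairing each of the four question languages with each of the three languages that have a document collection (4 × 3 = 12). This reading is an assumption on our part: it counts the three same-language pairs along with the nine cross-language ones. A minimal Python sketch of that count:

```python
from itertools import product

# Questions are available in all four languages.
QUESTION_LANGUAGES = ["Dutch", "Italian", "Spanish", "English"]
# Answers were retrieved only from these three collections (none in English).
COLLECTION_LANGUAGES = ["Dutch", "Italian", "Spanish"]


def task_combinations():
    """Return every (question language, collection language) pair."""
    return list(product(QUESTION_LANGUAGES, COLLECTION_LANGUAGES))


if __name__ == "__main__":
    combos = task_combinations()
    for q, d in combos:
        kind = "same-language" if q == d else "cross-language"
        print(f"{q} questions -> {d} documents ({kind})")
    print(len(combos))  # 12
```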
DISEQuA stands for Dutch, Italian, Spanish and English collection of Questions and Answers. It gathers resources created for CLEF 2003.
Identification
Version : v1.1
Production
Project : CLEF
Creation date : 2003
Applications
Possible applications : Information retrieval
Application area : Research
Contents
Written corpus
Number of languages : Multilingual
Language(s) : Dutch ; English ; Italian ; Spanish
Annotation Coverage : Full
Annotation Granularity : Sentence
Annotation Mode : Manual
Annotation language : XML
Friday 01 November, 2024
Joint Copyright © 2008 ELRA & ELDA
Universal Catalogue 1.0.4