ELRA - ELRA-U-W 0148 : DISEQuA Corpus

Catalog Reference : ELRA-U-W 0148

DISEQuA Corpus

The DISEQuA corpus consists of 450 questions, each formulated in four languages: Dutch, Italian, Spanish and English. The answers were retrieved manually from three document collections (none of them in English): La Stampa and SDA newspaper/wire articles (year 1994) for Italian, EFE (year 1994) for Spanish, and Algemeen Dagblad and NRC Handelsblad (years 1994 and 1995) for Dutch.

The corpus is in XML; each entry is structured in tags, with attributes and values defining the language, the type of question, the category of the answer (person, location, etc.), the answer, etc.
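As an illustration of how such entries might be read, here is a minimal Python sketch using the standard library's ElementTree. The element and attribute names (corpus, qa, question, answer, lang, type, category) and the sample content are hypothetical placeholders chosen for this example; the actual DISEQuA schema is not given in this catalogue entry and may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: element names, attribute names and text are
# placeholders for illustration, NOT the actual DISEQuA schema or data.
SAMPLE_XML = """<corpus>
  <qa type="factoid" category="location">
    <question lang="en">PLACEHOLDER QUESTION</question>
    <question lang="it">PLACEHOLDER DOMANDA</question>
    <answer lang="it">PLACEHOLDER ANSWER</answer>
  </qa>
</corpus>"""


def load_entries(xml_text):
    """Parse the XML string and return one dict per <qa> element."""
    root = ET.fromstring(xml_text)
    entries = []
    for qa in root.iter("qa"):
        entries.append({
            "type": qa.get("type"),
            "category": qa.get("category"),
            # Map each language code to the question/answer text.
            "questions": {q.get("lang"): q.text for q in qa.findall("question")},
            "answers": {a.get("lang"): a.text for a in qa.findall("answer")},
        })
    return entries


entries = load_entries(SAMPLE_XML)
```

With the real corpus, the tag and attribute names in load_entries would need to be adapted to whatever the distributed files actually use.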

This question/answer set can be used to test or train cross-language QA systems in twelve different language combinations.
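The catalogue entry does not spell out how the twelve combinations are counted. The short Python sketch below shows two plausible readings, both of which are assumptions: ordered pairs of two distinct languages among the four, or each of the four question languages paired with each of the three document collection languages. Both happen to yield twelve.

```python
from itertools import permutations, product

QUESTION_LANGUAGES = ["Dutch", "Italian", "Spanish", "English"]
# The document collections are not available in English.
COLLECTION_LANGUAGES = ["Dutch", "Italian", "Spanish"]

# Reading 1 (assumption): ordered pairs of two distinct languages.
ordered_pairs = list(permutations(QUESTION_LANGUAGES, 2))

# Reading 2 (assumption): question language x document collection language.
question_by_collection = list(product(QUESTION_LANGUAGES, COLLECTION_LANGUAGES))

print(len(ordered_pairs), len(question_by_collection))
```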

DISEQuA stands for Dutch, Italian, Spanish and English collection of Questions and Answers. It gathers resources created for CLEF 2003.

Identification

Period of coverage :

Version : v1.1
Version history :

Production

Project : CLEF

Creation date : 2003

Applications

Possible applications : Information retrieval
Application area : Research

Contents

written corpus
Number of languages : Multilingual
Language(s) : Dutch ; English ; Italian ; Spanish
Annotation Coverage : Full
Annotation Granularity : Sentence
Annotation Mode : Manual
Annotation language : XML