ELRA - ELRA-U-W0375 : Coreference Corpus for Dutch

Catalog Reference : ELRA-U-W0375

Coreference Corpus for Dutch

This Dutch corpus is annotated for coreference. The annotation covers:
- identity relations between noun phrases,
- bound relations where an anaphor refers to a quantified antecedent,
- bridge relations when the anaphor is a subset of the antecedent,
- predicative relations, indicating extra information about the referent.

The corpus contains:
- texts from newspaper articles (35,166 tokens - 105 documents),
- transcribed spoken language from the Corpus of Spoken Dutch, CGN (33,048 tokens - 264 documents),
- entries from the Spectrum medical encyclopedia (135,828 tokens - 497 documents).
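
For a quick sense of scale, the three figures above can be added up. The following is a minimal sketch that uses only the token and document counts listed in this entry; the dictionary keys and function names are illustrative labels chosen here, not identifiers from the corpus distribution.

```python
# Sub-corpus sizes copied from the catalogue entry.
# The keys are hypothetical labels, not official sub-corpus names.
SUBCORPORA = {
    "newspaper": {"tokens": 35_166, "documents": 105},
    "cgn_spoken": {"tokens": 33_048, "documents": 264},
    "medical_encyclopedia": {"tokens": 135_828, "documents": 497},
}


def totals(subcorpora):
    """Return (total_tokens, total_documents) summed over all sub-corpora."""
    total_tokens = sum(part["tokens"] for part in subcorpora.values())
    total_documents = sum(part["documents"] for part in subcorpora.values())
    return total_tokens, total_documents


def token_shares(subcorpora):
    """Return each sub-corpus's fraction of the total token count."""
    total_tokens, _ = totals(subcorpora)
    return {
        name: part["tokens"] / total_tokens
        for name, part in subcorpora.items()
    }


if __name__ == "__main__":
    tokens, documents = totals(SUBCORPORA)
    print(f"total: {tokens} tokens in {documents} documents")
    for name, share in token_shares(SUBCORPORA).items():
        print(f"{name}: {share:.1%} of tokens")
```

By these figures the corpus totals 204,042 tokens in 866 documents, with the medical encyclopedia entries making up the largest share of tokens.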

The corpus was created to support the development of coreference resolution systems.

Production

Project : COREA project

Creation date : 2007

Contents

written corpus
Number of languages : Monolingual
Language(s) : Dutch
Annotation Coverage : Full
Annotation Mode : Manual