ELRA - ELRA-U-W 0085 : Tübingen Partially Parsed Corpus of Written German


Catalog Reference : ELRA-U-W 0085

Tübingen Partially Parsed Corpus of Written German

The Tübingen Partially Parsed Corpus of Written German consists of articles from the German newspaper 'die Tageszeitung' (taz). It comprises more than 200 million word tokens and has been automatically annotated with part-of-speech (POS) tags, morphological ambiguity classes, clause structure, topological fields and chunks.
Named entities with a regular form, such as dates, telephone numbers and number/unit combinations, are also annotated.

This resource is available in XML format.

More releases are planned.

Applications


Application Area : Research

Contents


Written Corpus
Number of Languages : Monolingual
Language(s) : German (Germany)
Annotation Coverage : Partial
Annotation Granularity : Word
Annotation Level : Syntactic
Annotation Mode : Automatic
Annotation Language : XML