ELRA - ELRA-WC0156 : Tübingen Treebank of Written German

Catalog Reference : ELRA-WC0156

Tübingen Treebank of Written German

The TüBa-D/Z treebank contains 45,200 sentences (794,079 tokens) taken from the German newspaper 'die tageszeitung' (taz). The syntactic annotation was performed manually and distinguishes four levels of syntactic constituency: the lexical level, the phrasal level, the level of topological fields, and the clausal level.

The treebank is available in three formats: NEGRA export format, Export-XML format and Penn Treebank format.
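For readers unfamiliar with bracketed treebank notation, the following is a minimal sketch of how a Penn-Treebank-style tree can be read into nested lists. The sample sentence and its node labels are invented for illustration and are not taken from the corpus; the function names (`tokenize`, `parse`, `leaves`) are hypothetical, and the actual label inventory and file layout of the distributed data may differ.

```python
def tokenize(text):
    """Split a bracketed tree string into parentheses and symbols."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens, pos=0):
    """Parse one '(LABEL child ...)' group starting at tokens[pos].

    Returns (node, next_pos), where node is [label, child, ...] and
    each child is either a nested node or a plain token string.
    """
    if tokens[pos] != "(":
        raise ValueError("expected '(' at position %d" % pos)
    node = [tokens[pos + 1]]          # the constituent or POS label
    pos += 2
    while tokens[pos] != ")":
        if tokens[pos] == "(":
            child, pos = parse(tokens, pos)
            node.append(child)
        else:
            node.append(tokens[pos])  # a terminal word
            pos += 1
    return node, pos + 1              # skip the closing ')'


def leaves(node):
    """Collect the terminal words of a parsed tree, left to right."""
    if isinstance(node, str):
        return [node]
    words = []
    for child in node[1:]:            # node[0] is the label
        words.extend(leaves(child))
    return words


# Illustrative sample only, not an excerpt from the treebank.
sample = "(S (NP (ART Der) (NN Hund)) (VVFIN bellt))"
tree, _ = parse(tokenize(sample))
print(leaves(tree))
```

The nested-list representation keeps the sketch dependency-free; for real work an established treebank reader would normally be preferable.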

Annotation of anaphoric and coreference relations to nominal and pronominal antecedents is under way; it is already available for 36,000 sentences of the corpus (in XML format only).
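As a quick sanity check on the figures quoted in this entry, the short sketch below derives the average sentence length and the share of sentences that carry coreference annotation. It uses only the numbers stated above; the variable names are illustrative.

```python
# Figures as stated in this catalogue entry.
sentences = 45_200
tokens = 794_079
coref_sentences = 36_000

avg_tokens_per_sentence = tokens / sentences
coref_share = coref_sentences / sentences

print(f"Average sentence length: {avg_tokens_per_sentence:.2f} tokens")
print(f"Sentences with coreference annotation: {coref_share:.1%}")
```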

More releases are planned.

Identification

Period of coverage :

Version : 5.9
Version history : v4 (2008) / v3 (July 2006)

Applications


Application Area : Research

Contents

written corpus
Number of languages : Monolingual
Language(s) : German
Annotation Coverage : Full
Annotation Granularity : Word
Annotation level : Syntactic
Annotation Mode : Manual