Catalog Reference : ELRA-WC0156
Tübingen Treebank of Written German
The TüBa-D/Z treebank contains 45,200 sentences (794,079 tokens) drawn from a German newspaper corpus based on 'die tageszeitung' (taz). The syntactic annotation was performed manually at four levels of syntactic constituency: the lexical level, the phrasal level, the level of topological fields, and the clausal level.
The treebank is available in three formats: Negra export format, Export-XML format and Penn treebank format.
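As a rough illustration of what a Penn-style bracketed tree looks like to a program, the following minimal Python sketch reads one bracketed string into nested tuples and lists its tokens. The example sentence and its node labels are hypothetical placeholders written for this sketch; they are not taken from the corpus and may not match the treebank's actual tag inventory or file layout.

```python
def parse_bracketed(text):
    """Parse one bracketed tree into nested (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '(' at token %d" % pos)
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())      # nested constituent
            else:
                children.append(tokens[pos])  # word form (leaf)
                pos += 1
        pos += 1  # consume the closing ")"
        return (label, children)

    tree = node()
    if pos != len(tokens):
        raise ValueError("trailing material after the tree")
    return tree


def leaves(tree):
    """Return the word forms of a parsed tree in surface order."""
    _, children = tree
    words = []
    for child in children:
        if isinstance(child, tuple):
            words.extend(leaves(child))
        else:
            words.append(child)
    return words


# Hypothetical example tree: clause > topological fields > phrases > words.
example = (
    "(SIMPX (VF (NX (PPER Sie))) (LK (VXFIN (VVFIN liest))) "
    "(MF (NX (ART die) (NN Zeitung))))"
)
tree = parse_bracketed(example)
print(tree[0])       # root label
print(leaves(tree))  # tokens in order
```

The nesting in the example mirrors the four annotation levels named above (clause, topological field, phrase, lexical item), but only as an assumption about how they might be bracketed.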
Annotation of anaphoric and coreference relations to nominal and pronominal antecedents is in progress and is already available for 36,000 sentences of the corpus (XML format only).
More releases are planned.
Identification
Period of coverage :
Version : 5.9
Version history : v4 (2008) / v3 (July 2006)
Applications
Application Area : Research
Contents
written corpus
Number of languages : Monolingual
Language(s) : German
Annotation Coverage : Full
Annotation Granularity : Word
Annotation level : Syntactic
Annotation Mode : Manual
Saturday 23 November, 2024
Joint Copyright © 2008 ELRA & ELDA
Universal Catalogue 1.0.4