Universal Catalogue  
Catalog Reference : ELRA-WC0156
Tübingen Treebank of Written German
The TüBa-D/Z treebank contains 45,200 sentences (794,079 tokens) taken from a German newspaper corpus based on 'die tageszeitung' (taz). The syntactic annotation was performed manually and covers four levels of syntactic constituency: the lexical level, the phrasal level, the level of topological fields, and the clausal level.

The treebank is available in three formats: Negra export format, Export-XML format and Penn treebank format.
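
As an illustration of what working with a bracketed, Penn-style tree can look like, here is a minimal Python sketch. It is not taken from the treebank distribution: the example sentence, its labels (S, NP, VP, ART, NN, VVFIN) and the helper names (tokenize, parse, read_tree, leaves) are all assumptions made for this sketch, and the actual TüBa-D/Z files may differ in labels and layout. The sketch also assumes that every bracket opens with a node label.

```python
from typing import List, Tuple, Union

# A tree is (label, children); each child is either a subtree or a token string.
Tree = Tuple[str, List[Union["Tree", str]]]


def tokenize(text: str) -> List[str]:
    """Split a bracketed string into '(' , ')' and symbol tokens."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens: List[str], pos: int = 0) -> Tuple[Tree, int]:
    """Parse one bracketed node starting at tokens[pos].

    Returns the tree and the position just after its closing bracket.
    Assumes every node has a label directly after the opening bracket.
    """
    if tokens[pos] != "(":
        raise ValueError(f"expected '(' at token {pos}, got {tokens[pos]!r}")
    label = tokens[pos + 1]
    pos += 2
    children: List[Union[Tree, str]] = []
    while tokens[pos] != ")":
        if tokens[pos] == "(":
            child, pos = parse(tokens, pos)  # nested constituent
        else:
            child = tokens[pos]  # a word (leaf)
            pos += 1
        children.append(child)
    return (label, children), pos + 1


def read_tree(text: str) -> Tree:
    """Parse a string that holds exactly one bracketed tree."""
    tree, _ = parse(tokenize(text))
    return tree


def leaves(tree: Tree) -> List[str]:
    """Collect the words of a tree from left to right."""
    _, children = tree
    words: List[str] = []
    for child in children:
        if isinstance(child, str):
            words.append(child)
        else:
            words.extend(leaves(child))
    return words


# Hypothetical toy sentence, not an excerpt from TüBa-D/Z.
example = "(S (NP (ART Die) (NN Zeitung)) (VP (VVFIN erscheint)))"
tree = read_tree(example)
print(tree[0], leaves(tree))
```

The nested tuple representation keeps the sketch free of dependencies; a real workflow would more likely rely on an existing treebank reader.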

Annotation of anaphoric and coreference relations to nominal and pronominal antecedents is planned for the corpus and is already available for 36,000 sentences (in XML format only).

More releases are planned.
Identification
Period of coverage :
Version : 5.9
Version history : v4 (2008) / v3 (July 2006)
Applications
Application area : Research
Contents
 written corpus 
 

Joint Copyright © 2008 ELRA & ELDA