Universal Catalogue  
Catalog Reference: ELRA-U-W 0087
Estonian Treebank Arborest
Arborest is a 2,500-sentence treebank of Estonian, built in a two-stage process using both Constraint Grammar (CG) and Phrase Structure Grammar (PSG).

The 200,000-word Estonian CG corpus* is a shallowly syntactically annotated, proofread corpus covering several genres (fiction, newspaper texts and legal texts). Arborest was produced by semi-automatically converting 15% of that corpus (about 30,000 words) to a PSG representation. Structural information was added during the conversion, resulting in VISL-style treebanks.

A larger treebank is under construction (2004-2008).

This resource is useful both for language description (Estonian syntax) and for evaluating language technology software (information retrieval, information extraction, machine translation, etc.).

* Estonian Constraint Grammar Corpus
Applications
Application area: Research
Contents
Written corpus
 

Joint Copyright © 2008 ELRA & ELDA
Universal Catalogue 1.0.4