Catalog Reference : ELRA-U-W 0087
Estonian Treebank Arborest
Arborest is a 2,500-sentence treebank of Estonian, built in a two-stage process using both Constraint Grammar (CG) and Phrase Structure Grammar (PSG).
The 200,000-word Estonian CG corpus* is a proof-read corpus with shallow syntactic annotation, covering several genres (fiction, newspapers and legal texts). Arborest was produced by semi-automatically converting 15% of that corpus to PSG. Structural information was added during the conversion, resulting in a VISL-style treebank.
A larger treebank is under construction (2004-2008).
This is a useful resource for describing Estonian syntax and for evaluating language technology software (Information Retrieval, Information Extraction, Machine Translation, etc.).
* Estonian Constraint Grammar Corpus
Applications
Application Area : Research
Contents
written corpus
Number of languages : Monolingual
Language(s) : Estonian
Annotation Coverage : Full
Annotation Granularity : Word
Annotation level : Syntactic
Annotation Mode : Semi-automatic
Saturday 23 November, 2024
Joint Copyright © 2008 ELRA & ELDA
Universal Catalogue 1.0.4