Catalog Reference : ELRA-U-W 0088
Penn Arabic Treebank
The Penn Arabic Treebank (ATB) is a one-million-word corpus that was syntactically annotated within the framework of the DARPA TIDES project.
It contains written Modern Standard Arabic newswire from the Agence France Presse corpus (July to November 2000) and uses the same annotation scheme as the Penn Treebank (constituent structure).
The latest version (ATB3) contains a total of 401,122 words/tokens after clitics are separated for the treebank annotation.
It is designed to support language research and the development of language technology for Modern Standard Arabic (automatic content extraction, information retrieval, information extraction, natural language processing).
Identification
Period of coverage :
Version : v.3 (2008)
Version history : v2.0 (2003)
Production
Project : DARPA TIDES project
Creation date : 2003
Applications
Application area : Research
Contents
written corpus
Number of languages : Monolingual
Language(s) : Modern Standard Arabic
Annotation Coverage : Full
Annotation Granularity : Word
Annotation level : Syntactic
Annotation language : XML
Joint Copyright © 2008 ELRA & ELDA