Universal Catalogue  
Catalog Reference : ELRA-U-W 0088
Penn Arabic Treebank
The Penn Arabic Treebank (ATB) is a one-million-word corpus that was syntactically annotated as part of the DARPA TIDES project.

It contains written Modern Standard Arabic newswire from the Agence France Presse corpus (July to November 2000) and uses the same constituent-structure annotation scheme as the Penn Treebank.

The latest version (ATB3) contains a total of 401,122 words/tokens after clitics are separated for the treebank annotation.

It is designed to support language research and development of language technology for Modern Standard Arabic (automatic content extraction, information retrieval, information extraction, natural language processing).
Identification
Period of coverage :
Version : v.3 (2008)
Version history : v2.0 (2003)
Production
Project : DARPA TIDES project
Creation date : 2003
Applications
Application area : Research
Contents : written corpus
 

Joint Copyright © 2008 ELRA & ELDA