Catalog Reference : ELRA-U-W 0079
Penn Treebank
The Penn Treebank is a bank of linguistic trees for English. The data come from several well-known sources: the Wall Street Journal, the Brown Corpus, Switchboard and ATIS (more than one million words). The corpus is annotated with rough syntactic and semantic information. The analysis of sentences is based on constituent structure theory.
Texts are tagged for part of speech (POS); transcripts of spoken data (Switchboard) are annotated for disfluency, tagged and parsed.
The data contained in Release III are annotated in Treebank II style and are accompanied by a manual for Treebank II bracketing and the part-of-speech tagging guidelines. Tools for processing Treebank data are made available, as well as the contents of the previous version (Version 0.5). Release III adds the parsed Brown text as new material. The Penn Treebank III is available through LDC:
http://www.ldc.upenn.edu/Catalog/
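For readers unfamiliar with bracketed annotation, the following is a minimal Python sketch of how a parenthesised, labelled tree can be read into a nested structure and reduced to (word, tag) pairs. It is an illustration only and not one of the tools distributed with the release: the function names `parse_bracketed` and `tagged_words` are hypothetical, and the sample sentence is invented for the example rather than taken from the corpus.

```python
def parse_bracketed(text):
    """Parse one bracketed tree into nested (label, children) tuples.

    Assumption: the input is a single well-formed tree such as
    "(S (NP (DT The) (NN market)) (VP (VBD fell)) (. .))".
    """
    # Put spaces around parentheses so a plain split() yields the tokens.
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read_node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '(' at token %d" % pos)
        pos += 1  # consume "("
        # A node normally starts with its label; an unlabelled node
        # is given an empty label.
        if tokens[pos] in ("(", ")"):
            label = ""
        else:
            label = tokens[pos]
            pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read_node())  # nested constituent
            else:
                children.append(tokens[pos])  # a word (leaf)
                pos += 1
        pos += 1  # consume ")"
        return (label, children)

    tree = read_node()
    if pos != len(tokens):
        raise ValueError("unexpected tokens after the end of the tree")
    return tree


def tagged_words(tree):
    """Collect (word, tag) pairs from the leaves, in sentence order."""
    label, children = tree
    # A preterminal node holds exactly one word; its label is the POS tag.
    if len(children) == 1 and isinstance(children[0], str):
        return [(children[0], label)]
    pairs = []
    for child in children:
        if not isinstance(child, str):
            pairs.extend(tagged_words(child))
    return pairs


# Invented sample sentence, for illustration only.
sample = "(S (NP (DT The) (NN market)) (VP (VBD fell)) (. .))"
tree = parse_bracketed(sample)
print(tagged_words(tree))
```

The nested tuples keep the constituent structure intact, so the same tree can serve both word-level uses (POS tags) and phrase-level uses (constituents such as NP or VP).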
Identification
Period of coverage :
Version : release III 1999
Version history : release II 1995
Production
Creation date : 1995
Applications
Application area : Research
Contents
Written corpus
Number of languages : Monolingual
Language(s) : English (USA)
Annotation Coverage : Full
Annotation Granularity : Word
Annotation level : Syntactic
Joint Copyright © 2008 ELRA & ELDA