Universal Catalogue  
Languages
English
Catalog Reference: ELRA-WC0213
Penn-Helsinki Parsed Corpus of Early Modern English (PPCEME)
The Penn-Helsinki Parsed Corpus of Middle English, second edition (PPCME2), the Penn-Helsinki Parsed Corpus of Early Modern English (PPCEME), and the Penn Parsed Corpus of Modern British English are syntactically annotated corpora of English prose text samples from the indicated time periods. Their syntactic annotation (parsing) makes it possible to search not only for words and word sequences but also for syntactic structures. The corpora are designed for students and scholars of the history of English, especially of its historical syntax.
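To illustrate what searching for syntactic structure rather than word sequences can mean, here is a minimal Python sketch. It assumes a labelled-bracket notation for parse trees; the sample sentence, the node labels (IP, NP, PP and so on), and the helper names `parse_tree`, `find`, and `words` are all hypothetical and chosen for illustration only. They are not taken from the corpus or its documentation.

```python
# Hypothetical bracketed parse, invented for illustration (not corpus data).
SAMPLE = "(IP (NP (D the) (N king)) (VBD came) (PP (P to) (NP (N London))))"


def parse_tree(text):
    """Parse a labelled-bracket string into nested (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '('")
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())      # nested constituent
            else:
                children.append(tokens[pos])  # a word (leaf)
                pos += 1
        pos += 1  # consume ")"
        return (label, children)

    return read()


def find(tree, label):
    """Return every constituent with the given label, in pre-order."""
    node_label, children = tree
    found = [tree] if node_label == label else []
    for child in children:
        if not isinstance(child, str):
            found.extend(find(child, label))
    return found


def words(tree):
    """Collect the words dominated by a constituent, left to right."""
    out = []
    for child in tree[1]:
        if isinstance(child, str):
            out.append(child)
        else:
            out.extend(words(child))
    return out


if __name__ == "__main__":
    for np in find(parse_tree(SAMPLE), "NP"):
        print(" ".join(words(np)))
```

A plain word search could find "king" or "London", but only a structural query of this kind can ask for every noun phrase regardless of the words it contains.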
The Penn-Helsinki Parsed Corpus of Early Modern English consists of nearly 1.8 million words. Each text in the corpus is provided in three forms: parsed, POS-tagged, and unannotated. The corpus is divided into three subcorpora:
1. The Helsinki directories, consisting of roughly 573,000 words, contain the Helsinki Corpus in parsed, POS-tagged, and unannotated form.
2. The Penn1 directories, consisting of roughly 615,000 words, contain a first supplement to the Helsinki Corpus. As far as possible, we have used material by the same authors and from the same editions as the material in the Helsinki Corpus. Where necessary (that is, where the Helsinki Corpus already contains an exhaustive sample of a text), we have added new material as summarized below.
3. The Penn2 directories, consisting of roughly 606,000 words, contain a second supplement to the Helsinki Corpus. Again, we have tried to use material by the same authors and from the same editions as the material in the Helsinki Corpus. However, the Penn2 directories contain more new material than the Penn1 directories.
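As a quick check on the figures above, the three rounded subcorpus sizes can be added up and compared with the stated total of nearly 1.8 million words. The sketch below uses only the approximate counts given in this entry; the dictionary and function names are hypothetical.

```python
# Approximate word counts as stated in this catalogue entry (rounded figures).
SUBCORPUS_WORDS = {
    "Helsinki": 573_000,
    "Penn1": 615_000,
    "Penn2": 606_000,
}


def total_words(sizes):
    """Sum the approximate word counts of all subcorpora."""
    return sum(sizes.values())


def shares(sizes):
    """Return each subcorpus's fraction of the combined word count."""
    total = total_words(sizes)
    return {name: count / total for name, count in sizes.items()}


if __name__ == "__main__":
    print(f"Total: {total_words(SUBCORPUS_WORDS):,} words")
    for name, share in shares(SUBCORPUS_WORDS).items():
        print(f"{name}: {share:.1%}")
```

The rounded counts sum to 1,794,000 words, which matches the "nearly 1.8 million" figure, and each subcorpus accounts for roughly a third of the whole.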
 written corpus 
 

Joint Copyright © 2008 ELRA & ELDA
Universal Catalogue 1.0.4