ELRA - ELRA-U-W 0003 : The Parsed Corpus of Early English Correspondence

Catalog Reference : ELRA-U-W 0003

The Parsed Corpus of Early English Correspondence

The Parsed Corpus of Early English Correspondence (PCEEC) is a part-of-speech (POS) tagged and syntactically annotated version of the Corpus of Early English Correspondence (CEEC), which was compiled for sociolinguistic research. It contains 2,200,000 words of English letters dating from 1410 to 1681, and was published in 2006.

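The annotation scheme is the same as that used in the Penn-Helsinki Parsed Corpus of Middle English, second edition (PPCME2) and the Penn-Helsinki Parsed Corpus of Early Modern English (PPCEME). The corpus is provided in three file types: plain text files (.txt), POS-tagged files (.pos), and parsed files (.psd).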

The parsed files are designed to be searched with CorpusSearch (1.1 or 2), an open-source tool that can be downloaded free of charge from SourceForge.
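For readers who want to inspect a parsed file outside the dedicated search tool, the following is a minimal Python sketch. It assumes the .psd files use the Penn-Treebank-style labelled bracketing of the PPCME2 scheme; the sample sentence is invented for illustration and is not taken from the corpus, and the helper names `parse_brackets` and `leaves` are hypothetical. Real corpus trees carry additional material (for example token identifiers) that this sketch does not handle.

```python
def parse_brackets(text):
    """Parse one bracketed tree into nested lists: [label, child, ...]."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token != "(":
            return token  # a label or a word
        node = []
        while tokens[pos] != ")":
            node.append(read())
        pos += 1  # consume the closing parenthesis
        return node

    return read()


def leaves(tree):
    """Collect (tag, word) pairs from every [tag, word] leaf node."""
    if len(tree) == 2 and all(isinstance(x, str) for x in tree):
        return [(tree[0], tree[1])]
    pairs = []
    for child in tree:
        if isinstance(child, list):
            pairs.extend(leaves(child))
    return pairs


# Invented example in the assumed bracketing style (not corpus data).
sample = "(IP-MAT (NP-SBJ (PRO I)) (VBP thank) (NP-OB1 (PRO you)) (. .))"
tree = parse_brackets(sample)
tagged = leaves(tree)
```

Here `tagged` holds the word-level POS pairs, while `tree` keeps the full syntactic structure as nested lists, which roughly mirrors the difference between the .pos and .psd file types.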

Note that there is a small overlap with the PPCEME.

Identification

Period of coverage : from 1410 to 1681

Version : Parsed version of the CEEC
Version history :

Production

Project : Sociolinguistics and Language History

Creation date : 2006

Applications

Applications possible : Discourse analysis
Application area : Research

Contents

written corpus
Number of languages : Monolingual
Language(s) : English (United Kingdom)
Document source : Scanned
Annotation Coverage : Full
Annotation Granularity : Word
Annotation level : Syntactic