ELRA - ELRA-U-W0298 : WaCkypedia English corpus

Catalog Reference : ELRA-U-W0298

WaCkypedia English corpus

The WaCkypedia English corpus (WaCkypedia_EN) is a copy of the full content of the English Wikipedia as of 2009. It contains about 800 million tokens and is part-of-speech (POS) tagged, lemmatized, and fully parsed with a dependency parser.

It uses the same annotation scheme as the PukWaC corpus (see U-W0297).
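This page does not document the file layout of WaCkypedia_EN. The minimal Python sketch below is therefore written under stated assumptions. It shows one way to read a corpus with token-level POS, lemma and dependency annotation. The assumed layout has one token per line. Each token line holds six tab-separated columns: word form, lemma, POS tag, position in sentence, head position (0 for the root) and dependency relation. Sentences are wrapped in `<s>` and `</s>` lines. Check the column order against the actual distribution before relying on it. The names `Token`, `read_sentences`, `dependency_arcs` and `SAMPLE` are hypothetical. The sample sentence and its tags are invented for illustration and are not taken from the corpus.

```python
import io
from dataclasses import dataclass
from typing import Iterator, List, TextIO


@dataclass
class Token:
    form: str    # word form as it appears in the text
    lemma: str
    pos: str     # part-of-speech tag
    index: int   # 1-based position of the token in its sentence
    head: int    # index of the governing token; 0 marks the root (assumed)
    deprel: str  # dependency relation label


def read_sentences(stream: TextIO) -> Iterator[List[Token]]:
    """Yield one list of Tokens per <s> ... </s> block (assumed layout)."""
    sentence: List[Token] = []
    for raw in stream:
        line = raw.rstrip("\n")
        if "\t" not in line:
            # Lines without tabs are treated as markup (<text ...>, <s>, </s>) or blanks.
            tag = line.strip()
            if tag == "</s>":
                if sentence:
                    yield sentence
                sentence = []
            elif tag == "<s>" or tag.startswith("<s "):
                sentence = []
            continue
        # Assumed column order: form, lemma, POS, index, head, relation.
        form, lemma, pos, index, head, deprel = line.split("\t")
        sentence.append(Token(form, lemma, pos, int(index), int(head), deprel))
    if sentence:
        # Flush a final sentence that was not closed by </s>.
        yield sentence


def dependency_arcs(sentence: List[Token]) -> List[str]:
    """Render each token as 'form -relation-> head form'.

    Assumes token indices run 1..n without gaps, so index i is sentence[i - 1].
    """
    arcs = []
    for tok in sentence:
        head = sentence[tok.head - 1].form if tok.head > 0 else "ROOT"
        arcs.append(f"{tok.form} -{tok.deprel}-> {head}")
    return arcs


# Invented sample in the assumed layout (not real corpus data).
SAMPLE = (
    '<text id="example">\n'
    "<s>\n"
    "Cats\tcat\tNNS\t1\t2\tSBJ\n"
    "sleep\tsleep\tVVP\t2\t0\tROOT\n"
    ".\t.\tSENT\t3\t2\tP\n"
    "</s>\n"
    "</text>\n"
)

if __name__ == "__main__":
    for sent in read_sentences(io.StringIO(SAMPLE)):
        print("\n".join(dependency_arcs(sent)))
```

Run on the invented sample, the script prints `Cats -SBJ-> sleep`, `sleep -ROOT-> ROOT` and `. -P-> sleep`. The parser decides whether a line is a token by checking for a tab rather than a leading `<`. That way, a literal `<` token is not mistaken for markup.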

Production

Creation date : 2009

Applications

Application Area : Research

Contents

Written corpus
Number of languages : Monolingual
Language(s) : English (United Kingdom)
Document source : Internet
Number of tokens : about 800 million
Annotation Coverage : Full
Annotation Granularity : Word
Annotation level : Syntactic
Annotation Mode : Automatic
Annotation language : XML