Catalog Reference : ELRA-U-W0298
WaCkypedia English corpus
The WaCkypedia English corpus (WaCkypedia_EN) is a snapshot of the full content of the English Wikipedia as of 2009. It contains about 800 million tokens and is POS-tagged, lemmatized, and fully parsed with a dependency parser.
It uses the same annotation scheme as PukWaC (see U-W0297).
Production
Creation date : 2009
Applications
Application area : Research
Contents
written corpus
Number of languages : Monolingual
Language(s) : English (United Kingdom)
Document source : Internet
Number of tokens : 800 million tokens
Annotation Coverage : Full
Annotation Granularity : Word
Annotation level : Syntactic
Annotation Mode : Automatic
Annotation language : XML
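As an illustration of what word-level, automatically produced syntactic annotation can look like in practice, here is a minimal Python sketch for reading such a corpus. The record above does not specify the file layout, so the layout is an assumption: XML-style `<text>` and `<s>` tags on their own lines, with one token per line carrying six tab-separated fields (word, lemma, POS tag, token index, head index, dependency relation). The names `Token` and `read_sentences` and the sample data are hypothetical.

```python
from typing import Iterable, Iterator, List, NamedTuple


class Token(NamedTuple):
    """One annotated token (field order is an assumption, see lead-in)."""
    word: str
    lemma: str
    pos: str
    index: int   # 1-based position of the token in its sentence
    head: int    # index of the syntactic head, 0 for the root
    deprel: str  # dependency relation label


def read_sentences(lines: Iterable[str]) -> Iterator[List[Token]]:
    """Yield one list of tokens per <s>...</s> block."""
    sentence: List[Token] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if "\t" not in line:
            # Markup line (<text ...>, <s>, </s>, </text>) or blank line.
            if line.strip() == "</s>" and sentence:
                yield sentence
                sentence = []
            continue
        fields = line.split("\t")
        if len(fields) != 6:
            continue  # skip lines that do not match the assumed layout
        word, lemma, pos, index, head, deprel = fields
        try:
            sentence.append(Token(word, lemma, pos, int(index), int(head), deprel))
        except ValueError:
            continue  # non-numeric index or head: skip the token


# Hypothetical sample in the assumed layout (not taken from the corpus).
SAMPLE = [
    '<text id="hypothetical:Example">',
    "<s>",
    "Cats\tcat\tNNS\t1\t2\tSBJ",
    "sleep\tsleep\tVVP\t2\t0\tROOT",
    "</s>",
    "</text>",
]

if __name__ == "__main__":
    for sent in read_sentences(SAMPLE):
        for tok in sent:
            print(tok.index, tok.word, tok.lemma, tok.pos, tok.head, tok.deprel)
```

Reading line by line keeps memory use flat, which matters for a corpus of this size. If the distributed files differ from the assumed layout, only the field unpacking in `read_sentences` should need adjusting.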
Joint Copyright © 2008 ELRA & ELDA