Catalog Reference : ELRA-U-W 0058
The IPI-PAN Corpus
The IPI-PAN Corpus is a written corpus of Polish comprising more than 250 million segments. It covers a range of genres, represented in unbalanced proportions: contemporary prose, older prose, scientific texts, newspapers, parliamentary proceedings and legal texts.
The corpus is morphosyntactically annotated using a notion of word class that is theoretically closer to the flexeme than to the traditional part of speech (POS).
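To illustrate what a flexeme-oriented tag might look like in practice, the sketch below splits a positional tag into its class and attribute values. It assumes tags are written as colon-separated fields with the flexemic class first, as in the illustrative tag `subst:sg:nom:f` (noun, singular, nominative, feminine); the tag string, the `Tag` type and the `parse_tag` function are hypothetical and not part of the corpus distribution.

```python
from typing import NamedTuple


class Tag(NamedTuple):
    """A positional morphosyntactic tag split into its parts (hypothetical)."""
    flexeme_class: str       # first field, e.g. "subst" (noun)
    values: tuple[str, ...]  # remaining grammatical category values


def parse_tag(ctag: str) -> Tag:
    """Split a colon-separated tag such as 'subst:sg:nom:f'.

    Assumption: the first field names the flexemic class and every
    further field is one attribute value (number, case, gender, ...).
    """
    fields = ctag.split(":")
    return Tag(fields[0], tuple(fields[1:]))


if __name__ == "__main__":
    tag = parse_tag("subst:sg:nom:f")
    print(tag.flexeme_class, tag.values)
```

Keeping the class separate from the attribute values reflects the idea that the class determines which grammatical categories a form carries.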
Format and encoding: XML files conforming to the XCES standard.
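As a rough illustration of reading such files, the following Python sketch uses the standard-library `xml.etree.ElementTree` module to extract word forms, base forms and tags from an XCES-style fragment. The element names (`chunkList`, `chunk`, `tok`, `orth`, `lex`, `base`, `ctag`) and the `disamb="1"` marker for the selected interpretation are assumptions about the corpus files, and the `SAMPLE` fragment and `tokens` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical XCES-style fragment; element and attribute names are an
# assumption about the corpus files, not taken from this catalogue entry.
SAMPLE = """<chunkList>
  <chunk type="s">
    <tok>
      <orth>Ala</orth>
      <lex disamb="1"><base>Ala</base><ctag>subst:sg:nom:f</ctag></lex>
    </tok>
    <tok>
      <orth>ma</orth>
      <lex><base>mój</base><ctag>adj:sg:nom:f:pos</ctag></lex>
      <lex disamb="1"><base>mieć</base><ctag>fin:sg:ter:imperf</ctag></lex>
    </tok>
  </chunk>
</chunkList>"""


def tokens(xml_text: str) -> list[tuple[str, str, str]]:
    """Return (orth, base, ctag) for each token.

    Prefers the <lex> marked disamb="1"; falls back to the first <lex>
    when no interpretation is marked. Tokens without <lex> are skipped.
    """
    root = ET.fromstring(xml_text)
    result = []
    for tok in root.iter("tok"):
        lexes = tok.findall("lex")
        if not lexes:
            continue
        chosen = next((lex for lex in lexes if lex.get("disamb") == "1"), lexes[0])
        result.append((
            tok.findtext("orth", default=""),
            chosen.findtext("base", default=""),
            chosen.findtext("ctag", default=""),
        ))
    return result


if __name__ == "__main__":
    for orth, base, ctag in tokens(SAMPLE):
        print(orth, base, ctag)
```

For files of realistic size, `ET.iterparse` can process `tok` elements incrementally instead of building the whole tree in memory.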
During the project, tools for searching the corpus were also created.
Identification
Period of coverage :
Version : Second edition (2006)
Version history : First edition: June 2004.
Production
Creation date : 2006
Applications
Application area : Research
Contents
Written corpus
Number of languages : Monolingual
Language(s) : Polish (Poland)
Character set : UTF-8
Annotation Coverage : Full
Annotation Granularity : Word
Annotation level : Morphological
Annotation Scheme : TEI
Annotation language : XML
Saturday 23 November, 2024
Joint Copyright © 2008 ELRA & ELDA, Universal Catalogue 1.0.4