Catalog Reference : ELRA-U-W 0095
Szeged Corpus for Hungarian
The Szeged Corpus is a morpho-syntactically annotated and POS-tagged Hungarian natural language database. It contains 1.2 million words from texts of various genres: fiction, short essays by 14- to 16-year-old students, newspaper articles, texts related to computer science, legal texts, and economic and financial news.
The corpus was tagged using the Morpho-Syntactic Description tagging system and then manually disambiguated by linguists.
Corpus files are available in XML format (compliant with the TEIxLite DTD).
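As a rough illustration of how word-level, MSD-tagged XML of this kind might be read, here is a minimal Python sketch. The element and attribute names (`w`, `lemma`, `msd`) and the sample sentence with its codes are assumptions made for the example, not the corpus's documented layout; check them against the actual files and DTD before use. The only property the sketch relies on is that an MSD code begins with a letter for the main part of speech.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical TEI-style fragment. Tag names, attribute names and the
# MSD codes shown here are illustrative assumptions, not taken from the corpus.
SAMPLE = """<text>
  <body>
    <p>
      <s>
        <w lemma="a" msd="Tf">A</w>
        <w lemma="kutya" msd="Nc-sn">kutya</w>
        <w lemma="ugat" msd="Vmip3s---n">ugat</w>
        <c>.</c>
      </s>
    </p>
  </body>
</text>"""


def read_tokens(xml_text):
    """Return (word form, lemma, MSD code) triples for every <w> element."""
    root = ET.fromstring(xml_text)
    return [(w.text, w.get("lemma"), w.get("msd")) for w in root.iter("w")]


def pos_counts(tokens):
    """Count tokens by the first character of the MSD code (the main category)."""
    return Counter(msd[0] for _, _, msd in tokens if msd)


tokens = read_tokens(SAMPLE)
counts = pos_counts(tokens)

for form, lemma, msd in tokens:
    print(form, lemma, msd)
print(dict(counts))
```

For full corpus files, `ET.parse(path)` or `ET.iterparse(path)` could replace `ET.fromstring`, with the tag names adjusted to whatever the files actually use.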
Identification
Period of coverage :
Version :
Version history : v1.0 (2002)
Applications
Application Area : Research
Contents
written corpus
Number of languages : Monolingual
Language(s) : Hungarian (Hungary)
Annotation Coverage : Full
Annotation Granularity : Word
Annotation level : Morphological
Annotation Mode : Manual
Annotation Scheme : TEI
Annotation language : XML
Saturday 23 November, 2024
Joint Copyright © 2008 ELRA & ELDA
Universal Catalogue 1.0.4