ELRA - ELRA-U-W 0095 : Szeged Corpus for Hungarian

Catalog Reference: ELRA-U-W 0095

Szeged Corpus for Hungarian

The Szeged Corpus is a morpho-syntactically annotated, POS-tagged Hungarian natural-language database. It contains 1.2 million words of text from various genres: fiction, short essays by 14- to 16-year-old students, newspaper articles, computer-science texts, legal texts, and economic and financial news.

The corpus was tagged using the Morpho-Syntactic Description (MSD) tagging system and then manually disambiguated by linguists.
Corpus files are available in XML format, compliant with the TEIxLite DTD.
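Because the files are XML with an MSD code on each word, a few lines of standard-library Python are enough to read them. The sketch below pulls out (word, lemma, MSD) triples and counts tokens by main category. It assumes a hypothetical markup in which each token is a `<w>` element containing an `<ana>` element with `<lemma>` and `<msd>` children. The real element names may differ, so check the TEIxLite DTD distributed with the corpus. The sample sentence, the tag values, and the small category table are also illustrative assumptions.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# HYPOTHETICAL sample markup: each token is a <w> element holding its surface
# form, followed by an <ana> element with <lemma> and <msd> children.
# The actual corpus structure is defined by its TEIxLite DTD.
SAMPLE = """<text>
  <s>
    <w>A<ana><lemma>a</lemma><msd>Tf</msd></ana></w>
    <w>kutya<ana><lemma>kutya</lemma><msd>Nc-sn</msd></ana></w>
    <w>ugat<ana><lemma>ugat</lemma><msd>Vmip3s---n</msd></ana></w>
    <c>.</c>
  </s>
</text>"""

# Assumed partial mapping for the first MSD position (main category);
# codes not listed here are reported as the raw letter.
CATEGORY = {"N": "noun", "V": "verb", "A": "adjective", "T": "article"}


def read_tokens(xml_text):
    """Yield (word, lemma, msd) triples for every <w> element."""
    root = ET.fromstring(xml_text)
    for w in root.iter("w"):
        word = (w.text or "").strip()
        ana = w.find("ana")
        lemma = ana.findtext("lemma") if ana is not None else None
        msd = ana.findtext("msd") if ana is not None else None
        yield word, lemma, msd


def pos_counts(xml_text):
    """Count tokens per main category, read from the first MSD character."""
    counts = Counter()
    for _, _, msd in read_tokens(xml_text):
        if msd:
            counts[CATEGORY.get(msd[0], msd[0])] += 1
    return counts


if __name__ == "__main__":
    for token in read_tokens(SAMPLE):
        print(token)
    print(pos_counts(SAMPLE))
```

For a corpus of about 1.2 million words, `ET.iterparse` may be preferable to `ET.fromstring`, because it streams the file instead of building the whole tree in memory.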

Identification

Version history: v1.0 (2002)

Applications

Application area: Research

Contents

Written corpus
Number of languages: Monolingual
Language(s): Hungarian (Hungary)
Annotation coverage: Full
Annotation granularity: Word
Annotation level: Morphological
Annotation mode: Manual
Annotation scheme: TEI
Annotation language: XML