ELRA - ELRA-U-W0355 : The FidaPLUS corpus


Catalog Reference : ELRA-U-W0355

The FidaPLUS corpus

FidaPLUS is an extensive collection of Slovenian texts published between 1990 and 2006, compiled as a balanced sample of texts in the language. It extends the FIDA corpus to 600 million words.

It consists mostly of Slovenian daily newspapers (65.26%), various magazines (23.26%) and books (8.74%). It also includes texts from the Internet (1.24%) and various other texts such as transcripts of parliamentary speeches, advertisements, bills, tickets, etc. (1.5%).

In addition to the lexical tags provided in the FIDA corpus (see U-W 0033), the corpus includes automatically assigned, context-disambiguated morphosyntactic descriptions (MSDs) and lemmas.

Contents

written corpus
Number of languages : Monolingual
Language(s) : Slovenian (Slovenia)
Number of tokens : 600 million words
Annotation level : Morphological
Lexical Unit Information : Single word lemma