Catalog Reference: ELRA-U-W0355
The FidaPLUS corpus
FidaPLUS is an extensive collection of texts published between 1990 and 2006 that forms a balanced sample of written Slovenian. It extends the FIDA corpus to 600 million words.
The corpus consists mostly of Slovenian daily newspapers (65.26%), various magazines (23.26%) and books (8.74%). It also includes texts from the Internet (1.24%) and various other texts such as transcripts of parliamentary speeches, advertisements, bills, tickets, etc. (1.5%).
In addition to the lexical tags provided in the FIDA corpus (see U-W 0033), FidaPLUS carries automatically assigned, context-disambiguated MSDs (morphosyntactic descriptions) and lemmas.
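As a rough illustration of the composition figures above, the following Python sketch converts the listed shares into approximate word counts. It assumes the percentages are shares of the 600 million word total, which the entry does not state explicitly; the names `TOTAL_WORDS`, `SHARES` and `approximate_word_counts` are introduced here for illustration only.

```python
from decimal import Decimal

# Corpus size as stated in the catalogue entry.
TOTAL_WORDS = 600_000_000

# Shares (in percent) as listed in the entry.
# Assumption: they are proportions of the total word count.
SHARES = {
    "daily newspapers": Decimal("65.26"),
    "magazines": Decimal("23.26"),
    "books": Decimal("8.74"),
    "internet texts": Decimal("1.24"),
    "other texts": Decimal("1.5"),
}


def approximate_word_counts(total, shares):
    """Return an approximate word count per text category."""
    return {name: int(total * pct / 100) for name, pct in shares.items()}


# The listed shares should account for the whole corpus.
assert sum(SHARES.values()) == Decimal("100")

counts = approximate_word_counts(TOTAL_WORDS, SHARES)
for name, n in counts.items():
    print(f"{name}: ~{n:,} words")
```

Under that assumption, daily newspapers alone would account for roughly 390 million words, and books for a little over 50 million.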
Contents
written corpus
Number of languages: Monolingual
Language(s): Slovenian (Slovenia)
Number of tokens: 600 million words
Annotation level: Morphological
Lexical Unit Information: Single word lemma
Friday 01 November, 2024
Joint Copyright © 2008 ELRA & ELDA
Universal Catalogue 1.0.4