Universal Catalogue  
Catalog Reference : ELRA-U-W 0132
Latvian Corpus of Written Texts
This corpus of written Latvian contains approximately 20 million running words from different types of texts:
- Latvian classical literature (late 19th to early 20th century),
- Latvian folklore and culture,
- Newspapers ("Rīgas Balss" 1994-1997).

The corpus is gradually being converted to XML. Work is also under way to annotate it with morpho-syntactic information semi-automatically. To this end, a pilot morpho-syntactically annotated corpus was built in 2001: it contains 10,000 words of modern written Latvian drawn from this corpus and was annotated manually.

The corpus is also being enlarged and balanced within the framework of the ESF project competition.

Building this large, structured and annotated corpus provides a basis for developing and improving tools for Latvian language processing, which in turn makes it possible to enlarge and enrich the corpus more rapidly and efficiently.
Production
Creation date : 2000
Applications
Application Area : Research
Contents
written corpus

Joint Copyright © 2008 ELRA & ELDA