Catalog Reference: ELRA-U-W 0201
Lacio-Dev Corpus
Lacio-Dev is a deviation corpus composed of non-revised texts (516,840 tokens). It was developed within the Lacio-Web Project alongside the following corpora:
- Lacio-Ref, a reference corpus of newspaper articles in Brazilian Portuguese.
- Mac-Morpho, a 1.1-million-word gold-standard corpus (a portion of Lacio-Ref), morpho-syntactically annotated (PALAVRAS, E. Bick) and manually validated.
- A part of Lacio-Ref automatically annotated with lemmas, POS tags, and syntactic tags (Curupira parser).
- Par-C, a Portuguese-English parallel corpus.
- Comp_C, a Portuguese-English comparable corpus (300,000 words in each language).
Production
Project: Lacio-Web Project
Applications
Application area: Research
Contents
written corpus
Number of languages: Monolingual
Language(s): Portuguese (Brazil)
Joint Copyright © 2008 ELRA & ELDA