Catalog Reference : ELRA-U-W 0203
Portuguese English Comparable Corpus
Comp_C is a Portuguese-English comparable corpus (300,000 words in each language). It was developed in the Lacio-Web Project alongside the following corpora:
- Lacio-Ref, a reference corpus of newspaper articles in Brazilian Portuguese.
- Mac-Morpho, a 1.1-million-word gold-standard corpus (a portion of Lacio-Ref), morpho-syntactically annotated with PALAVRAS (E. Bick) and manually validated.
- A part of Lacio-Ref automatically annotated with lemmas, POS tags and syntactic tags by the Curupira parser.
- Lacio-Dev, a deviation corpus composed of non-revised texts (516,840 tokens).
- Par-C, a Portuguese-English parallel corpus.
Production
Project: Lacio Web Project
Applications
Application area: Research
Contents
Written corpus
Number of languages: Bilingual
Language(s): Portuguese (Brazil)
Alignment: Comparable
Friday 01 November, 2024
Joint Copyright © 2008 ELRA & ELDA