Catalog Reference : ELRA-U-W 0265
Brazilian Portuguese-English Parallel Corpora
This resource is a set of bilingual Brazilian Portuguese-English corpora of parallel texts from three domains: scientific, legal and journalistic. It contains the following sub-corpora:
- CorpusPE: 65 pairs of parallel academic texts (abstracts) on Computer Science. They are included in two versions: an authentic (non-revised) version of 21,432 words, and a version revised by a human translator (pre-edited corpus) of 21,492 words.
- CorpusALCA: 4 pairs of parallel official documents of the Free Trade Area of the Americas (FTAA). It contains 22,069 words.
- CorpusNYT: 7 pairs of parallel articles from "The New York Times". It contains 10,595 words.
These corpora were divided into three classes: test corpora, POS-tagged corpora and reference corpora.
The corpora were developed to support the PESA project, which aims to investigate, implement and evaluate sentence alignment methods for Brazilian Portuguese and English parallel texts.
Applications
Application Area : Research
Contents
written corpus
Number of languages : Bilingual
Language(s) : Portuguese (Brazil), English
Alignment : Sentence
Number of tokens : 75,000
Annotation Coverage : Partial
Annotation Granularity : Word
Annotation level : Morphological
Joint Copyright © 2008 ELRA & ELDA