Catalog Reference : ELRA-U-W 0191
AC/DC corpora for Portuguese
This collection of Portuguese (European and Brazilian) texts contains the following corpora:
- AmostRA-NILC (POS-tagged sample of the NILC Corpus): 98,505 words, 4,965 sentences in Brazilian Portuguese.
- ANCIB: Brazilian discussion list (moderated) on library science, 866,145 words, 36,049 sentences.
- Avante!: articles from Portuguese party-political newspaper Avante!, 1997-2002, 6,028,310 words, 204,833 sentences in European Portuguese.
- CD HAREM: Golden collection of the First HAREM, 131,308 words, 8,171 sentences in European and Brazilian Portuguese.
- CETEMPúblico: Two-paragraph excerpts from a major Portuguese daily newspaper, PÚBLICO, 1991-1998, 191,687,833 words, 7,082,094 sentences in European Portuguese.
- CETEMPúblico (primeiro milhão): A subset of CETEMPúblico, 912,294 words, 38,251 sentences in European Portuguese.
- CHAVE: Articles from major daily newspapers PÚBLICO and Folha de São Paulo, 1994-1995, 89,902,751 words, 4,742,273 sentences in European and Brazilian Portuguese.
- Clássicos LP/Porto Editora: Portuguese fiction, drama and poetry, 16th and 19th centuries, from Porto Editora, 1,307,334 words, 74,174 sentences in European Portuguese.
- CONDIVport: Articles from sports newspapers from the 1950s, 1970s, and 2000s, from the ConDiv project, 27,221,493 words, 150,562 sentences in European and Brazilian Portuguese.
- CoNE: Spam or general e-mail messages, 720,945 words, 37,981 sentences in European Portuguese.
- DiaCLAV: Articles from four Portuguese regional newspapers, Diário de Coimbra, Diário de Leiria, Diário de Aveiro, Viseu Diário, 6,005,628 words, 210,741 sentences in European Portuguese.
- ECI-EBR: Corpus Borba-Ramsey of Brazilian Portuguese, 648,320 words, 44,689 sentences.
- ECI-EE: Call for the EU ESPRIT program, 24,753 words, 780 sentences in European Portuguese.
- ENPCPUB (parte portuguesa): Translated fiction from English, subset of the ENPC corpus, 66,081 words, 4,369 sentences in European and Brazilian Portuguese.
- FrasesPB: Individual sentences in Brazilian Portuguese, 17,745 words, 651 sentences.
- FrasesPP: Individual sentences in European Portuguese, 14,958 words, 594 sentences.
- Museu da Pessoa: Transcriptions of oral interviews from Museu da Pessoa, 315,420 words, 24,053 sentences in European and Brazilian Portuguese.
- Natura/Minho: Unedited version of articles for Diário do Minho, a regional newspaper in Portugal, 1,593,685 words, 53,185 sentences.
- Natura/Público: First two paragraphs of each article, PÚBLICO, 1991-1994, 5,726,130 words, 225,734 sentences in European Portuguese.
- NILC/São Carlos: Various texts from the NILC Corpus, newspaper, commercial letters and educational texts, 29,562,995 words, 1,952,829 sentences in Brazilian Portuguese.
- Vercial: Portuguese fiction, 19th century, from the Vercial Project, 8,376,956 words, 383,805 sentences in European Portuguese.
Total: 371,229,589 words, 15,280,783 sentences.
These corpora have been annotated at the morphosyntactic level with the PALAVRAS parser (E. Bick, 2000).
AC/DC stands for Acesso a corpora/Disponibilização de corpora ("access and availability of corpora").
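The total quoted above can be checked against the per-corpus figures. The following is a minimal sketch in Python; the dictionary `COUNTS` and the helper `totals` are illustrative names, not part of any AC/DC tooling, and the numbers are simply copied from the list above. Since CETEMPúblico (primeiro milhão) is listed as a subset of CETEMPúblico, a total that includes both entries counts those words twice.

```python
# (words, sentences) per corpus, copied from the catalogue list above.
# COUNTS and totals() are hypothetical names used only for this check.
COUNTS = {
    "AmostRA-NILC": (98_505, 4_965),
    "ANCIB": (866_145, 36_049),
    "Avante!": (6_028_310, 204_833),
    "CD HAREM": (131_308, 8_171),
    "CETEMPúblico": (191_687_833, 7_082_094),
    "CETEMPúblico (primeiro milhão)": (912_294, 38_251),
    "CHAVE": (89_902_751, 4_742_273),
    "Clássicos LP/Porto Editora": (1_307_334, 74_174),
    "CONDIVport": (27_221_493, 150_562),
    "CoNE": (720_945, 37_981),
    "DiaCLAV": (6_005_628, 210_741),
    "ECI-EBR": (648_320, 44_689),
    "ECI-EE": (24_753, 780),
    "ENPCPUB (parte portuguesa)": (66_081, 4_369),
    "FrasesPB": (17_745, 651),
    "FrasesPP": (14_958, 594),
    "Museu da Pessoa": (315_420, 24_053),
    "Natura/Minho": (1_593_685, 53_185),
    "Natura/Público": (5_726_130, 225_734),
    "NILC/São Carlos": (29_562_995, 1_952_829),
    "Vercial": (8_376_956, 383_805),
}


def totals(counts):
    """Return (total_words, total_sentences) summed over all corpora."""
    words = sum(w for w, _ in counts.values())
    sentences = sum(s for _, s in counts.values())
    return words, sentences


total_words, total_sentences = totals(COUNTS)
print(f"Total: {total_words:,} words, {total_sentences:,} sentences")
# prints "Total: 371,229,589 words, 15,280,783 sentences"
```

The sums agree with the stated total when all 21 entries, including the CETEMPúblico subset, are added together.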
Applications
Application area : Research
Contents
written corpus
Number of languages : Monolingual
Language(s) : Portuguese (Portugal) ; Portuguese (Brazil)
Annotation Granularity : Morpheme
Annotation level : Syntactic
Annotation Mode : Automatic
Saturday 23 November, 2024
Joint Copyright © 2008 ELRA & ELDA
Universal Catalogue 1.0.4