Universal Catalogue  
Catalog Reference : ELRA-U-W 0191
AC/DC corpora for Portuguese
This collection of Portuguese texts (European and Brazilian) contains the following corpora:

- AmostRA-NILC (POS-tagged sample of the NILC Corpus): 98,505 words, 4,965 sentences in Brazilian Portuguese.

- ANCIB: Brazilian discussion list (moderated) on library science, 866,145 words, 36,049 sentences.

- Avante!: articles from Portuguese party-political newspaper Avante!, 1997-2002, 6,028,310 words, 204,833 sentences in European Portuguese.

- CD HAREM: Golden collection of the First HAREM, 131,308 words, 8,171 sentences in European and Brazilian Portuguese.

- CETEMPúblico: Two-paragraph excerpts from a major Portuguese daily newspaper, PÚBLICO, 1991-1998, 191,687,833 words, 7,082,094 sentences in European Portuguese.

- CETEMPúblico (primeiro milhão): A subset of CETEMPúblico, 912,294 words, 38,251 sentences in European Portuguese.

- CHAVE: Articles from major daily newspapers PÚBLICO and Folha de São Paulo, 1994-1995, 89,902,751 words, 4,742,273 sentences in European and Brazilian Portuguese.

- Clássicos LP/Porto Editora: Portuguese fiction, drama and poetry, 16th and 19th centuries, from Porto Editora, 1,307,334 words, 74,174 sentences in European Portuguese.

- CONDIVport: Articles from sports newspapers from the 1950s, 1970s, and 2000s, from the ConDiv project, 27,221,493 words, 150,562 sentences in European and Brazilian Portuguese.

- CoNE: Spam or general e-mail messages, 720,945 words, 37,981 sentences in European Portuguese.

- DiaCLAV: Articles from four Portuguese regional newspapers, Diário de Coimbra, Diário de Leiria, Diário de Aveiro, Viseu Diário, 6,005,628 words, 210,741 sentences in European Portuguese.

- ECI-EBR: Corpus Borba-Ramsey of Brazilian Portuguese, 648,320 words, 44,689 sentences.

- ECI-EE: Call for the EU ESPRIT program, 24,753 words, 780 sentences in European Portuguese.

- ENPCPUB (parte portuguesa): Translated fiction from English, subset of the ENPC corpus, 66,081 words, 4,369 sentences in European and Brazilian Portuguese.

- FrasesPB: Individual sentences in Brazilian Portuguese, 17,745 words, 651 sentences.

- FrasesPP: Individual sentences in European Portuguese, 14,958 words, 594 sentences.

- Museu da Pessoa: Transcriptions of oral interviews from Museu da Pessoa, 315,420 words, 24,053 sentences in European and Brazilian Portuguese.

- Natura/Minho: Unedited version of articles for Diário do Minho, a regional newspaper in Portugal, 1,593,685 words, 53,185 sentences.

- Natura/Público: Two first paragraphs of each article, PÚBLICO, 1991-1994, 5,726,130 words, 225,734 sentences in European Portuguese.

- NILC/São Carlos: Various texts from the NILC Corpus, newspaper, commercial letters and educational texts, 29,562,995 words, 1,952,829 sentences in Brazilian Portuguese.

- Vercial: Portuguese fiction, 19th century, from the Vercial Project, 8,376,956 words, 383,805 sentences in European Portuguese.

Total: 371,229,589 words, 15,280,783 sentences.
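
The stated total can be checked by summing the per-corpus figures. The following is a minimal sketch in Python; the dictionary name `CORPORA` and the function `totals` are illustrative assumptions, not part of any AC/DC tooling, and the figures are copied from the list above. Note that the sum counts CETEMPúblico (primeiro milhão) in addition to the full CETEMPúblico, of which it is described as a subset.

```python
# (words, sentences) per corpus, copied from the catalogue list above.
# The names CORPORA and totals are hypothetical, chosen for this sketch.
CORPORA = {
    "AmostRA-NILC": (98_505, 4_965),
    "ANCIB": (866_145, 36_049),
    "Avante!": (6_028_310, 204_833),
    "CD HAREM": (131_308, 8_171),
    "CETEMPúblico": (191_687_833, 7_082_094),
    "CETEMPúblico (primeiro milhão)": (912_294, 38_251),
    "CHAVE": (89_902_751, 4_742_273),
    "Clássicos LP/Porto Editora": (1_307_334, 74_174),
    "CONDIVport": (27_221_493, 150_562),
    "CoNE": (720_945, 37_981),
    "DiaCLAV": (6_005_628, 210_741),
    "ECI-EBR": (648_320, 44_689),
    "ECI-EE": (24_753, 780),
    "ENPCPUB (parte portuguesa)": (66_081, 4_369),
    "FrasesPB": (17_745, 651),
    "FrasesPP": (14_958, 594),
    "Museu da Pessoa": (315_420, 24_053),
    "Natura/Minho": (1_593_685, 53_185),
    "Natura/Público": (5_726_130, 225_734),
    "NILC/São Carlos": (29_562_995, 1_952_829),
    "Vercial": (8_376_956, 383_805),
}


def totals(corpora):
    """Return (total_words, total_sentences) over all listed corpora."""
    words = sum(w for w, _ in corpora.values())
    sentences = sum(s for _, s in corpora.values())
    return words, sentences


total_words, total_sentences = totals(CORPORA)
print(f"{total_words:,} words, {total_sentences:,} sentences")
# prints "371,229,589 words, 15,280,783 sentences"
```

The per-corpus figures add up exactly to the total given in the catalogue entry.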

These corpora have been annotated at the morphosyntactic level with the PALAVRAS parser (E. Bick, 2000).

The project name AC/DC stands for Acesso a corpora/Disponibilização de corpora ("access to corpora/making corpora available").
Applications
Application area: Research
Contents: written corpus

Joint Copyright © 2008 ELRA & ELDA