Universal Catalogue  
Catalog Reference : ELRA-U-W 0010
CETENFolha corpus
CETENFolha stands for Corpus de Extractos de Textos Electrónicos NILC/Folha de São Paulo. It is a 24-million-word corpus of Brazilian Portuguese that was created during the Computational Processing of Portuguese project. Texts were extracted from the daily newspaper Folha de S. Paulo (year 1994).
It was designed as the Brazilian Portuguese counterpart of CETEMPúblico.

The aim of the project was to compile resources for research and development in natural language processing of Brazilian Portuguese.

The corpus was also annotated in 2003 with the PALAVRAS parser (Eckhard Bick).

Note that the data has been included in the Chave collection.
Identification
Period of coverage :
Version : v0.1 (2002)
Version history :
Production
Creation date : 2002
Applications
Application area : Research
Contents
 written corpus 
 

Joint Copyright © 2008 ELRA & ELDA
Universal Catalogue 1.0.4