ELRA - ELRA-U-W 0010 : CETENFolha corpus

Catalog Reference : ELRA-U-W 0010

CETENFolha corpus

CETENFolha stands for Corpus de Extractos de Textos Electrónicos NILC/Folha de São Paulo. It is a 24-million-word corpus of Brazilian Portuguese, created within the Computational Processing of Portuguese project. The texts were extracted from the 1994 editions of the daily newspaper Folha de S. Paulo.
It was designed as the Brazilian Portuguese counterpart of the CETEMPúblico corpus.

The aim of the project was to compile resources for research and development in natural language processing of Brazilian Portuguese.

The corpus was also annotated in 2003 with the PALAVRAS parser (Eckhard Bick).

Note that the data has been included in the Chave collection.

Identification

Period of coverage : 1994

Version : v0.1 (2002)
Version history :

Production

Creation date : 2002

Applications

Application area : Research

Contents

written corpus
Number of languages : Monolingual
Language(s) : Portuguese (Brazil)