Catalog Reference: ELRA-U-W 0010
CETENFolha corpus
CETENFolha stands for Corpus de Extractos de Textos Electrónicos NILC/Folha de São Paulo. It is a 24-million-word corpus of Brazilian Portuguese created during the Computational Processing of Portuguese project. The texts were extracted from the 1994 editions of the daily newspaper Folha de S. Paulo.
It was designed as the Brazilian Portuguese counterpart of CETEMPúblico, a corpus of European Portuguese.
The aim of the project was to compile resources for research and development in natural language processing of Brazilian Portuguese.
In 2003, the corpus was also annotated with the PALAVRAS parser (Eckhard Bick).
The data is also included in the Chave collection.
Identification
Version: v0.1 (2002)
Production
Creation date: 2002
Applications
Application area: Research
Contents
Written corpus
Number of languages: Monolingual
Language(s): Portuguese (Brazil)
Joint Copyright © 2008 ELRA & ELDA