ELRA - ELRA-U-W 0009 : CETEMPúblico corpus

Catalog Reference : ELRA-U-W 0009

CETEMPúblico corpus

CETEMPúblico stands for Corpus de Extractos de Textos Electrónicos MCT/Público. It is a 180-million-word corpus of Portuguese, built as part of the Computational Processing of Portuguese project.
The texts were extracted from editions of PÚBLICO, a Portuguese daily newspaper, published between 1991 and 1998.

It was compiled for research and development purposes in natural language processing of Portuguese.

CETEMPúblico has also been annotated with the PALAVRAS parser in Eckhard Bick's VISL project.

Identification

Period of coverage : 1991 to 1998

Version : v1.7
Version history : v1.0 (2000)

Production

Creation date : 2001

Applications


Application area : Research

Contents

written corpus
Number of languages : Monolingual
Language(s) : Portuguese (Portugal)