Catalog Reference: ELRA-WC329
TagShare Corpus
This is a Portuguese corpus of one million tokens. One third of the corpus consists of transcribed spoken material. The spoken data range from formal to informal registers and include phone calls, media broadcasts, monologues, etc. The written texts come from newspapers, books, magazines, journals and miscellaneous sources. Sentences and paragraphs are segmented, and each token is delimited by blanks. The corpus is also part-of-speech tagged and contains inflectional information, and each nominal and verbal token is associated with its lemma.
The latest version of this corpus is the CINTIL Corpus (International Corpus of Portuguese). It is now distributed through the ELRA catalogue (http://catalog.elra.info) under the reference W0050.
Contents
Written corpus
Number of languages: Monolingual
Language(s): Portuguese
Speech corpus
Language(s): Portuguese
Source channel: Microphone, Radio, Telephone, Television
Saturday 23 November, 2024
Joint Copyright © 2008 ELRA & ELDA
Universal Catalogue 1.0.4