Universal Catalogue  
Catalog Reference : ELRA-WC329
TagShare Corpus
This is a Portuguese corpus of one million tokens, one third of which is transcribed spoken material. The spoken data range from formal to informal registers and include phone calls, media broadcasts, monologues, etc. The written texts come from newspapers, books, magazines, journals, and miscellaneous sources. Sentences and paragraphs are segmented, and each token is delimited by blanks. The corpus is also part-of-speech tagged and annotated with inflectional information, and each nominal and verbal token is associated with its lemma.

The latest version of this corpus is the CINTIL Corpus (International Corpus of Portuguese), which is now distributed through the ELRA catalogue (http://catalog.elra.info) under the reference W0050.
Contents:
written corpus
speech corpus
 

Joint Copyright © 2008 ELRA & ELDA