Universal Catalogue  
Catalog Reference: ELRA-U-W0401
Rovereto Twitter N-Gram Corpus
The Rovereto Twitter N-Gram Corpus (RTC) is an n-gram dataset of Twitter messages labelled with the gender of each author and the time of posting. The corpus is based on 75 million English tweets collected from the public Twitter stream between December 2010 and July 2011. Instead of the full text of the tweets, it provides frequency statistics of n-grams. For each n-gram, the frequencies are broken down by author gender and by posting time (i.e., day of the week and hour of the day).
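To make the breakdown concrete, here is a minimal Python sketch of how n-gram frequencies could be tallied per author gender, day of the week and hour of the day. The input layout (text, gender, timestamp triples), the naive whitespace tokenization, the sample tweets and the function names `ngrams`, `count_ngrams` and `total` are all hypothetical illustrations; they do not describe the RTC's actual file format or how the corpus was produced.

```python
from collections import Counter
from datetime import datetime


def ngrams(tokens, n):
    """Return the n-grams (as tuples) of a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_ngrams(tweets, n):
    """Count n-grams broken down by author gender, weekday and hour.

    `tweets` is an iterable of (text, gender, timestamp) triples. This
    record layout is a hypothetical stand-in, not the RTC's real format.
    """
    counts = Counter()
    for text, gender, timestamp in tweets:
        tokens = text.lower().split()  # naive whitespace tokenization (assumption)
        day = timestamp.weekday()      # day of the week: 0 = Monday ... 6 = Sunday
        hour = timestamp.hour          # hour of the day: 0-23
        for gram in ngrams(tokens, n):
            counts[(gram, gender, day, hour)] += 1
    return counts


def total(counts, gram):
    """Sum one n-gram's frequency over all gender/time cells."""
    return sum(c for (g, _, _, _), c in counts.items() if g == gram)


# Hypothetical sample data: 3 January 2011 was a Monday, 7 January a Friday.
sample = [
    ("Good morning world", "female", datetime(2011, 1, 3, 8, 15)),
    ("good morning", "male", datetime(2011, 1, 3, 8, 40)),
    ("good night", "female", datetime(2011, 1, 7, 23, 5)),
]
bigrams = count_ngrams(sample, 2)
```

Keying a single `Counter` on the tuple (n-gram, gender, weekday, hour) keeps each cell of the breakdown separate, while `total` shows how the overall frequency of an n-gram can be recovered by summing across those cells.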
Production
Creation date: 2011
Contents
Written corpus

Joint Copyright © 2008 ELRA & ELDA
Universal Catalogue 1.0.4