Catalog Reference : ELRA-U-W0401
Rovereto Twitter N-Gram Corpus
The Rovereto Twitter N-Gram Corpus (RTC) is an n-gram dataset of Twitter messages labelled with the gender of the authors and the time of posting. The corpus is based on 75 million English tweets collected from Twitter's public stream between December 2010 and July 2011. Instead of the full text of the tweets, it provides n-gram frequency statistics. For each n-gram, the frequencies are broken down by author gender and by posting time (i.e., day of the week and hour of the day).
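To make the breakdown described above concrete, here is a minimal Python sketch of how n-gram frequencies could be tallied by gender, day of the week and hour of the day. It illustrates the idea only and is not the procedure or file format used to build the RTC: the function names, the whitespace tokenization, the key layout and the three toy tweets are all assumptions made for this example.

```python
from collections import Counter
from datetime import datetime

def ngrams(tokens, n):
    """Return all contiguous n-token tuples found in `tokens`."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def count_ngrams(tweets, n):
    """Count n-grams keyed by (ngram, gender, weekday, hour).

    `tweets` is an iterable of (text, gender, posted_at) triples.
    weekday follows datetime.weekday(): 0 = Monday ... 6 = Sunday.
    """
    counts = Counter()
    for text, gender, posted_at in tweets:
        # Naive lowercase + whitespace tokenization (an assumption of this sketch).
        tokens = text.lower().split()
        for gram in ngrams(tokens, n):
            counts[(gram, gender, posted_at.weekday(), posted_at.hour)] += 1
    return counts

# Made-up toy input; not taken from the corpus.
tweets = [
    ("good morning world", "female", datetime(2011, 1, 3, 8, 15)),
    ("good morning everyone", "male", datetime(2011, 1, 3, 8, 40)),
    ("good night world", "female", datetime(2011, 1, 4, 23, 5)),
]

counts = count_ngrams(tweets, 2)
for key, freq in sorted(counts.items()):
    print(key, freq)
```

Keying each count on the full (n-gram, gender, weekday, hour) tuple keeps the finest breakdown; coarser views, such as totals per gender, can be obtained by summing over the other fields.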
Production
Creation date: 2011
Contents
written corpus
Number of languages: Monolingual
Language(s): English
Friday 01 November, 2024
Joint Copyright © 2008 ELRA & ELDA