Universal Catalogue  
Written Corpora
Displaying 401 to 420 (of 730 products)

ELRA-U-W0394
AQUAINT TimeML Corpus 


The AQUAINT TimeBank contains 73 news report documents (31,000 tokens) annotated following the TimeML 1.2.1 specification, a language for the annotation and normalization of temporal information.
Language(s) : English



ELRA-U-W0395
Danish-English Parallel Dependency Treebank 


This is a parallel dependency treebank of 95,000 words. It consists of the English translation of the Danish Dependency Treebank, aligned at the word level.
Language(s) : Danish >>>> English



ELRA-U-W0396
Manipuri-English parallel corpus 


This is a corpus of 10,350 parallel sentences collected from comparable news corpora.
Language(s) : English <<< >>> other



ELRA-U-W0397
ComTrans corpus 


It consists of word-aligned corpora in German / English, German / French and English / French.
Language(s) : German <<< >>> English - German <<< >>> French - English <<< >>> French



ELRA-U-W0398
20 Newsgroups data set 


This is a collection of about 20,000 newsgroup documents, partitioned evenly across 20 different newsgroups. It contains 15-year-old Usenet messages.
Language(s) : English



ELRA-U-W0399
Luxembourgish Email Word Corpus (LEWC)


This corpus contains 296 private email bodies, resulting in 31,469 tokens.
Language(s) : Luxembourgish, Letzeburgesch



ELRA-U-W0400
COP15 


This is a Twitter data set designed to comprise a complete set of tweets for a specific news-driven event. COP15 refers to the 2009 United Nations Climate Change Conference, which took place in Copenhagen, Denmark, between December 7 and December 18. The conference included the 15th Conference of the Parties (COP 15) to the United Nations Framework Convention on Climate Change.

A total of 207,782 tweets were downloaded during the month of December 2009 by querying the Twitter Search API with the term 'cop15'.
Language(s) : English



ELRA-U-W0401
Rovereto Twitter N-Gram Corpus (RTC)


The Rovereto Twitter N-Gram Corpus (RTC) is an n-gram dataset of Twitter messages with gender labels of the authors and time of posting. The corpus is based on 75 million English tweets collected from the public stream of Twitter, between December 2010 and July 2011.
Language(s) : English



ELRA-WC0100
Thai words 1 


It consists of 18,057 words.
Language(s) : Thai



ELRA-WC0101
Thai words 2 


It consists of 16,384 words.
Language(s) : Thai



ELRA-WC0102
TOLL (Thai Online Library) 


These bilingual texts were useful for Thai students of English and for foreign students of Thai.

This resource is not accessible anymore.
Language(s) : Thai - English



ELRA-WC0103
Wizard of OZ 


Language(s) : Thai



ELRA-WC0105
EMAS corpus 


It consists of 25,000 sentences from 859 students.
Language(s) : English



ELRA-WC0106
Computerised corpus of Malaysian English 


Language(s) : English



ELRA-WC0107
Malay Concordance project 


It contains 1.7 million words.
Language(s) : Malay



ELRA-WC0108
ATMA corpus 


It contains 37,589 verses.
Language(s) : Malay



ELRA-WC0109
MACLE Malaysian corpus of learner English 


Language(s) : English



ELRA-WC0110
WT Malay corpus 


Language(s) : Malay



ELRA-WC0112
HNC (Hellenic National Corpus) 


This is a corpus of written Modern Greek texts consisting of about 20 million words of written texts from several media (books, periodicals, newspapers etc.), which belong to different genres (articles, essays, literary works, reports, biographies etc.) and various topics (economy, medicine, leisure, art, human sciences etc.).
Language(s) : Greek



ELRA-WC0113
DIKAIO corpus 


Language(s) : Greek




Joint Copyright © 2008 ELRA & ELDA
Universal Catalogue 1.0.4