Universal Catalogue

Written Corpora

Displaying 401 to 420 (of 730 products)

ELRA-U-W0394

AQUAINT TimeML Corpus

The AQUAINT TimeBank contains 73 news report documents (31,000 tokens) annotated following the TimeML 1.2.1 specification, a language for the annotation and normalization of temporal information.
Language(s) : English

ELRA-U-W0395

Danish-English Parallel Dependency Treebank

This is a parallel dependency treebank of 95,000 words. It consists of the English translation of the Danish Dependency Treebank, aligned at the word level.
Language(s) : Danish >>>> English

ELRA-U-W0396

Manipuri-English parallel corpus

This is a corpus of 10,350 parallel sentences collected from comparable news corpora.
Language(s) : English <<< >>> other

ELRA-U-W0397

ComTrans corpus

It consists of word-aligned corpora in German / English, German / French and English / French.
Language(s) : German <<< >>> English - German <<< >>> French - English <<< >>> French

ELRA-U-W0398

20 Newsgroups data set

This is a collection of about 20,000 newsgroup documents, partitioned evenly across 20 different newsgroups. It contains 15-year-old Usenet messages.
Language(s) : English

ELRA-U-W0399

Luxembourgish Email Word Corpus (LEWC)

This corpus contains 296 private email bodies, resulting in 31,469 tokens.
Language(s) : Luxembourgish, Letzeburgesch

ELRA-U-W0400

COP15

This is a Twitter data set designed to comprise a complete set of tweets for a specific news-driven event. COP15 refers to the 2009 United Nations Climate Change Conference, which took place in Copenhagen, Denmark, between December 7 and December 18. The conference included the 15th Conference of the Parties (COP 15) to the United Nations Framework Convention on Climate Change.

A total of 207,782 tweets were downloaded during the month of December 2009 by querying the Twitter Search API with the term 'cop15'.
Language(s) : English

ELRA-U-W0401

Rovereto Twitter N-Gram Corpus (RTC)

The Rovereto Twitter N-Gram Corpus (RTC) is an n-gram dataset of Twitter messages with gender labels of the authors and time of posting. The corpus is based on 75 million English tweets collected from the public stream of Twitter, between December 2010 and July 2011.
Language(s) : English

ELRA-WC0100

Thai words 1

It consists of 18,057 words.
Language(s) : Thai

ELRA-WC0101

Thai words 2

It consists of 16,384 words.
Language(s) : Thai

ELRA-WC0102

TOLL (Thai Online Library)

These bilingual texts were useful for Thai students of English and for foreign students of Thai.

This resource is no longer accessible.
Language(s) : Thai - English

ELRA-WC0103

Wizard of OZ

Language(s) : Thai

ELRA-WC0105

EMAS corpus

It consists of 25,000 sentences from 859 students.
Language(s) : English

ELRA-WC0106

Computerised corpus of Malaysian English

Language(s) : English

ELRA-WC0107

Malay Concordance project

It contains 1.7 million words.
Language(s) : Malay

ELRA-WC0108

ATMA corpus

It contains 37,589 verses.
Language(s) : Malay

ELRA-WC0109

MACLE Malaysian corpus of learner English

Language(s) : English

ELRA-WC0110

WT Malay corpus

Language(s) : Malay

ELRA-WC0112

HNC (Hellenic National Corpus)

This is a corpus of about 20 million words of written Modern Greek, drawn from several media (books, periodicals, newspapers etc.) and covering different genres (articles, essays, literary works, reports, biographies etc.) and various topics (economy, medicine, leisure, art, human sciences etc.).
Language(s) : Greek

ELRA-WC0113

DIKAIO corpus

Language(s) : Greek
