Universal Catalogue

Written Corpora

Displaying 501 to 520 (of 730 products)

ELRA-WC0207

English-Norwegian Parallel Corpus (ENPC)

The corpus consists of text excerpts of approximately 10,000 to 15,000 words each, taken from fictional and non-fictional Norwegian and English original texts and their translations. It amounts to 200 texts, or 2.6 million words in total.
Language(s) : English - Norwegian

ELRA-WC0208

Oslo Multilingual Corpus (OMC)

This is an extension of the 2.6-million-word English-Norwegian Parallel Corpus (ENPC). German, Dutch and Portuguese translations were added for some of the texts. It contains fictional and non-fictional texts.
Language(s) : Dutch - English - German - Norwegian - Portuguese

ELRA-WC0209

Oslo Corpus of Bosnian Texts

It consists of approximately 1.5 million words and covers several genres: fiction (novels and short stories), essays, children's stories, folklore, Islamic texts, legal texts, and newspapers and journals.
Language(s) : Bosnian

ELRA-WC0210

Tycho Brahe Parsed Corpus of Historical Portuguese

This annotated electronic corpus consists of texts written between 1500 and 1900.
Language(s) : Portuguese

ELRA-WC0211

Susanne Corpus

It contains annotations of a 130,000-word cross-section of written American English.
Language(s) : English (USA)

ELRA-WC0212

Penn-Helsinki Parsed Corpus of Middle English (PPCME2)

It includes a total of roughly 1.2 million words of running text. It comprises 55 text samples, each of which is given in three forms: a text file, a part-of-speech tagged file and a parsed file. In addition, there is a file with philological and bibliographical information about each text.
Language(s) : English

ELRA-WC0213

Penn-Helsinki Parsed Corpus of Early Modern English (PPCEME)

It consists of nearly 1.8 million words. Each text in the corpus is available in parsed, POS-tagged, and unannotated form. The corpus is divided into three subcorpora: the Helsinki directories (roughly 573,000 words), the Penn1 directories (roughly 615,000 words) and the Penn2 directories (roughly 606,000 words).
Language(s) : Modern English

ELRA-WC0214

York-Helsinki Parsed Corpus of Old English Poetry

It contains 71,490 words of Old English poetic texts that are syntactically and morphologically annotated.
Language(s) : Old English

ELRA-WC0215

York-Toronto-Helsinki Parsed Corpus of Old English Prose (YCOE)

This is a 1.5-million-word syntactically annotated corpus of Old English prose texts.
Language(s) : Old English

ELRA-WC0216

Brooklyn-Geneva-Amsterdam-Helsinki Parsed Corpus of Old English

It contains 106,210 words of Old English texts that are syntactically and morphologically annotated.
Language(s) : Old English

ELRA-WC0217

Lancaster/Oslo-Bergen Corpus (LOB)

It contains approximately one million words of written British English dating from 1960, divided into 15 genre categories.
Language(s) : English (United Kingdom)

ELRA-WC0218

Lancaster-Leeds Treebank

This is a manually parsed subsample of the LOB corpus showing the surface phrase structure of each sentence. It consists of approximately 45,000 words taken from all the genre categories of the LOB corpus.
Language(s) : English

ELRA-WC0219

Brown Corpus of Standard American English

It consists of one million words in 500 American English text samples printed in 1961, each containing 2,000 words.
Language(s) : English (USA)

ELRA-WC0220

Corpus des Oeuvres de Philosophie en Langue Française

It contains 229 works (including 651 images).
Language(s) : French

ELRA-WC0221

Miscellaneous French Texts

It consists of 38 French-language titles.
Language(s) : French

ELRA-WC0222

Corpus Médical de la Faculté de Médecine de Grenoble

It contains 284 questions on medical pathologies, covering 31 subjects.
Language(s) : French

ELRA-WC0223

Lancaster Parsed Corpus (LPC)

This is a subsample of the LOB corpus, parsed by computer and manually corrected by several researchers. It contains approximately 140,000 words with samples from each of the 15 categories in the LOB corpus.
Language(s) : English (United Kingdom)

ELRA-WC0224

American Printing House for the Blind Treebank (APHB)

This is a 200,000-word skeleton-parsed corpus of a wide range of English texts.
Language(s) : English (USA)

ELRA-WC0225

Associated Press Treebank (AP)

This is a 1,000,000-word skeleton-parsed corpus of American newswire reports.
Language(s) : English (USA)

ELRA-WC0226

Canadian Hansard Treebank

This is a 750,000-word skeleton-parsed corpus of proceedings in the Canadian Parliament.
Language(s) : English (Canada) - French (Canada)