Universal Catalogue

Written Corpora

Displaying 521 to 540 (of 730 products)

ELRA-WC0227

IBM Manuals Treebank

This is an 800,000-word skeleton-parsed corpus of computer manuals.
Language(s) : English

ELRA-WC0228

Lancaster Anaphoric Treebank

A subsample of the Associated Press Corpus, containing American newswire reports, annotated to show the reference of pronouns and lexical cohesion. It contains approximately 100,000 words.
Language(s) : English (USA)

ELRA-WC0229

Longman-Lancaster Corpus

It consists of 30 million words of written English taken from literature, magazines, papers and more ephemeral materials such as leaflets and packaging.
Language(s) : English

ELRA-WC0230

ET10-63 Corpus

It consists of approximately 1,250,000 words in each language and contains EC official documents on telecommunications. The corpus is part-of-speech tagged and lemmatized.
Language(s) : English - French

ELRA-WC0231

CRATER Corpus

(Available since 23/01/1997)

This is a 1,000,000-word trilingual corpus aligned at the sentence level. It is made up of texts from the telecommunications domain. It has been part-of-speech tagged in all three languages.
Language(s) : English - French - Spanish

ELRA-WC0232

The Helsinki Corpus of English Texts: Diachronic Part

It contains samples from texts covering the Old, Middle, and Early Modern English periods. It consists of 1,500,000 words in total.
Language(s) : English

ELRA-WC0233

Lampeter Corpus of Early Modern English Tracts

It consists of approximately one million words of English pamphlet literature covering the years 1640-1740. It is being tagged for part-of-speech and lemmatized.
Language(s) : English

ELRA-WC0234

Corpus BAF

It consists of approximately 400,000 words for each language from literary, technical, scientific and institutional domains.
Language(s) : English - French

ELRA-WC0235

Corpus of transcribed Bulgarian conversations

It consists of transcribed conversations in family contexts.
Language(s) : Bulgarian

ELRA-WC0236

Transcripts of Bulgarian Parliament debates

It consists of transcribed recordings of broadcasts from the debates of the 7th Great National Assembly on 31 October 1990.
Language(s) : Bulgarian

ELRA-WC0237

Orchid Corpus

This is a part-of-speech tagged Thai text corpus with syntactic word class annotation. It contains approximately 2,560,000 words.
Language(s) : Thai

ELRA-WC0238

Annotated Bulgarian Texts

It contains three texts annotated with predominantly lexical notes.
Language(s) : Bulgarian

ELRA-WC0239

Bulgarian Poetry Archive

This electronic literary text archive contains 301 poems in HTML format.
Language(s) : Bulgarian

ELRA-WC0240

Bulgarian Text Corpus (TRACTOR)

It consists of approximately 275,000 words and represents text genres such as news, legal texts and poetry, encoded in SGML according to the Corpus Encoding Standard (CES).
Language(s) : Bulgarian

ELRA-WC0241

Corpus Textual Informatitzat de la Llengua Catalana (CTILC)

It contains various kinds of texts, dating from 1833 to 1988: literary texts (theatre, poetry, etc.) and non-literary texts (legal documents, scientific articles, etc.).
Language(s) : Catalan

ELRA-WC0242

Chinese Philosophical E-text Archive

It consists of Chinese e-texts ranging from the classical pre-Qin and Song periods to the Qing and modern ones. It includes electronic versions of Chinese philosophical texts created by the Confucian Etext Project, electronic versions of Chinese philosophical texts from other sources, and information on (and links to further information on) the preparation and use of these texts.
Language(s) : Chinese

ELRA-WC0243

Penn Chinese Treebank

This is a corpus of Chinese text segmented into words and annotated with part-of-speech labels and syntactic bracketing, modeled on the English Treebank. It contains 500,000 words (over 824,000 Chinese characters).
Language(s) : Chinese

ELRA-WC0244

Chinese Proposition Bank

The goal of the Penn Chinese Proposition Bank project is to create a corpus of text annotated with information about basic semantic propositions. Predicate-argument relations are being added to the syntactic trees of the Penn Chinese Treebank.
Language(s) : Chinese

ELRA-WC0245

Chiricahua and Mescalero Apache Texts

It contains 56 Apache texts and their English translations, collected in the summers of 1930 and 1931 and in the spring of 1934.
Language(s) : English

ELRA-WC0246

Trilingual Parallel Computer Corpus

This corpus has been compiled in order to study the syntactic and semantic structure of "up/on" and its equivalents in French ("sur") and Dutch ("op"). It consists of 2 million words and it is divided into two subcorpora (fiction and non-fiction).
Language(s) : Dutch - English - French
