Universal Catalogue

Written Corpora

Displaying 481 to 500 (of 730 products)

ELRA-WC0183

Louvain Corpus of Native English Essays (LOCNESS)

LOCNESS is a corpus of native English essays containing 324,304 words: British pupils' A-level essays (60,209 words), British university students' essays (95,695 words), and American university students' essays (168,400 words).
Language(s) : English

ELRA-WC0184

CLUVI Parallel Corpora

It contains over 23 million words in five language combinations related to Galician: English-Galician, Galician-Spanish, French-Galician, English-Galician-French-Spanish and Spanish-Galician-Catalan-Basque.

The parallel texts are aligned in an XML adaptation of the TMX (Translation Memory eXchange) format.
Language(s) : Galician - Spanish - English - French - Portuguese - Basque - Catalan

ELRA-WC0186

Web Pages Corpus

This is a corpus of web pages and email messages (440 documents), in which each document is assigned one of four category labels: conferences, jobs, resources, or trash.
Language(s) : English

ELRA-WC0187

Syntactically Annotated Corpus of Tibetan

Syntactically annotated corpus of spoken and written Tibetan from different regions and time periods.
Language(s) : Tibetan

ELRA-WC0190

Sejong Morph Tagged Corpus

This corpus consists of 10 million morphologically annotated Korean words.
Language(s) : Korean

ELRA-WC0191

Sejong Morph Sense Tagged Corpus

This corpus consists of 5.5 million semantically annotated Korean words. It is TEI-compliant.
Language(s) : Korean

ELRA-WC0192

Sejong Korean Treebank

This corpus contains syntactically parsed sentences (150,000 in 2003).
Language(s) : Korean

ELRA-WC0193

Cross-document Structure Theory (CST) Bank

It consists of a collection of documents annotated for Cross-document Structure Theory (CST) relationships.
Language(s) : English

ELRA-WC0194

Terminal Device Oriented Comparable Corpora

It contains 88,000 pairs of aligned sentences and a hundred Web newspaper articles.
Language(s) : Japanese

ELRA-WC0195

Named Organization Corpus

This is a corpus of 13,665 organization names.
Language(s) : Chinese

ELRA-WC0196

GENIA corpus

This is a part-of-speech tagged corpus in the biomedical domain.
Language(s) : English

ELRA-WC0197

Domain-Specific Corpora

The first corpus contains general-information articles about cancer and about specific types of cancer; it is about 430,000 words in size. The second corpus (CHEM) contains about 350,000 words from introductory chemistry articles.
Language(s) :

ELRA-WC0198

International Corpus of English - India (ICE-IND)

It consists of one million words of spoken and written English from India and contains 500 texts of approximately 2,000 words each.
Language(s) : English

ELRA-WC0199

Azra child corpus

Language(s) : Turkish

ELRA-WC0201

Mine child corpus

It contains 1683 sentences.
Language(s) : Turkish

ELRA-WC0202

Deniz child corpus

It contains 7000 sentences.
Language(s) : Turkish

ELRA-WC0203

METU text corpus

This is a 2-million-word corpus of newspaper text.
Language(s) : Turkish

ELRA-WC0204

New York Times Corpus

It is a component of the American National Corpus First Release and consists of over 4,000 articles from the New York Times newswire, covering each of the odd-numbered days in July 2002.
Language(s) : English

ELRA-WC0205

Slate Magazine Corpus

It contains 4,694 articles from the Slate archives published between 1996 and 2000, on topics such as News and Politics, Arts, Business, Sports, Technology, Travel, and Food.
Language(s) : English

ELRA-WC0206

ECI/MCI

(Available since 01/09/1996)

The European Corpus Initiative Multilingual Corpus contains over 98 million words, covering most of the major European languages. The primary focus of this effort is textual material of all kinds, including transcriptions of spoken material.
Language(s) : Albanian - Bulgarian - Chinese - Czech - Danish - Dutch - English - Estonian - French - Gaelic - German - Greek - Italian - Japanese - Latin - Lithuanian - Malay - Norwegian - Portuguese - Russian - Serbian - Spanish - Swedish - Turkish - Uzbek
