Universal Catalogue

Written Corpora

Displaying 221 to 240 (of 730 products)

ELRA-U-W 0213

PELCRA Reference Corpus of Polish (PELCRA)

PELCRA is a corpus of Polish containing more than 93 million words. It also includes bilingual data: Polish-English and English-Polish parallel and comparable corpora.
Language(s) : Polish

ELRA-U-W 0214

Corpus of Russian Interview Texts

This corpus is a collection of Russian-language interviews taken from Russian newspapers, dating from 1996 onward. The topics covered are politics and society.
Language(s) : Russian

ELRA-U-W 0215

INTERSECT English/French Parallel Corpus (INTERSECT)

INTERSECT is a French/English parallel corpus aligned at the sentence level.
Language(s) : English - French

ELRA-U-W 0216

LINGUA Multilingual Parallel Corpus (LINGUA)

This is a multilingual parallel corpus involving English, French, German, Italian, Greek and Danish.
Language(s) : English - Danish - French - Italian - Greek - German

ELRA-U-W 0217

NAACL 2003 English-Romanian Parallel Corpus

This English-Romanian parallel corpus comprises around 1.6 million tokens across the two languages. It is segmented, morpho-syntactically annotated and lemmatized.
Language(s) : English - Romanian

ELRA-U-W 0218

Plato's Republic French/Romanian Parallel Corpus (Republica)

This is a French-Romanian parallel corpus of Plato's Republic containing around 250,000 tokens. It is morpho-syntactically annotated.
Language(s) : French - Romanian

ELRA-U-W 0219

Romanian Newspaper Corpus (Ziare)

This corpus contains articles from issues of the daily newspaper Evenimentul Zilei published from 1995 to 1996. It contains approximately 92,000 tokens and is morpho-syntactically annotated.
Language(s) : Romanian

ELRA-U-W 0220

ROCO Romanian Newspaper Corpus

This Romanian newspaper corpus (ROCO) contains approximately 7.1 million tokens, which are morpho-syntactically annotated.
Language(s) : Romanian

ELRA-U-W 0221

Romanian Corpus (FrameNet)

This 25,000-word corpus is a Romanian translation of part of the FrameNet corpus 1.1. It is lemmatized and morpho-syntactically annotated.
Language(s) : Romanian

ELRA-U-W 0222

English/Romanian Parallel Corpus (Timex)

This is an English-Romanian parallel corpus. The English part was taken from the Timex corpus and then translated into Romanian.
The Romanian part contains 72,000 tokens. The corpus is lemmatized and morpho-syntactically annotated.
Language(s) : English - Romanian

ELRA-U-W 0223

RoSemCor

This is an English-Romanian-Italian parallel corpus, annotated at word sense level using WSDTool.
Language(s) : English - Italian - Romanian

ELRA-U-W 0224

KorpusDK

This corpus of Danish is a collection of electronic texts derived from a range of different sources, totalling 56 million words. It has been automatically tagged at the word level (part of speech and inflected form).
Language(s) : Danish

ELRA-U-W 0225

English-Mandarin Chinese Corpus (weather domain)

This is a bilingual English/Mandarin Chinese corpus in the weather domain. It contains more than 45,000 transcribed English utterances collected through the Jupiter weather information system. The average utterance length is 6.0 words. The utterances are aligned with their Mandarin translations.
Language(s) : English (USA) - Mandarin Chinese

ELRA-U-W 0226

Hungarian Historical Corpus

The Hungarian Historical Corpus contains 27 million running words.
Language(s) : Hungarian

ELRA-U-W 0227

British Academic Written English Corpus (BAWE)

A pilot corpus was first compiled with 500 student assignments ranging from 1,000 to 5,000 words, totalling approximately 1.5 million words.
The full corpus contains 3,000 assignments and comprises almost 9 million words.
Language(s) : English (United Kingdom)

ELRA-U-W 0228

Chinese News Corpus

This news corpus (Corpus Program) contains 14 million words collected since 1990 from Chinese newspapers and magazines.
Language(s) : Chinese

ELRA-U-W 0229

Rocling Standard Segmentation Corpus

This is a segmented 2-million-word corpus of Chinese.
Language(s) : Chinese

ELRA-U-W 0230

Chinese and Formosan Language Archives

The Formosan (Taiwan Austronesian) archive is intended as a multimodal archive for endangered Formosan languages.

The Chinese archive is divided into 5 sub-groups:
- "Early Mandarin Chinese Lexicon",
- "Lexicon of Pre-Qin Bronze Inscriptions and Bamboo Scripts (LBB)",
- "Modern Chinese Corpus and Treebank" (Sinica corpus and treebank),
- "New Age Corpus: Linguistic Representations and Archives of Multimedia Data",
- "Southern-Min Archive: A Database of Historical Change in Language Distribution".
Language(s) : Chinese

ELRA-U-W 0231

Uppsala Student English Corpus (USE)

USE is a corpus of 1,489 essays written by 440 Swedish students of English at three different levels of study, collected between 1999 and 2001. The total number of words is 1,221,265. The essays cover various topics.
Language(s) : English (Sweden)

ELRA-U-W 0232

Swedish Newspaper Corpus (SCARRIE)

This corpus contains Swedish texts from the local newspaper "Upsala Nya Tidning" and the national newspaper "Svenska Dagbladet", published in 1995 and 1996. The corpus currently comprises 220,000 articles and 70 million words.
Language(s) : Swedish
