Displaying 221 to 240 (of 730 products)
PELCRA is a corpus of Polish containing more than 93,000,000 words. It also includes bilingual data: Polish-English and English-Polish parallel and comparable corpora.
Language(s) : Polish
This corpus is a collection of interviews in Russian, taken from Russian newspapers. It includes interviews from 1996 onwards, covering the topics 'politics and society'.
Language(s) : Russian
INTERSECT is a French/English corpus aligned at the sentence level.
Language(s) : English - French
This is a multilingual parallel corpus involving English, French, German, Italian, Greek and Danish.
Language(s) : English - Danish - French - Italian - Greek - German
This English-Romanian parallel corpus comprises around 1.6 million tokens in the two languages. It is segmented, morpho-syntactically annotated and lemmatized.
Language(s) : English - Romanian
This is a French-Romanian parallel corpus containing around 250 thousand tokens (Plato's Republic).
It is morpho-syntactically annotated.
Language(s) : French - Romanian
This corpus contains various articles from different issues of the daily Evenimentul Zilei (from 1995 to 1996). It contains approximately 92,000 tokens and is morpho-syntactically annotated.
Language(s) : Romanian
This Romanian newspaper corpus (ROCO) contains approximately 7.1 million tokens that are morpho-syntactically annotated.
Language(s) : Romanian
This 25,000-word corpus is a Romanian translation of part of the FrameNet 1.1 corpus. It is lemmatized and morpho-syntactically annotated.
Language(s) : Romanian
This is an English-Romanian parallel corpus. The English part was taken from the Timex corpus and then translated into Romanian.
It contains 72,000 tokens in Romanian. It is lemmatized and morpho-syntactically annotated.
Language(s) : English - Romanian
This is an English-Romanian-Italian parallel corpus, annotated at word sense level using WSDTool.
Language(s) : English - Italian - Romanian
This corpus of Danish is a collection of electronic texts derived from a range of different sources, for a total of 56 million words. It has been automatically tagged at word level (information about part of speech and inflected form).
Language(s) : Danish
This is a bilingual English/Mandarin Chinese corpus in the weather domain. It contains more than 45,000 transcribed English utterances collected through the Jupiter weather information system. The average utterance length is 6.0 words. The utterances are aligned with their Mandarin translations.
Language(s) : English (USA) - Mandarin Chinese
The Hungarian Historical Corpus contains 27 million running words.
Language(s) : Hungarian
A pilot corpus was first compiled with 500 student assignments ranging from 1,000 to 5,000 words, totalling approximately 1.5 million words.
The full corpus contains 3,000 assignments and comprises almost 9 million words.
Language(s) : English (United Kingdom)
This news corpus (Corpus Program) contains 14 million words collected since 1990 from Chinese newspapers and magazines.
Language(s) : Chinese
This is a segmented 2-million-word corpus of Chinese.
Language(s) : Chinese
The Formosan (Taiwan Austronesian) archive is intended as a multimodal archive for endangered Formosan languages.
The Chinese archive is divided into five sub-groups:
- "Early Mandarin Chinese Lexicon",
- "Lexicon of Pre-Qin Bronze Inscriptions and Bamboo Scripts (LBB)",
- "Modern Chinese Corpus and Treebank" (Sinica corpus and treebank),
- "New Age Corpus: Linguistic Representations and Archives of Multimedia Data",
- "Southern-Min Archive: A Database of Historical Change in Language Distribution".
Language(s) : Chinese
USE is a corpus of 1,489 essays written by 440 Swedish students of English at three different levels of study, collected in the years 1999-2001. The total number of words is 1,221,265. The essays cover various topics.
Language(s) : English (Sweden)
This corpus contains Swedish texts from the local newspaper "Upsala Nya Tidning" and the national newspaper "Svenska Dagbladet" from the years 1995 and 1996. The current size of the corpus is 220,000 individual articles and 70 million words.
Language(s) : Swedish
|