Universal Catalogue  
Written Corpora
Displaying 221 to 240 (of 730 products)

ELRA-U-W 0213
PELCRA Reference Corpus of Polish (PELCRA)


PELCRA is a corpus of Polish containing more than 93 million words. It also includes bilingual data: Polish-English and English-Polish parallel and comparable corpora.
Language(s) : Polish



ELRA-U-W 0214
Corpus of Russian Interview Texts 


This corpus is a collection of Russian-language interviews taken from Russian newspapers, covering 1996 onward. The topics covered are politics and society.
Language(s) : Russian



ELRA-U-W 0215
INTERSECT English/French Parallel Corpus (INTERSECT)


INTERSECT is a French/English parallel corpus aligned at the sentence level.
Language(s) : English - French



ELRA-U-W 0216
LINGUA Multilingual Parallel Corpus (LINGUA)


This is a multilingual parallel corpus involving English, French, German, Italian, Greek and Danish.
Language(s) : English - Danish - French - Italian - Greek - German



ELRA-U-W 0217
NAACL 2003 English-Romanian Parallel Corpus 


This English-Romanian parallel corpus comprises around 1.6 million tokens in the two languages. It is segmented, morpho-syntactically annotated and lemmatized.
Language(s) : English - Romanian



ELRA-U-W 0218
Plato's Republic French/Romanian Parallel Corpus (Republica)


This is a French-Romanian parallel corpus of Plato's Republic containing around 250,000 tokens. It is morpho-syntactically annotated.
Language(s) : French - Romanian



ELRA-U-W 0219
Romanian Newspaper Corpus (Ziare) 


This corpus contains various articles from different issues of the daily Evenimentul Zilei (from 1995 to 1996). It contains approximately 92,000 tokens and is morpho-syntactically annotated.
Language(s) : Romanian



ELRA-U-W 0220
ROCO Romanian Newspaper Corpus 


This Romanian newspaper corpus (ROCO) contains approximately 7.1 million tokens, which are morpho-syntactically annotated.
Language(s) : Romanian



ELRA-U-W 0221
Romanian Corpus (FrameNet) 


This 25,000-word corpus is a Romanian translation of part of the FrameNet corpus 1.1. It is lemmatized and morpho-syntactically annotated.
Language(s) : Romanian



ELRA-U-W 0222
English/Romanian Parallel Corpus (Timex) 


This is an English-Romanian parallel corpus. The English part was taken from the Timex corpus and then translated into Romanian.
It contains 72,000 tokens in Romanian. It is lemmatized and morpho-syntactically annotated.
Language(s) : English - Romanian



ELRA-U-W 0223
RoSemCor 


This is an English-Romanian-Italian parallel corpus, annotated at word sense level using WSDTool.
Language(s) : English - Italian - Romanian



ELRA-U-W 0224
KorpusDK 


This corpus of Danish is a collection of electronic texts derived from a range of different sources, for a total of 56 million words. It has been automatically tagged at word level (information about part of speech and inflected form).
Language(s) : Danish



ELRA-U-W 0225
English-Mandarin Chinese Corpus (weather domain) 


This is a bilingual English/Mandarin Chinese corpus in the weather domain. It contains more than 45,000 transcribed English utterances collected through the Jupiter weather information system. The average utterance length is 6.0 words. The utterances are aligned with their Mandarin translations.
Language(s) : English (USA) - Mandarin Chinese



ELRA-U-W 0226
Hungarian Historical Corpus 


The Hungarian Historical Corpus contains 27 million running words.
Language(s) : Hungarian



ELRA-U-W 0227
British Academic Written English Corpus (BAWE)


A pilot corpus was first compiled from 500 student assignments ranging from 1,000 to 5,000 words, totalling approximately 1.5 million words.
The full corpus contains 3,000 assignments and comprises almost 9 million words.
Language(s) : English (United Kingdom)



ELRA-U-W 0228
Chinese News Corpus 


This news corpus (Corpus Program) contains 14 million words collected since 1990 from Chinese newspapers and magazines.
Language(s) : Chinese



ELRA-U-W 0229
Rocling Standard Segmentation Corpus 


This is a segmented corpus of Chinese containing 2 million words.
Language(s) : Chinese



ELRA-U-W 0230
Chinese and Formosan Language Archives 


The Formosan (Taiwan Austronesian) archive is intended as a multimodal archive for endangered Formosan languages.

The Chinese archive is divided into 5 sub-groups:
- "Early Mandarin Chinese Lexicon",
- "Lexicon of Pre-Qin Bronze Inscriptions and Bamboo Scripts (LBB)",
- "Modern Chinese Corpus and Treebank" (Sinica corpus and treebank),
- "New Age Corpus: Linguistic Representations and Archives of Multimedia Data",
- "Southern-Min Archive: A Database of Historical Change in Language Distribution".
Language(s) : Chinese



ELRA-U-W 0231
Uppsala Student English Corpus (USE)


USE is a corpus of 1,489 essays written by 440 Swedish students of English at three different levels of study, collected between 1999 and 2001. The total number of words is 1,221,265. The essays cover various topics.
Language(s) : English (Sweden)



ELRA-U-W 0232
Swedish Newspaper Corpus (SCARRIE)


This corpus contains Swedish texts from the local newspaper "Upsala Nya Tidning" and the national newspaper "Svenska Dagbladet", published in 1995 and 1996. The current size of the corpus is 220,000 individual articles and 70 million words.
Language(s) : Swedish




Joint Copyright © 2008 ELRA & ELDA
Universal Catalogue 1.0.4