Universal Catalogue  
Written Corpora
Displaying 461 to 480 (of 730 products)

ELRA-WC0162
NEGRA treebank of German 


The NEGRA corpus version 2 contains 355,096 tokens (20,602 sentences) from a German newspaper (the Frankfurter Rundschau).
Language(s) : German (Germany)



ELRA-WC0163
Reuters Corpus 


The basis of this corpus is the Reuters Financial News Service, comprising 9,063 XML-tagged texts (3.63 million tokens) published during Jan-Dec 2002.
Language(s) : English



ELRA-WC0165
Stockholm-Umeå Corpus (SUC)


The Stockholm-Umeå corpus (SUC) is a Swedish corpus of 1 million words, annotated with part-of-speech, inflectional form and lemma.
Language(s) : Swedish



ELRA-WC0166
The Spartacus Database 


It consists of offline handwritten Spanish sentences from four different subtasks. A total of around 100,000 word instances out of a vocabulary of around 3,300 words occur in the collection.
Language(s) : Spanish



ELRA-WC0167
The Basque corpus 


It consists of 2,706,809 words and includes all the articles published by the Elhuyar Foundation on the zientzia.net site until 2003.
Language(s) : Basque



ELRA-WC0168
Arabic full-form lexicon 


This corpus has been converted into a two-level Finite State Transducer (FST) for morphology analysis and generation.
Language(s) : Arabic



ELRA-WC0169
Hand-annotated part of BNC 


It contains about 5 million words.
Language(s) : English



ELRA-WC0170
Bulgarian and Croatian Comparable Corpus 


This is a Bulgarian-Croatian comparable corpus. It is based on two newspaper subcorpora drawn from larger reference corpora of Bulgarian and Croatian (the Bulgarian Corpus and the Croatian National Corpus, HNK).
Language(s) : Bulgarian - Croatian



ELRA-WC0171
Geographical Gazetteer Lists 


Gazetteer lists containing geographical references.
Language(s) : English - Chinese - Arabic - Hindi



ELRA-WC0172
Touring information leaflet corpus 


The corpus is a set of 1,100 touring information leaflets, with about 333,000 words and a vocabulary size of 6,300 words.
Language(s) : French



ELRA-WC0173
German Multi-word Expression DB 


This one-billion-word corpus is a database of multi-word expressions in which each entry is associated with a sequence of POS-tag/token pairs.
Language(s) : German



ELRA-WC0174
Dependency Annotated Japanese Corpus 


This is a phrase-based dependency-annotated corpus of about 38,000 sentences from Mainichi Newspaper articles published in 1995.
Language(s) : Japanese



ELRA-WC0175
Chinese Corpus of People’s Daily Newspaper 


This corpus consists of about 20k sentences, annotated with word segmentation, part-of-speech tags and three named-entity tags.
Language(s) : Chinese



ELRA-WC0176
Ungrammatical Sentence Corpus & Grammatical Sentence Corpus 


This is a parallel corpus consisting of an ungrammatical English sentence corpus and its grammatically corrected counterpart. Each contains 20,000 words from a variety of sources: newspapers, emails, academic papers, websites, etc.
Language(s) : English



ELRA-WC0177
The OPUS Corpus 


This is a growing collection of translated documents collected from the internet (tens of millions of words, in 60 languages).
Language(s) : Danish - German - Greek - English - Spanish - Finnish - French - Italian - Dutch - Portuguese - Swedish - Czech - Estonian - Hungarian - Lithuanian - Latvian - Polish - Slovak - Maltese - Slovenian - Afrikaans - Arabic - Azerbaijani - Belarusian - Bulgarian - Breton - Catalan - Welsh - Esperanto - Basque - Hebrew - Croatian - Indonesian - Icelandic - Japanese - Korean - Kurdish - Maori - Macedonian - Occitan - Portuguese (Brazil) - English (United Kingdom) - Romanian - Russian - Tamil - Thai - Turkish - Venda - Vietnamese - Xhosa - Chinese - Chinese (Taiwan) - Zulu - Serbian - Ukrainian - Twi - Irish



ELRA-WC0178
SZAK Corpus 


This is an English-Hungarian parallel corpus of technical texts containing 1.2 million words per language.
Language(s) : Hungarian - English



ELRA-WC0179
Computer-domain Corpus 


This is an aligned computer-domain corpus containing 74K sentences in five languages.
Language(s) : Spanish - Japanese - French - German - English



ELRA-WC0180
Galician corpus 


This corpus of contemporary written Galician is morphosyntactically tagged and contains syntactic and prosodic data: 400,000 words drawn from journalistic texts.
Language(s) : Galician



ELRA-WC0181
German Medical Corpus 


This corpus of medical documents in German contains more than 1 million running word forms.
Language(s) : German



ELRA-WC0182
English-German Europarl corpus 


This data contains some 20 million words in 63,973 aligned documents in each language.
Language(s) : English - German




Joint Copyright © 2008 ELRA & ELDA
Universal Catalogue 1.0.4