Language Resources
Displaying 461 to 480 (of 730 products)
Result Pages: 24
The NEGRA corpus version 2 contains 355,096 tokens (20,602 sentences) from a German newspaper (the Frankfurter Rundschau).
Language(s) : German (Germany)
The basis of this corpus is the Reuters Financial News Service, comprising 9,063 XML-tagged texts (3.63 million tokens) published during Jan-Dec 2002.
Language(s) : English
The Stockholm-Umeå corpus (SUC) is a Swedish corpus of 1 million words, annotated with part-of-speech, inflectional form and lemma.
Language(s) : Swedish
It consists of offline handwritten Spanish sentences from four different subtasks. A total of around 100,000 word instances out of a vocabulary of around 3,300 words occur in the collection.
Language(s) : Spanish
It consists of 2,706,809 words and includes all the articles published by Elhuyar Foundation in the zientzia.net site until 2003.
Language(s) : Basque
This corpus has been converted into a two-level Finite State Transducer (FST) for morphology analysis and generation.
Language(s) : Arabic
It contains about 5 million words.
Language(s) : English
This is a Bulgarian-Croatian comparable corpus. It is based on two newspaper subcorpora drawn from larger reference corpora of Bulgarian and Croatian (The Bulgarian Corpus and The Croatian National Corpus - HNK).
Language(s) : Bulgarian - Croatian
Gazetteer lists containing geographical references.
Language(s) : English - Chinese - Arabic - Hindi
The corpus is a set of 1,100 touring information leaflets, with about 333,000 words and a vocabulary size of 6,300 words.
Language(s) : French
This one-billion-word corpus is a database of multi-word expressions in which each entry is associated with a sequence of POS-tag/token pairs.
Language(s) : German
This is a phrase-based, dependency-annotated corpus of about 38,000 sentences from Mainichi Newspaper articles published in 1995.
Language(s) : Japanese
This corpus consists of about 20k sentences, annotated with word segmentation, part-of-speech tags and three named-entity tags.
Language(s) : Chinese
This is a parallel corpus of ungrammatical English sentences and their grammatically corrected counterparts, with 20,000 words each from a variety of sources: newspapers, emails, academic papers, websites, etc.
Language(s) : English
This is a growing collection of translated documents collected from the internet (tens of millions of words, in 60 languages).
Language(s) : Danish - German - Greek - English - Spanish - Finnish - French - Italian - Dutch - Portuguese - Swedish - Czech - Estonian - Hungarian - Lithuanian - Latvian - Polish - Slovak - Maltese - Slovenian - Afrikaans - Arabic - Azerbaijani - Belarusian - Bulgarian - Breton - Catalan - Welsh - Esperanto - Basque - Hebrew - Croatian - Indonesian - Icelandic - Japanese - Korean - Kurdish - Maori - Macedonian - Occitan - Portuguese (Brazil) - English (United Kingdom) - Romanian - Russian - Tamil - Thai - Turkish - Venda - Vietnamese - Xhosa - Chinese - Chinese (Taiwan) - Zulu - Serbian - Ukrainian - Twi - Irish
This is an English-Hungarian parallel corpus of technical texts containing 1.2 million words per language.
Language(s) : Hungarian - English
This is an aligned computer-domain corpus containing 74K sentences in five languages.
Language(s) : Spanish - Japanese - French - German - English
This corpus of contemporary written Galician is morphosyntactically tagged and contains syntactic and prosodic data: 400,000 words drawn from journalistic texts.
Language(s) : Galician
This corpus of medical documents in German contains more than 1 million running word forms.
Language(s) : German
This data contains some 20 million words in 63,973 aligned documents in each language.
Language(s) : English - German