Language Resources
Displaying 301 to 320 (of 730 products)
Result Pages: 16
The SYN2000 corpus is a synchronic representative corpus of contemporary written Czech (until 2000). It contains 100 million words (tokens), lemmatised and part-of-speech tagged.
Language(s) : Czech
|
|
|
|
This is a synchronic written corpus of 300 million words. It contains exclusively journalistic texts in Czech from 1989 to 2004.
Language(s) : Czech
|
|
|
|
The Nepali Written Corpus is a part of the Nepali National Corpus (NNC).
This is a monolingual corpus of 15 million words containing texts from various books, magazines, newspapers and from Internet websites. It is segmented and POS tagged.
It is available in the ELRA catalogue http://catalog.elra.info under the reference ELRA-W0076.
Language(s) : Nepali
|
|
|
|
The Nepali-English Parallel Corpus is a part of the Nepali National Corpus (NNC).
This is a parallel corpus of about 4 million words from two different genres: computing and national development.
A part of it is available in the ELRA catalogue http://catalog.elra.info under the reference ELRA-W0077.
Language(s) : Nepali <<< >>> English
|
|
|
|
PukWaC is the same as ukWaC (a 2-billion-word English corpus constructed from the Web), but its annotation also includes a full dependency parse.
Language(s) : English (United Kingdom)
|
|
|
|
This is a copy of the full content of the English Wikipedia as of 2009. It contains about 800 million tokens and is POS-tagged, lemmatized, and fully parsed with a dependency parser.
Language(s) : English (United Kingdom)
|
|
|
|
This corpus contains more than 3600 written speeches from native English speakers. It comprises about 7.9 million words and more than 67 thousand tags marking audience reactions.
Language(s) : English (USA)
|
|
|
|
This Dutch-language corpus contains about 200,000 words from essays written by 145 students (from Belgium). It is syntactically annotated and provides metadata about the students' personality.
Language(s) : Dutch, Flemish (Belgium)
|
|
|
|
This is a parallel corpus containing 100,720 words (4325 sentences) in common English, translated into Urdu and Nepali.
Language(s) : English >>>> Urdu - English >>>> Nepali
|
|
|
|
This is a corpus of 675 sentences (11839 words) in which 833 dyslexic real-word errors are tagged.
Language(s) : English
|
|
|
|
This is a corpus of misspellings collected from native and non-native English speakers.
Language(s) : English
|
|
|
|
This is a corpus of spoken British English covering the period between 1960 and 2000. It contains 885,436 words, fully parsed and annotated (87,000 trees).
Language(s) : English (United Kingdom)
|
|
|
|
This is a small word-aligned parallel corpus of Luganda and English. Luganda is a major language of Uganda and is spoken by 6 million people as a first language.
Language(s) : English <<< >>> other
|
|
|
|
This is a parallel corpus of English and Swahili which contains about a million words for each language.
Language(s) : English <<< >>> Swahili
|
|
|
|
The British Columbia Conversation Corpus (BC3) contains 40 email threads (3222 sentences) annotated with linguistic features for email summarization.
Language(s) : English
|
|
|
|
The Mannheim German Reference Corpus is a collection of German corpora covering the period from 1956 to 2001. It contains more than 3.9 billion tokens and is part-of-speech tagged.
Language(s) : German
|
|
|
|
The Quranic Arabic Corpus is a version of the Quran annotated for part-of-speech and associated with a syntactic treebank.
Language(s) : Arabic
|
|
|
|
The Turku Dependency Treebank (TDT) is a dependency-annotated treebank of Finnish. It contains articles of fewer than 75 sentences from the Finnish Wikipedia.
Language(s) : Finnish
|
|
|
|
This is a collection of written, spoken and multimodal corpora, which together represent about 300 million tokens.
Language(s) : Russian - Russian >>>> English - Russian >>>> German
|
|
|
|
The Corpus of Greek Texts (CGT) includes spoken and written texts produced between 1990 and 2010. It contains 30 million words.
Language(s) : Greek
|
|
|
|