Universal Catalogue  
Written Corpora
Displaying 301 to 320 (of 730 products)

ELRA-U-W0293
SYN2000 corpus 


The SYN2000 corpus is a synchronic representative corpus of contemporary written Czech (until 2000). It contains 100 million words (tokens), lemmatised and Part-Of-Speech tagged.
Language(s) : Czech



ELRA-U-W0294
SYN2006PUB 


This is a synchronic written corpus of 300 million words. It contains exclusively journalistic texts in Czech from 1989 to 2004.
Language(s) : Czech



ELRA-U-W0295
Nepali Written Corpus 


The Nepali Written Corpus is a part of the Nepali National Corpus (NNC).

This is a monolingual corpus of 15 million words containing texts from various books, magazines, newspapers and from Internet websites. It is segmented and POS tagged.

It is available in the ELRA catalogue http://catalog.elra.info under the reference ELRA-W0076.
Language(s) : Nepali



ELRA-U-W0296
Nepali-English Parallel Corpus 


The Nepali-English Parallel Corpus is a part of the Nepali National Corpus (NNC).

This is a parallel corpus of about 4 million words from two different genres : computing and national development.

A part of it is available in the ELRA catalogue http://catalog.elra.info under the reference ELRA-W0077.
Language(s) : Nepali <<< >>> English



ELRA-U-W0297
PukWaC English Web Corpus 


PukWaC is the same as ukWaC (a 2-billion-word English corpus constructed from the Web), but its annotation also includes a full dependency parse.
Language(s) : English (United Kingdom)



ELRA-U-W0298
WaCkypedia English corpus 


This is a copy of the full content of the English Wikipedia as of 2009. It represents about 800 million tokens and is POS-tagged, lemmatized, and fully parsed with a dependency parser.
Language(s) : English (United Kingdom)



ELRA-U-W0299
CORpus of tagged Political Speeches (CORPS)


This corpus contains more than 3600 written speeches from native English speakers. It represents about 7.9 million words and more than 67 thousand tags marking audience reactions.
Language(s) : English (USA)



ELRA-U-W0300
Personae Corpus 


This Dutch-language corpus contains about 200,000 words from essays written by 145 students from Belgium. It is syntactically annotated and provides metadata about the students' personalities.
Language(s) : Dutch, Flemish (Belgium)



ELRA-U-W0301
Urdu-Nepali-English Parallel Corpus 


This is a parallel corpus containing 100,720 words (4325 sentences) in common English, translated into Urdu and Nepali.
Language(s) : English >>>> Urdu - English >>>> Nepali



ELRA-U-W0302
Real-word Error Corpus 


This is a corpus of 675 sentences (11839 words) containing 833 tagged dyslexic real-word errors.
Language(s) : English



ELRA-U-W0303
Birkbeck spelling error corpus 


This is a corpus of misspellings collected from native and non-native English speakers.
Language(s) : English



ELRA-U-W0304
Diachronic Corpus of Present-Day Spoken English (DCPSE)


This is a corpus of spoken British English covering the period between 1960 and 2000. It contains 885,436 words, fully-parsed and annotated (87,000 trees).
Language(s) : English (United Kingdom)



ELRA-U-W0305
English - Luganda Parallel Corpus 


This is a small word-aligned parallel corpus of Luganda and English. Luganda is a major language of Uganda and is spoken by 6 million people as a first language.
Language(s) : English <<< >>> other



ELRA-U-W0306
SAWA Corpus 


This is a parallel corpus of English and Swahili which contains about a million words for each language.
Language(s) : English <<< >>> Swahili



ELRA-U-W0307
British Columbia Conversation Corpus (BC3)


The British Columbia Conversation Corpus (BC3) contains 40 email threads (3222 sentences) annotated with linguistic features for email summarization.
Language(s) : English



ELRA-U-W0308
Mannheim German Reference Corpus (DeReKo)


The Mannheim German Reference Corpus is a collection of German corpora covering the period of 1956 to 2001. It contains more than 3.9 billion tokens and is Part-Of-Speech tagged.
Language(s) : German



ELRA-U-W0309
Quranic Arabic Corpus 


The Quranic Arabic Corpus is a version of the Quran annotated for part-of-speech and associated with a syntactic treebank.
Language(s) : Arabic



ELRA-U-W0310
Turku Dependency Treebank (TDT)


The Turku Dependency Treebank (TDT) is a dependency-annotated treebank of Finnish. It contains articles of fewer than 75 sentences from the Finnish Wikipedia.
Language(s) : Finnish



ELRA-U-W0311
Russian National Corpus (RNC)


This is a collection of written, spoken and multimodal corpora, which represents about 300 million tokens.
Language(s) : Russian - Russian >>>> English - Russian >>>> German



ELRA-U-W0312
Corpus of Greek Texts (CGT)


The Corpus of Greek Texts (CGT) includes spoken and written texts produced between 1990 and 2010. It contains 30 million words.
Language(s) : Greek




Joint Copyright © 2008 ELRA & ELDA