Universal Catalogue  
Written Corpora
Displaying 341 to 360 (of 730 products)

ELRA-U-W0333
PW CALO Corpus 


This is a corpus of 222 email messages, generated during a four-day exercise.
Language(s) : English


ELRA-U-W0334
The W3C Corpus 


The W3C corpus contains data collected from a crawl of the World Wide Web Consortium’s sites (w3c.org). This includes mailing lists, public webpages (html), and some text derived from other types of files (pdf, ...)

The W3C data has been annotated for question-answering (QA) topic relevance for use in the TREC Enterprise track in 2005 and 2006.
Language(s) : English


ELRA-U-W0335
The CSIRO Corpus 


This corpus contains 370,715 documents collected from a crawl of the Australian CSIRO organization's websites (*.csiro.au).

The CSIRO Corpus has been annotated for question-answering (QA) topic relevance for use in the TREC Enterprise track in 2007.
Language(s) : English (Australia)


ELRA-U-W0336
Corpus of Contemporary Sinhala 


This is a corpus of 10,000,000 words, which presents the modern usage of Sinhala (or Sinhalese), a language spoken in Sri Lanka.
Language(s) : Sinhalese


ELRA-U-W0337
Galician Technical Corpus (CTG)


This is a monolingual corpus of contemporary specialized Galician. It contains about 12 million words.
Language(s) : Galician


ELRA-U-W0338
Essex Arabic Summaries Corpus (EASC)


This is an Arabic corpus which contains 153 Arabic articles and 765 human-generated extractive summaries of these articles.
Language(s) : Arabic


ELRA-U-W0339
Arabic Propbank (APB)


The Arabic Propbank contains 560 predicates annotated with their relevant arguments in running texts. It is based on 200,000 words from the Arabic Treebank (version 2).
Language(s) : Arabic


ELRA-U-W0340
The Corpus of Academic Lithuanian (CorALit)


This is a specialised synchronic corpus of about 9 million words, including academic texts published between 1999 and 2009 in various areas.
Language(s) : Lithuanian


ELRA-U-W0341
Sejong Korean-Japanese Bilingual Corpus (SKJBC)


This corpus consists of 50 documents in Korean (4,030 sentences) with their translations into Japanese (4,080 sentences). It is aligned at the sentence and paragraph levels and is annotated in XML.
Language(s) : Korean <<< >>> Japanese


ELRA-U-W0342
The comparable corpus of English and Russian news texts 


This is a comparable corpus of English and Russian news texts. The English part contains newswire texts from 1996 to 1997 (83,491,119 words), and the Russian part contains articles from 2000 to 2001 (14,564,884 words) and other texts from various genres (50,512,584 words).
Language(s) : English - Russian


ELRA-U-W0344
GIVE-2 corpus 


This is a corpus of written human instructions collected within a virtual game built on the GIVE-2 software infrastructure. It consists of 45 German and 63 American English written discourses in which one subject guided another through a treasure-hunt-style task in virtual worlds.
Language(s) : English (USA) - German


ELRA-U-W0345
Prague Dependency Treebank (PDT)


The Prague Dependency Treebank is a multi-level corpus of Czech in the form of dependency analytical trees. It consists of 7,110 annotated articles from newspapers and journals, containing 115,844 sentences with 1,957,247 tokens.
Language(s) : Czech


ELRA-U-W0346
New Testament corpus 


This is a morphologically tagged and syntactically parsed corpus of the Ancient Greek text of the Gospels.
Language(s) : Greek


ELRA-U-W0347
The Michigan Corpus of Upper-level Student Papers (MICUSP)


This is a corpus of student academic writing samples. It comprises around 830 A-grade papers (2.6 million words) covering various disciplines.
Language(s) : English


ELRA-U-W0348
MULINCO corpus 


This is a multilingual corpus which contains both parallel and comparable texts, fully annotated.
Language(s) : Danish - English - French - German - Italian - Spanish


ELRA-U-W0349
The Helsinki Corpus of Somali 


The Helsinki Corpus of Somali comprises 6,430 tagged words from running text, in SGML format.
Language(s) : Somali


ELRA-U-W0350
OKMA Uspanteko corpus 


This is a corpus in the Mayan language Uspanteko. It contains 284,000 words of transcribed text, of which 74,000 words are glossed. It also includes translations into Spanish and English.
Language(s) : other - Spanish - English


ELRA-U-W0351
Russian-Finnish parallel corpus of literary texts (ParRus)


This is a corpus of Russian classical and 20th-century literature with translations into Finnish.
Language(s) : Russian <<< >>> Finnish


ELRA-U-W0352
Comparable Russian-Finnish corpus of juridical texts (FinRusLex)


This is a comparable corpus of juridical texts in Russian and Finnish.
Language(s) : Russian <<< >>> Finnish


ELRA-U-W0353
Multilingual corpus of juridical texts (MulJur)


This is a multilingual corpus of juridical texts in English, German, Russian and Swedish.
Language(s) : English - German - Russian - Swedish


Joint Copyright © 2008 ELRA & ELDA
Universal Catalogue 1.0.4