|
Language Resources |
Displaying 341 to 360 (of 730 products) |
Result Pages: 18 |
This is a corpus of 222 email messages, generated during a four-day exercise.
Language(s) : English
|
|
|
|
The W3C corpus contains data collected from a crawl of the World Wide Web Consortium’s sites (w3c.org). It includes mailing lists, public webpages (HTML), and some text derived from other file types (PDF, ...).
The W3C data has been annotated for question answering (QA) topic relevance for use in the TREC Enterprise track in 2005 and 2006.
Language(s) : English
|
|
|
|
This corpus contains 370,715 documents collected from a crawl of the Australian CSIRO organization's websites (*.csiro.au).
The CSIRO Corpus has been annotated for question answering (QA) topic relevance for use in the TREC Enterprise track in 2007.
Language(s) : English (Australia)
|
|
|
|
This is a corpus of 10,000,000 words representing the modern usage of Sinhala (or Sinhalese), a language spoken in Sri Lanka.
Language(s) : Sinhalese
|
|
|
|
This is a monolingual corpus of contemporary specialized Galician. It contains about 12 million words.
Language(s) : Galician
|
|
|
|
This is a corpus of 153 Arabic articles and 765 human-generated extractive summaries of these articles.
Language(s) : Arabic
|
|
|
|
The Arabic Propbank contains 560 predicates annotated with their relevant arguments in running texts. It is based on 200,000 words from the Arabic Treebank (version 2).
Language(s) : Arabic
|
|
|
|
This is a specialised synchronic corpus of about 9 million words, including academic texts published between 1999 and 2009 in various areas.
Language(s) : Lithuanian
|
|
|
|
This corpus consists of 50 documents in Korean (4,030 sentences) with their translations into Japanese (4,080 sentences). It is aligned at the sentence and paragraph levels and is annotated in XML format.
Language(s) : Korean <<< >>> Japanese
|
|
|
|
This is a comparable corpus of English and Russian news texts. The English part contains newswire texts from 1996 to 1997 (83,491,119 words), and the Russian part contains articles from 2000 to 2001 (14,564,884 words) and other texts from various genres (50,512,584 words).
Language(s) : English - Russian
|
|
|
|
This is a corpus of written human instructions collected in a virtual game built on the GIVE-2 software infrastructure. It consists of 45 German and 63 American English written discourses in which one subject guided another through a treasure-hunt-style task in virtual worlds.
Language(s) : English (USA) - German
|
|
|
|
The Prague Dependency Treebank is a multi-level corpus of Czech in the form of dependency analytical trees. It consists of 7,110 annotated articles from newspapers and journals, containing 115,844 sentences with 1,957,247 tokens.
Language(s) : Czech
|
|
|
|
This is a morphologically tagged and syntactically parsed corpus of the Ancient Greek text of the Gospels.
Language(s) : Greek
|
|
|
|
This is a corpus of student academic writing samples. It comprises around 830 A-grade papers (2.6 million words) covering various disciplines.
Language(s) : English
|
|
|
|
This is a multilingual corpus which contains both parallel and comparable texts, fully annotated.
Language(s) : Danish - English - French - German - Italian - Spanish
|
|
|
|
The Helsinki Corpus of Somali comprises 6,430 tagged words of running text in SGML format.
Language(s) : Somali
|
|
|
|
This is a corpus in the Mayan language Uspanteko. It contains 284,000 words of transcribed text, of which 74,000 words are glossed. It also includes translations into Spanish and English.
Language(s) : other - Spanish - English
|
|
|
|
This is a corpus of Russian classical and 20th-century literature with translations into Finnish.
Language(s) : Russian <<< >>> Finnish
|
|
|
|
This is a comparable corpus of juridical texts in Russian and Finnish.
Language(s) : Russian <<< >>> Finnish
|
|
|
|
This is a multilingual corpus of juridical texts in English, German, Russian and Swedish.
Language(s) : English - German - Russian - Swedish
|
|
|
|