Language Resources
Displaying 381 to 400 (of 730 products)
CNEC 1.0 is a corpus in Mandarin Chinese, annotated with named entity tags for person name, location name, organization name, location as organization and organization as location.
Language(s) : Chinese
This is a Dutch corpus annotated for coreference including annotation for:
- identity relations between noun phrases,
- bound relations where an anaphor refers to a quantified antecedent,
- bridge relations when the anaphor is a subset of the antecedent,
- predicative relations, indicating extra information about the referent.
Language(s) : Dutch
This is a corpus of 257,000 words annotated for co-reference. It contains 18,000 distinct document-level entities and approximately 55,000 entity mentions.
Language(s) : English (USA)
This corpus contains 600 pairs of textual units manually annotated for textual entailment.
Language(s) : Greek
This corpus was compiled from a sample of 11 million words spanning the 12th century to the present day. It has been manually tagged for grammatical category, number of graphemes, number of syllables and phonological structure.
Language(s) : Serbian
This is a trilingual corpus of 3 million words in Bulgarian, Polish and Lithuanian. The BG-PL-LT corpus includes a parallel and a comparable corpus.
Language(s) : Bulgarian - Polish - Lithuanian
The Bulgarian National Corpus consists of about 320,000,000 words from more than 10,000 texts. It reflects the state of the Bulgarian language (mainly in its written form) from the middle of the 20th century (1945) to the present.
Language(s) : Bulgarian
This is a parallel corpus which consists of web documents from the general domain. It contains 4,190,000 tokens in Greek, 3,430,000 tokens in Bulgarian and 3,900,000 tokens in English.
Language(s) : Greek - Bulgarian - English
This corpus contains 1,010,996 words of written British English published between 2003 and 2008. A large part of the texts (82%) were published between 2005 and 2007.
Language(s) : English (United Kingdom)
This corpus contains texts produced by learners of German (from 49 different native languages). The best-represented learners come from Denmark, England, France, Poland and Russia.
Language(s) : German
This is a multilingual corpus of 44 million words. It consists of fiction texts semi-automatically aligned between the Czech version and one of the following languages: English, French, German, Russian and Spanish.
Language(s) : Czech → English - Czech → French - Czech → German - Czech → Russian - Czech → Spanish
This is a Thai corpus of 500 articles of about 120 words each. It was segmented into 8,000 Elementary Discourse Units (EDUs) and annotated for text summarization and question answering.
Language(s) : Thai
This corpus contains 1,478 text samples extracted from 127 textbooks (about 1,000,000 characters). It consists of digitized parts of textbooks from elementary schools, junior high schools and high schools (grades 1 to 13) in Japan.
Language(s) : Japanese
This corpus contains 677 abstracts annotated for gene regulation events, expressed by both verbs and nominalised verbs. It consists of MEDLINE abstracts on the subject of E. coli.
Language(s) : English
This corpus contains 240 MEDLINE abstracts annotated for events relating to gene regulation and expression.
Language(s) : English
This is a corpus of texts written by Basque language learners and native speakers. It is annotated for errors and part of speech (POS).
Language(s) : Basque
This corpus contains a set of English language quotations manually annotated for sentiment (positive, negative, objective/neutral) expressed towards the entities mentioned inside the quotation.
Language(s) : English
This is a Japanese blog corpus. It contains 249 articles and 4,186 sentences. It is fully annotated (part-of-speech, syntax, case, ellipsis, opinion information and review information).
Language(s) : Japanese
This is a Maltese corpus which contains legal, news, literature and web texts. It is annotated in XML format.
Language(s) : Maltese
This corpus contains researchers' homepages manually annotated for social information. It is in HTML format.
Language(s) : Hungarian