Language Resources
Displaying 321 to 340 (of 730 products)
The Dependency Treebank of Hebrew consists of 6,220 fully dependency-parsed sentences.
Language(s) : Hebrew
This is a corpus-based parallel treebank of 200 sentences in both Quechua and Spanish.
Language(s) : Spanish (Peru) - Quechua (Peru)
The data in this treebank is representative of modern written standard Romanian. The resource is morpho-syntactically tagged.
It consists of 36,150 tokens.
Language(s) : Romanian (Romania)
This is an English corpus of 1,068 web pages related to the "Berlin central station". It contains 55,255 sentences annotated for Named Entities (NE).
Language(s) : English
This is a corpus of newswire texts annotated for noun phrase (NP) coreference on 55,000 words and for event coreference on 12,500 words.
Language(s) : English
It contains various corpora in Modern Persian (Farsi), annotated for Part-of-Speech and/or pronunciation.
Language(s) : Persian
It contains paragraph-aligned parallel corpora in the six official languages of the United Nations. It represents a total of around 3 million tokens per language.
Language(s) : English - French - Arabic - Chinese - Russian - Spanish
The PolyU Business Corpus contains 3 comparable corpora of business texts in English, Chinese and Japanese.
It consists of news and reports from the business and finance sections of newspapers, annual reports and press releases from companies, online versions of company brochures and leaflets, ...
Language(s) : Japanese (Japan) - Chinese (Hong Kong) - English (Hong Kong)
This is a collection of review documents labeled with respect to their overall sentiment polarity (positive or negative). It contains 400 reviews in English and 400 in Spanish.
Language(s) : English - Spanish
This corpus contains 5 original texts in English, with plagiarised versions of them in English, Spanish and Italian.
Language(s) : English - Spanish - Italian
This is an annotated corpus of biomedical English containing 1,100 sentences. It consists of abstracts of biomedical research articles annotated for relationships, named entities, and syntactic dependencies.
Language(s) : English
The Greek biomedical corpus contains 11.5 million word-forms from periodical articles and conference papers in modern Greek.
Annotation includes structural data, morphosyntactic and semantic tagging, and identification of biomedical words and multi-word terms. The corpus is annotated in XML format, following the TEI guidelines.
Language(s) : Greek (Greece)
It contains 1,152 documents in English collected from pro-death penalty and anti-death penalty websites. Documents are annotated for sentiment and for the document's general tone.
Language(s) : English
This is a French corpus annotated for multiword nouns. It contains 166,000 words (8,600 sentences), in which 5,057 occurrences of multiword nouns have been annotated.
Language(s) : French
This is a POS-tagged corpus in Galician which contains 309,505 grammatical elements extracted from newspapers and journals.
Language(s) : Galician
This is a text database of 3,323,325 tokens (53,396 unique tokens) in Maltese collected from on-line newspapers.
Language(s) : Maltese
This is a text database of 60,052,261 tokens (396,469 unique tokens) in Hebrew collected from on-line newspapers, TV transcripts and medical forums.
Language(s) : Hebrew
This corpus contains 800 military reports in English (886,000 tokens) from the KFOR activities of the German Federal Army. Annotation includes Part-of-speech, Named Entities (NE), structural parts of the document (topic, source, ...) and verbal groups in different layers of annotation.
Language(s) : English (Germany)
The TR-CoNLL corpus contains 946 news articles (204,566 tokens) from the CoNLL shared task, in which 6,980 toponym instances have been annotated.
Toponym Resolution (TR) is the task of mapping potentially ambiguous place names to the latitude/longitude coordinates of the places they refer to, taking the textual context into account.
Language(s) : English
This is a corpus of around 15,000 email messages.
Language(s) : English