Language Resources
Displaying 441 to 460 (of 730 products)
The purpose of the project was the development of a corpus, cross-lingual name matching, and multilingual named entity recognition.
Language(s) : English - Greek - French - Italian
It consists of 101 PDF documents (181,748 words) that vary widely in content, appearance, style, and structure.
Language(s) : English
It consists of 5,500 words of newspaper articles, passages of literature, and sentences constructed and annotated by a professional linguist.
Language(s) : Greek
This parallel corpus of approximately 50,000 tokens comprises agricultural reports, specifications, and circulars produced within the European Union.
Language(s) : Swedish - English
It contains newspaper texts amounting to 175 million words.
Language(s) : Italian
This is a diachronic Italian corpus of nuclear physics texts from different genres, divided into three subcorpora.
Language(s) : Italian
This corpus contains newspaper abstracts and is structured into diachronic stages of around ten years each.
Language(s) : English
This is a Czech-English parallel treebank, a syntactically annotated parallel resource.
Language(s) : Czech - English
This corpus is composed of computer-based texts totalling over 400 million words.
Language(s) : Czech
It contains 840 newspaper documents (452,000 words in total) related to biotechnology business information.
Language(s) : English
The Delos corpus is a collection of economic-domain texts of approximately five million words and of varying genres (press reportage, news, articles, interviews, and scientific studies). It has been annotated automatically.
Language(s) : Greek
The AAC is a very large and complex electronic text collection.
Language(s) : German - English - Russian
The project behind this treebank aims to produce a large-scale corpus in which approximately 30,000 discourse connectives are annotated.
Language(s) : English
It consists of approximately 700,000 tokens (40,000 sentences) of semi-automatically tagged German newspaper text.
Language(s) : German
The TüBa-D/Z treebank contains 45,200 sentences (794,079 tokens) taken from a corpus of the German daily newspaper 'die tageszeitung' (taz). The syntactic annotation was performed manually.
Language(s) : German
The GNOME Corpus includes texts from three genres (museum labels, pharmaceutical leaflets, and tutorial dialogues) in which different types of discourse and semantic information have been annotated.
Language(s) : English
This corpus includes over 800,000 English-language news stories.
Language(s) : French - English
It includes around 9,000 scientific abstracts in various domains, with around 1 million tokens for each language.
Language(s) : English - German
The corpora consist of 324,616 Korean sentences: half were translated from Japanese, and the other half from English sentences that match the original Japanese.
Language(s) : Korean - Japanese - English
It consists of 21,578 news stories that appeared on the Reuters newswire in 1987.
Language(s) : English