Language Resources
Displaying 81 to 100 (of 730 products)
This resource is a written corpus of French as a foreign language (size: 450,000 words). It is an error-tagged learner corpus.
Language(s) : French
|
|
|
|
EUS3LB is a Basque treebank of 50,000 words. The annotation covers POS tags for morphosyntactic information, and constituents and functions for syntactic information.
Language(s) : Basque (Spain)
|
|
|
|
The Modern Hebrew Treebank contains 6,500 sentences of news items from the Ha'aretz daily newspaper. It is segmented and analysed at the morpho-syntactic level.
Language(s) : Hebrew (Israel)
|
|
|
|
This corpus is composed of 700 sentences randomly extracted from the Wall Street Journal treebank. It has been parsed with an LFG grammar, and annotated with grammatical dependency relations.
Language(s) : English (USA)
|
|
|
|
The METU-Sabanci Turkish Treebank contains 7,262 sentences taken from the METU Turkish Corpus. It is annotated at the morpho-syntactic level.
Language(s) : Turkish (Turkey)
|
|
|
|
This is a parallel treebank of North European languages: Danish, Dutch, English, Estonian, Finnish, German, Icelandic, Norwegian and Swedish.
The data is taken from the Norwegian original and the translations of the first two chapters of Jostein Gaarder's novel 'Sofies verden'.
Language(s) : Danish (Denmark) - Dutch (Netherlands) - English (United Kingdom) - Estonian (Estonia) - Finnish (Finland) - German (Germany) - Icelandic - Norwegian (Norway) - Swedish (Sweden)
|
|
|
|
This is a Modern Greek corpus annotated at multiple levels. The data comes from transcripts of European parliamentary sessions and from web documents (health, travel, politics).
Language(s) : Modern Greek (Greece)
|
|
|
|
The Penn Treebank is a bank of linguistic trees for English. The data comes from several well-known corpora: the Wall Street Journal, the Brown Corpus, Switchboard and ATIS (more than one million words). The corpus is annotated with rough syntactic and semantic information.
Language(s) : English (USA)
|
|
|
|
Sinica 5.0 contains 10 million words covering various topics: philosophy, science, society, art, life, literature.
The texts are segmented and POS tagged.
Language(s) : Chinese
|
|
|
|
Sinica Treebank v3.0 was released in 2000 with texts taken from the Sinica Corpus. It contains 361,834 words and 61,087 trees.
Language(s) : Chinese
|
|
|
|
This resource is a spoken German corpus that was annotated in the Verbmobil project. It contains 360,000 words and 38,000 sentences.
Language(s) : German (Germany)
|
|
|
|
The Tübingen Treebank of Spoken English is composed of 30,000 sentences (ca. 310,000 words) of spontaneous dialogues which were manually transliterated. The syntactic annotation was performed manually.
It was annotated in the Verbmobil project.
Language(s) : English
|
|
|
|
The Tübingen Treebank of Spoken Japanese is composed of 18,000 sentences (ca. 160,000 words) of spontaneous dialogues which were manually transliterated. The syntactic annotation was performed manually.
It was annotated in the Verbmobil project.
Language(s) : Japanese
|
|
|
|
The Tübingen Partially Parsed Corpus of Written German is composed of articles from 'die Tageszeitung' (taz newspaper). The data comprises more than 200 million word tokens and has been automatically annotated (POS, morphological ambiguity classes, clause structure, topological fields and chunks).
Language(s) : German (Germany)
|
|
|
|
The Danish Dependency Treebank was built on top of the Danish PAROLE corpus. It consists of 474 texts containing 5,540 sentences and 100,200 words.
Language(s) : Danish (Denmark)
|
|
|
|
Arborest is a 2,500-sentence treebank of Estonian which was built in a two-stage process using both Constraint Grammar (CG) and Phrase Structure Grammar (PSG).
Language(s) : Estonian
|
|
|
|
The Penn Arabic Treebank is a one-million-word corpus that has been syntactically annotated.
Language(s) : Modern Standard Arabic
|
|
|
|
The Prague Arabic Dependency Treebank is a multi-level corpus of Modern Standard Arabic in the form of dependency analytical trees. More than 113,500 tokens are analysed and provided with disambiguated morphological information. Complete annotation of MorphoTrees for more than 148,000 tokens is also available (analytical processing for 49,000).
Language(s) : Modern Standard Arabic
|
|
|
|
The ATR corpus is a treebank of 6,553 sentences of Japanese conversations in the field of hotel reservations.
Language(s) : Japanese (Japan)
|
|
|
|
This resource is developed within the framework of the TREPIL project (2004-2008). The aim of the project is the semi-automatic construction of a Norwegian treebank.
Language(s) : Norwegian (Norway)
|
|
|
|