Universal Catalogue

Written Corpora

Displaying 81 to 100 (of 730 products)

ELRA-U-W 0072

French Interlanguage Database (FRIDA)

This resource is a written corpus of French as a foreign language (size: 450,000 words). It is an error-tagged learner corpus.
Language(s) : French

ELRA-U-W 0073

EUS3LB Basque Dependency Treebank

EUS3LB is a Basque treebank of 50,000 words. The annotation covers POS tags for morphosyntactic information, and constituents and functions for syntactic information.
Language(s) : Basque (Spain)

ELRA-U-W 0074

Modern Hebrew Treebank (MHT)

The Modern Hebrew Treebank contains 6,500 sentences of news items from the Ha'aretz daily newspaper. It is segmented and analysed at the morpho-syntactic level.
Language(s) : Hebrew (Israel)

ELRA-U-W 0075

The Parc 700 Dependency Bank

This corpus is composed of 700 sentences randomly extracted from the Wall Street Journal treebank. It has been parsed with an LFG grammar, and annotated with grammatical dependency relations.
Language(s) : English (USA)

ELRA-U-W 0076

METU-Sabanci Turkish Treebank

The METU-Sabanci Turkish Treebank contains 7,262 sentences taken from the METU Turkish Corpus. It is annotated at the morpho-syntactic level.
Language(s) : Turkish (Turkey)

ELRA-U-W 0077

The Sofie Treebank

This is a parallel treebank of North European languages: Danish, Dutch, English, Estonian, Finnish, German, Icelandic, Norwegian and Swedish.
The data is taken from the Norwegian original and the translations of the first two chapters of Jostein Gaarder's novel 'Sofies verden'.
Language(s) : Danish (Denmark) - Dutch (Netherlands) - English (United Kingdom) - Estonian (Estonia) - Finnish (Finland) - German (Germany) - Icelandic - Norwegian (Norway) - Swedish (Sweden)

ELRA-U-W 0078

Greek Dependency Treebank (GDT)

This is a modern Greek corpus annotated at multiple levels. The data comes from transcripts of European parliamentary sessions and web documents (health, travel, politics).
Language(s) : Modern Greek (Greece)

ELRA-U-W 0079

Penn Treebank (PTB)

The Penn Treebank is a bank of linguistic trees for English. The data comes from several well-known corpora: Wall Street Journal, the Brown Corpus, Switchboard and ATIS (more than one million words). The corpus contains annotations showing rough syntactic and semantic information.
Language(s) : English (USA)

ELRA-U-W 0080

Academia Sinica Balanced Corpus of Modern Chinese (Sinica)

Sinica 5.0 contains 10 million words covering a range of topics: philosophy, science, society, art, life and literature.
Texts are segmented and POS tagged.
Language(s) : Chinese

ELRA-U-W 0081

Sinica Treebank

Sinica Treebank v3.0 was released in 2000 with texts taken from the Sinica Corpus. It contains 361,834 words and 61,087 trees.
Language(s) : Chinese

ELRA-U-W 0082

Tübingen Treebank of Spoken German (TüBa-D/S)

This resource is a spoken German corpus that was annotated within the Verbmobil project. It contains 360,000 words and 38,000 sentences.
Language(s) : German (Germany)

ELRA-U-W 0083

Tübingen Treebank of Spoken English (TüBa-E/S)

The Tübingen Treebank of Spoken English is composed of 30,000 sentences (ca. 310,000 words) of spontaneous dialogues which were manually transliterated. The syntactic annotation was performed manually.
It was annotated within the Verbmobil project.
Language(s) : English

ELRA-U-W 0084

Tübingen Treebank of Spoken Japanese (TüBa-J/S)

The Tübingen Treebank of Spoken Japanese is composed of 18,000 sentences (ca. 160,000 words) of spontaneous dialogues which were manually transliterated. The syntactic annotation was performed manually.
It was annotated within the Verbmobil project.
Language(s) : Japanese

ELRA-U-W 0085

Tübingen Partially Parsed Corpus of Written German (TüPP-D/Z)

The Tübingen Partially Parsed Corpus of Written German is composed of articles from 'die Tageszeitung' (taz newspaper). The data comprises more than 200 million word tokens and has been automatically annotated (POS, morphological ambiguity classes, clause structure, topological fields and chunks).
Language(s) : German (Germany)

ELRA-U-W 0086

Danish Dependency Treebank

The Danish Dependency Treebank was built on top of the Danish PAROLE corpus. It consists of 474 texts containing 5,540 sentences and 100,200 words.
Language(s) : Danish (Denmark)

ELRA-U-W 0087

Estonian Treebank Arborest (Arborest)

Arborest is a 2,500-sentence treebank of Estonian, built in a two-stage process using both Constraint Grammar (CG) and Phrase Structure Grammar (PSG).
Language(s) : Estonian

ELRA-U-W 0088

Penn Arabic Treebank (ATB)

The Penn Arabic Treebank is a one million word corpus that has been syntactically annotated.
Language(s) : Modern Standard Arabic

ELRA-U-W 0089

Prague Arabic Dependency Treebank (PADT)

The Prague Arabic Dependency Treebank is a multi-level corpus of Modern Standard Arabic in the form of dependency analytical trees. More than 113,500 tokens are analysed and provided with disambiguated morphological information. Complete annotation of MorphoTrees for more than 148,000 tokens is also available (analytical processing for 49,000).
Language(s) : Modern Standard Arabic

ELRA-U-W 0090

ATR Dependency Corpus

The ATR corpus is a treebank of 6,553 sentences of Japanese conversations in the field of hotel reservations.
Language(s) : Japanese (Japan)

ELRA-U-W 0091

TREPIL Norwegian Treebank

This resource was developed within the framework of the TREPIL project (2004-2008), which aimed at the semi-automatic construction of a Norwegian treebank.
Language(s) : Norwegian (Norway)
