Universal Catalogue  
Written Corpora
Displaying 81 to 100 (of 730 products)

ELRA-U-W 0072
French Interlanguage Database (FRIDA)


This resource is a written corpus of French as a foreign language (size: 450,000 words). It is an error-tagged learner corpus.
Language(s) : French



ELRA-U-W 0073
EUS3LB Basque Dependency Treebank 


EUS3LB is a Basque treebank of 50,000 words. The annotation covers POS tags for morphosyntactic information, and constituents and functions for syntactic information.
Language(s) : Basque (Spain)



ELRA-U-W 0074
Modern Hebrew Treebank (MHT)


The Modern Hebrew Treebank contains 6,500 sentences of news items from the Ha'aretz daily newspaper. It is segmented and analysed at the morpho-syntactic level.
Language(s) : Hebrew (Israel)



ELRA-U-W 0075
The Parc 700 Dependency Bank 


This corpus is composed of 700 sentences randomly extracted from the Wall Street Journal treebank. It has been parsed with an LFG grammar, and annotated with grammatical dependency relations.
Language(s) : English (USA)



ELRA-U-W 0076
METU-Sabanci Turkish Treebank 


The METU-Sabanci Turkish Treebank contains 7,262 sentences taken from the METU Turkish Corpus. It is annotated at the morpho-syntactic level.
Language(s) : Turkish (Turkey)



ELRA-U-W 0077
The Sofie Treebank 


This is a parallel treebank of North European languages: Danish, Dutch, English, Estonian, Finnish, German, Icelandic, Norwegian and Swedish.
The data is taken from the Norwegian original and the translations of the first two chapters of Jostein Gaarder's novel 'Sofies verden'.
Language(s) : Danish (Denmark) - Dutch (Netherlands) - English (United Kingdom) - Estonian (Estonia) - Finnish (Finland) - German (Germany) - Icelandic - Norwegian (Norway) - Swedish (Sweden)



ELRA-U-W 0078
Greek Dependency Treebank (GDT)


This is a Modern Greek corpus annotated at multiple levels. The data comes from transcripts of European parliamentary sessions and from web documents (health, travel, politics).
Language(s) : Modern Greek (Greece)



ELRA-U-W 0079
Penn Treebank (PTB)


The Penn Treebank is a bank of linguistic trees for English. The data comes from several well-known corpora: Wall Street Journal, the Brown Corpus, Switchboard and ATIS (more than one million words). The corpus contains annotations showing rough syntactic and semantic information.
Language(s) : English (USA)



ELRA-U-W 0080
Academia Sinica Balanced Corpus of Modern Chinese (Sinica)


Sinica 5.0 contains 10 million words covering various topics: philosophy, science, society, art, life, literature.
Texts are segmented and POS tagged.
Language(s) : Chinese



ELRA-U-W 0081
Sinica Treebank 


Sinica Treebank v3.0 was released in 2000 with texts taken from the Sinica Corpus. It contains 361,834 words and 61,087 trees.
Language(s) : Chinese



ELRA-U-W 0082
Tübingen Treebank of Spoken German (TüBa-D/S)


This resource is a spoken German corpus that was annotated in the Verbmobil project. It contains 360,000 words and 38,000 sentences.
Language(s) : German (Germany)



ELRA-U-W 0083
Tübingen Treebank of Spoken English (TüBa-E/S)


The Tübingen Treebank of Spoken English is composed of 30,000 sentences (ca. 310,000 words) of spontaneous dialogues which were manually transliterated. The syntactic annotation was performed manually.
It was annotated in the Verbmobil project.
Language(s) : English



ELRA-U-W 0084
Tübingen Treebank of Spoken Japanese (TüBa-J/S)


The Tübingen Treebank of Spoken Japanese is composed of 18,000 sentences (ca. 160,000 words) of spontaneous dialogues which were manually transliterated. The syntactic annotation was performed manually.
It was annotated in the Verbmobil project.
Language(s) : Japanese



ELRA-U-W 0085
Tübingen Partially Parsed Corpus of Written German (TüPP-D/Z)


The Tübingen Partially Parsed Corpus of Written German is composed of articles from 'die Tageszeitung' (taz newspaper). The data comprises more than 200 million word tokens and has been automatically annotated (POS, morphological ambiguity classes, clause structure, topological fields and chunks).
Language(s) : German (Germany)



ELRA-U-W 0086
Danish Dependency Treebank 


The Danish Dependency Treebank was built on top of the Danish PAROLE corpus. It consists of 474 texts containing 5,540 sentences and 100,200 words.
Language(s) : Danish (Denmark)



ELRA-U-W 0087
Estonian Treebank Arborest (Arborest)


Arborest is a 2,500-sentence treebank of Estonian, built in a two-stage process using both Constraint Grammar (CG) and Phrase Structure Grammar (PSG).
Language(s) : Estonian



ELRA-U-W 0088
Penn Arabic Treebank (ATB)


The Penn Arabic Treebank is a one-million-word corpus that has been syntactically annotated.
Language(s) : Modern Standard Arabic



ELRA-U-W 0089
Prague Arabic Dependency Treebank (PADT)


The Prague Arabic Dependency Treebank is a multi-level corpus of Modern Standard Arabic in the form of dependency analytical trees. More than 113,500 tokens are analysed and provided with disambiguated morphological information. Complete annotation of MorphoTrees for more than 148,000 tokens is also available (analytical processing for 49,000).
Language(s) : Modern Standard Arabic



ELRA-U-W 0090
ATR Dependency Corpus 


The ATR corpus is a treebank of 6,553 sentences of Japanese conversations in the field of hotel reservations.
Language(s) : Japanese (Japan)



ELRA-U-W 0091
TREPIL Norwegian Treebank 


This resource is developed within the framework of the TREPIL project (2004-2008). The aim of the project is the semi-automatic construction of a Norwegian treebank.
Language(s) : Norwegian (Norway)




Joint Copyright © 2008 ELRA & ELDA
Universal Catalogue 1.0.4