Universal Catalogue  
Written Corpora
Displaying 321 to 340 (of 730 products)

ELRA-U-W0313
Hebrew Dependency Treebank 


The Dependency Treebank of Hebrew consists of 6220 sentences, fully dependency parsed.
Language(s) : Hebrew



ELRA-U-W0314
Quechua-Spanish Parallel Treebank 


This is a corpus-based parallel treebank of 200 sentences in Quechua and Spanish.
Language(s) : Spanish (Peru) <<< >>> Quechua (Peru)



ELRA-U-W0315
Romanian Dependency Treebank (RDT)


The data contained in this treebank is representative of modern written standard Romanian. This resource is morpho-syntactically tagged.

It consists of 36,150 tokens.
Language(s) : Romanian (Romania)



ELRA-U-W0316
Berlin central station Corpus 


This is an English corpus of 1,068 web pages related to the "Berlin central station". It contains 55,255 sentences annotated for Named Entities (NE).
Language(s) : English



ELRA-U-W0317
NP4E corpus 


This is a corpus of newswire texts annotated for noun phrase (NP) coreference on 55,000 words and for event coreference on 12,500 words.
Language(s) : English



ELRA-U-W0318
Persian Linguistic Database (PLDB)


It contains various corpora in Modern Persian (Farsi), annotated for Part-of-Speech and/or pronunciation.
Language(s) : Persian



ELRA-U-W0319
UN Corpora 


It contains paragraph-aligned parallel corpora in the six official languages of the United Nations. It represents a total of around 3 million tokens per language.
Language(s) : English - French - Arabic - Chinese - Russian - Spanish



ELRA-U-W0320
PolyU Business Corpus (PUBC)


The PolyU Business Corpus contains 3 comparable corpora of business texts in English, Chinese and Japanese.

It consists of news and reports from the business and finance sections of newspapers, annual reports and press releases from companies, online versions of company brochures and leaflets, ...
Language(s) : Japanese (Japan) - Chinese (Hong Kong) - English (Hong Kong)



ELRA-U-W0321
SFU Review Corpus 


This is a collection of review documents labeled with respect to their overall sentiment polarity (positive or negative). It contains 400 reviews in English and 400 in Spanish.
Language(s) : English - Spanish



ELRA-U-W0322
CLiPA corpus 


This corpus contains 5 original texts in English, with plagiarised versions of them in English, Spanish and Italian.
Language(s) : English - Spanish - Italian



ELRA-U-W0323
BioInfer Corpus 


This is an annotated corpus of biomedical English containing 1,100 sentences. It consists of abstracts of biomedical research articles annotated for relationships, named entities, and syntactic dependencies.
Language(s) : English



ELRA-U-W0324
Greek biomedical corpus 


The Greek biomedical corpus contains 11.5 million word-forms from periodical articles and conference papers in modern Greek.

Annotation includes structural data, morphosyntactic and semantic tagging, and identification of biomedical words and multi-word terms. The corpus is annotated in XML, following TEI guidelines.
Language(s) : Greek (Greece)



ELRA-U-W0325
Death Penalty Corpus (DP Corpus)


It contains 1,152 documents in English collected from pro-death penalty and anti-death penalty websites. Documents are annotated for sentiment and for the document's general tone.
Language(s) : English



ELRA-U-W0326
Corpus annotated for multiword nouns 


This is a French corpus annotated for multiword nouns. It contains 166,000 words (8,600 sentences), in which 5,057 occurrences of multiword nouns have been annotated.
Language(s) : French



ELRA-U-W0327
Tagged corpus for Galician language 


This is a POS-tagged corpus in Galician which contains 309,505 grammatical elements extracted from newspapers and journals.
Language(s) : Galician



ELRA-U-W0328
PsyCoL Maltese Lexical Corpus (PMLC)


This is a text database of 3,323,325 tokens (53,396 unique tokens) in Maltese collected from on-line newspapers.
Language(s) : Maltese



ELRA-U-W0329
PsyCoL Hebrew Lexical Corpus (PHLC)


This is a text database of 60,052,261 tokens (396,469 unique tokens) in Hebrew collected from on-line newspapers, TV transcripts and medical forums.
Language(s) : Hebrew



ELRA-U-W0330
KFOR Text Corpus 


This corpus contains 800 military reports in English (886,000 tokens) from the KFOR activities of the German Federal Army. Annotation includes part-of-speech tags, Named Entities (NE), structural parts of the document (topic, source, ...) and verbal groups, each in a separate layer of annotation.
Language(s) : English (Germany)



ELRA-U-W0331
TR-CoNLL 


The TR-CoNLL corpus contains 946 news articles (204,566 tokens) from the CoNLL shared task, in which 6,980 toponym instances have been annotated.

Toponym Resolution (TR) is the task of mapping from a set of potentially ambiguous place names to the intended latitude/longitude coordinates of places they refer to, taking into account textual context.
Language(s) : English
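
As an illustration of the toponym resolution task described above, the following is a minimal sketch only, not the method used to build or evaluate TR-CoNLL. The gazetteer, its cue words, the approximate coordinates and the function name `resolve_toponym` are all hypothetical, chosen for the example; the scoring (counting context words shared with each candidate's cue words) is a deliberately simple stand-in for a real resolver.

```python
# Toy gazetteer (hypothetical, for illustration only): each ambiguous place
# name maps to candidate referents with approximate (latitude, longitude)
# coordinates and a few hand-picked context cue words.
GAZETTEER = {
    "paris": [
        {
            "label": "Paris, France",
            "coords": (48.8566, 2.3522),
            "cues": {"france", "french", "seine"},
        },
        {
            "label": "Paris, Texas, USA",
            "coords": (33.6609, -95.5555),
            "cues": {"texas", "usa", "county"},
        },
    ],
}


def resolve_toponym(name, context):
    """Pick the candidate whose cue words overlap most with the context.

    Returns (label, (lat, lon)), or None if the name is not in the gazetteer.
    On a tie, the first candidate listed in the gazetteer wins.
    """
    candidates = GAZETTEER.get(name.lower())
    if not candidates:
        return None
    # Lowercase the context words and strip surrounding punctuation.
    words = {w.strip(".,;:!?\"'()").lower() for w in context.split()}
    best = max(candidates, key=lambda c: len(c["cues"] & words))
    return best["label"], best["coords"]
```

For example, `resolve_toponym("Paris", "The mayor of Paris spoke in Texas on Monday.")` selects the Texas candidate because "Texas" appears in the context, while a sentence mentioning the Seine selects the French one.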



ELRA-U-W0332
CSpace Email corpus 


This is a corpus of around 15,000 email messages.
Language(s) : English




Joint Copyright © 2008 ELRA & ELDA
Universal Catalogue 1.0.4