Universal Catalogue  
Written Corpora
Displaying 441 to 460 (of 730 products)

ELRA-WC0140
CROSSMARC Corpus 


The purpose of the project was the development of a corpus, cross-lingual name matching, and multilingual named entity recognition.
Language(s) : English - Greek - French - Italian



ELRA-WC0141
PDF Corpus 


It consists of 101 PDF documents (181,748 words) with a great variety in their content, appearance, style, and structure.
Language(s) : English



ELRA-WC0142
WCL Generic Corpus 


It consists of 5,500 words of newspaper articles, paragraphs of literature, and sentences constructed and annotated by a professional linguist.
Language(s) : Greek



ELRA-WC0143
Swedish-English Corpus 


The parallel corpus, of approximately 50,000 tokens, comprises agricultural reports, specifications and circulars produced within the European Union.
Language(s) : Swedish - English



ELRA-WC0144
La Repubblica corpus 


It contains newspaper texts, amounting to 175 million words.
Language(s) : Italian



ELRA-WC0145
Diachronic Italian corpus of nuclear physics texts 


This is a diachronic Italian corpus, divided into three subcorpora, of nuclear physics texts belonging to different genres.
Language(s) : Italian



ELRA-WC0146
Diachronic English corpus of nuclear physics articles 


This corpus contains the abstracts of newspapers and is structured into diachronic stages of around ten years.
Language(s) : English



ELRA-WC0147
Prague Czech-English Dependency Treebank (PCEDT) 


This is a parallel treebank and a Czech-English syntactically annotated resource.
Language(s) : Czech - English



ELRA-WC0148
Czech National Corpus (CNC) 


This corpus is composed of computer-based texts of over 400 million words.
Language(s) : Czech



ELRA-WC0149
English corpus of biotechnology business information 


It contains 840 documents of 452,000 words in total from newspapers related to biotechnology business information.
Language(s) : English



ELRA-WC0150
The DELOS Corpus 


The Delos corpus is a collection of economic domain texts of approximately five million words and of varying genre (press reportage, news, articles, interviews and scientific studies). It has been automatically annotated.
Language(s) : Greek



ELRA-WC0151
AAC - Austrian Academy Corpus 


The AAC is a very large and complex electronic text collection.
Language(s) : German - English - Russian



ELRA-WC0154
Penn Discourse Treebank (PDTB) 


This treebank aims to produce a large-scale corpus in which approximately 30,000 discourse connectives are annotated.
Language(s) : English



ELRA-WC0155
TIGER Treebank 


It consists of approximately 700,000 tokens (40,000 sentences) of semi-automatically tagged German newspaper text.
Language(s) : German



ELRA-WC0156
Tübingen Treebank of Written German (TüBa-D/Z)


The TüBa-D/Z treebank contains 45,200 sentences (794,079 tokens) taken from a German newspaper corpus (data based on 'die tageszeitung' from taz). The syntactic annotation was performed manually.
Language(s) : German



ELRA-WC0157
The GNOME Corpus 


The GNOME Corpus includes texts from three genres - museum labels, pharmaceutical leaflets, and tutorial dialogues - in which different types of discourse and semantic information have been annotated.
Language(s) : English



ELRA-WC0158
The Reuters Corpus 


This corpus includes over 800,000 English language news stories.
Language(s) : French - English



ELRA-WC0159
The MuchMore bilingual medical corpus 


It includes around 9,000 scientific abstracts in various domains with around 1 million tokens for each language.
Language(s) : English - German



ELRA-WC0160
Two Variant Corpora 


The corpora consist of 324,616 Korean sentences, half translated from Japanese and the other half translated from English text that matches the original Japanese.
Language(s) : Korean - Japanese - English



ELRA-WC0161
Reuters-21578 


It consists of 21,578 news stories that appeared on the Reuters newswire in 1987.
Language(s) : English




Joint Copyright © 2008 ELRA & ELDA
Universal Catalogue 1.0.4