Universal Catalogue  
Written Corpora
Displaying 521 to 540 (of 730 products)

ELRA-WC0227
IBM Manuals Treebank 


This is an 800,000-word skeleton-parsed corpus of computer manuals.
Language(s) : English


ELRA-WC0228
Lancaster Anaphoric Treebank 


A subsample of the Associated Press Corpus, containing American newswire reports, annotated to show the reference of pronouns and lexical cohesion. It contains approximately 100,000 words.
Language(s) : English (USA)


ELRA-WC0229
Longman-Lancaster Corpus 


It consists of 30 million words of written English taken from literature, magazines, papers and more ephemeral materials such as leaflets and packaging.
Language(s) : English


ELRA-WC0230
ET10-63 Corpus 


It consists of approximately 1,250,000 words in each language and contains official EC documents on telecommunications. The corpus is part-of-speech tagged and lemmatized.
Language(s) : English - French


ELRA-WC0231
CRATER Corpus (Available since 23/01/1997)


This is a 1,000,000-word trilingual corpus aligned at the sentence level. It is made up of texts from the telecommunications domain and has been part-of-speech tagged in all three languages.
Language(s) : English - French - Spanish


ELRA-WC0232
The Helsinki Corpus of English Texts: Diachronic Part 


It contains samples from texts covering the Old, Middle, and Early Modern English periods. It consists of 1,500,000 words in total.
Language(s) : English


ELRA-WC0233
Lampeter Corpus of Early Modern English Tracts 


It consists of approximately one million words of English pamphlet literature covering the years 1640-1740. It is being part-of-speech tagged and lemmatized.
Language(s) : English


ELRA-WC0234
Corpus BAF 


It consists of approximately 400,000 words in each language from the literary, technical, scientific and institutional domains.
Language(s) : English - French


ELRA-WC0235
Corpus of transcribed Bulgarian conversations 


It consists of transcribed conversations in family contexts.
Language(s) : Bulgarian


ELRA-WC0236
Transcripts of Bulgarian Parliament debates 


It consists of transcribed recordings of broadcasts of the debates of the 7th Great National Assembly on 31 October 1990.
Language(s) : Bulgarian


ELRA-WC0237
Orchid Corpus 


This Thai text corpus is annotated with part-of-speech (syntactic word class) tags. It contains approximately 2,560,000 words.
Language(s) : Thai


ELRA-WC0238
Annotated Bulgarian Texts 


It contains three texts annotated with predominantly lexical notes.
Language(s) : Bulgarian


ELRA-WC0239
Bulgarian Poetry Archive 


This electronic literary text archive contains 301 poems in HTML format.
Language(s) : Bulgarian


ELRA-WC0240
Bulgarian Text Corpus (TRACTOR) 


It consists of approximately 275,000 words representing text genres such as news, legal texts and poetry, encoded in SGML according to the Corpus Encoding Standard (CES).
Language(s) : Bulgarian


ELRA-WC0241
Corpus Textual Informatitzat de la Llengua Catalana (CTILC) 


It contains various kinds of texts dating from 1833 to 1988, both literary (theatre, poetry…) and non-literary (legal documents, scientific articles, etc.).
Language(s) : Catalan


ELRA-WC0242
Chinese Philosophical E-text Archive 


It consists of Chinese e-texts ranging from classical pre-Qin and Song texts to Qing and modern ones. It includes electronic versions of Chinese philosophical texts created by the Confucian Etext Project, electronic versions of Chinese philosophical texts from other sources, and information on the preparation and use of these texts, with links to further resources.
Language(s) : English


ELRA-WC0243
Penn Chinese Treebank 


This is a corpus of Chinese text segmented into words and annotated with part-of-speech labels and syntactic bracketing, modeled on the English Treebank. It contains 500,000 words (over 824,000 Chinese characters).
Language(s) : English


ELRA-WC0244
Chinese Proposition Bank 


The goal of the Penn Chinese Proposition Bank project is to create a corpus of text annotated with information about basic semantic propositions. Predicate-argument relations are being added to the syntactic trees of the Penn Chinese Treebank.
Language(s) : English


ELRA-WC0245
Chiricahua and Mescalero Apache Texts 


It contains 56 Apache texts and their English translations, collected in the summers of 1930 and 1931 and in the spring of 1934.
Language(s) : English


ELRA-WC0246
Trilingual Parallel Computer Corpus 


This corpus was compiled to study the syntactic and semantic structure of "up/on" and its equivalents in French ("sur") and Dutch ("op"). It consists of 2 million words and is divided into two subcorpora (fiction and non-fiction).
Language(s) : Dutch - English - French


Joint Copyright © 2008 ELRA & ELDA
Universal Catalogue 1.0.4