Universal Catalogue  
Written Corpora
Displaying 61 to 80 (of 730 products)

ELRA-U-W 0052
Lexesp Corpus 


Lexesp is a balanced Spanish corpus of 6,000,000 words, published in 2000. It represents various written categories: different literary genres, newspaper articles and scientific texts.
Language(s) : Spanish (Spain)


ELRA-U-W 0053
Pour la science SMS Corpus 


This French corpus contains 30,000 SMS (Short Message Service) messages collected in Belgium as part of the project 'Give your SMS to science' ('Faites don de vos SMS à la science').
Language(s) : French (Belgium)


ELRA-U-W 0054
Carmel Corpus 


Carmel is a multilingual aligned corpus of literary texts in four languages: English, French, Italian and Spanish. It contains 10,000,000 words from 36 classic travel narratives dating from the 19th to the early 20th century.
Language(s) : French (France) / English (United Kingdom) - French (France) / Italian (Italy) - French (France) / Spanish (Spain)


ELRA-U-W 0055
IJS-ELAN Slovene-English Parallel Corpus (IJS-ELAN)


This Slovene-English parallel corpus is composed of 15 texts and contains 500,000 words per language. It is tokenised, sentence-segmented and aligned, and is encoded in XML (TEI P4).
Language(s) : Slovenian (Slovenia) / English (United Kingdom)


ELRA-U-W 0056
Czech-English Parallel Corpus (CzEng)


This Czech-English parallel corpus contains approximately 90 million words per language. It was compiled between 2005 and 2009 from documents in various fields: European law, information technology and fiction. The latest version (0.9) also adds texts from parallel web pages, electronically available books and subtitles.
Language(s) : Czech (Czech Republic) / English (United Kingdom)


ELRA-U-W 0057
The Croco Corpus (German-English Parallel Corpus) 


This is a German-English parallel corpus of one million words.
The texts are comparable across eight registers, and both translation directions are represented for each register.
Language(s) : German (Germany) / English (United Kingdom)


ELRA-U-W 0058
The IPI-PAN Corpus 


The IPI-PAN corpus is a Polish written corpus of more than 250 million segments. Various genres are represented (in unbalanced proportions): contemporary prose, older prose, science, newspapers, parliamentary proceedings, law. The corpus is morphosyntactically annotated.
Language(s) : Polish (Poland)


ELRA-U-W 0059
Enron Email Corpus 


This American English database contains 500,000 emails, organised in folders, from 158 users in senior management at Enron.
Language(s) : English (USA)


ELRA-U-W 0060
Corpus of Spoken Professional American English (CSPA)


This American English corpus contains transcripts of professional conversations recorded between 1994 and 1998. It comprises 2 million words from 400 speakers.
Language(s) : English (USA)


ELRA-U-W 0061
AnCora-DEP-CAT Catalan Treebank 


AnCora-DEP-CAT is a Catalan corpus of 478,876 words (still under development). The 16,633 sentences of the corpus have been annotated with dependencies.
Language(s) : Catalan (Spain)


ELRA-U-W 0062
AnCora-DEP-ESP Spanish Treebank 


AnCora-DEP-ESP is a Spanish corpus of 95,028 words (still under development). The 3,512 sentences of the corpus have been annotated with dependencies.
Language(s) : Spanish (Spain)


ELRA-U-W 0063
The Triptic Corpus (English, French and Dutch) 


The Triptic corpus is a trilingual parallel corpus for English, French and Dutch. It contains 2,000,000 words, is aligned at paragraph level and is divided into two parts: fiction and non-fiction.
Language(s) : English (United Kingdom) / Dutch - Dutch / French - English (United Kingdom) / French


ELRA-U-W 0064
Chinese-English Parallel Corpus 


This Chinese-English parallel corpus is still under construction; when completed, it will amount to 17 million words in each language. It gathers texts from various genres, such as newspapers, technical articles, literature and film transcripts.
Language(s) : Chinese (China) / English (United Kingdom)


ELRA-U-W 0065
The UCLA Chinese corpus 


This is a corpus of written Modern Chinese containing one million tokens from texts collected between 2000 and 2005. It is segmented and POS-tagged.

It can be considered a recent update of the Lancaster Corpus of Mandarin Chinese (LCMC), which is available from the ELRA catalogue under the reference ELRA-W0039.
Language(s) : Chinese (China)


ELRA-U-W 0066
Tagged Corpus of Spoken Professional American English 


This is the tagged version of the Corpus of Spoken Professional American English, which contains transcripts of professional conversations recorded between 1994 and 1998 (2 million words from 400 speakers).
Language(s) : English (USA)


ELRA-U-W 0067
PWN Corpus of the Polish Language 


This Polish corpus consists of texts from books, periodicals, websites, ephemera and transcripts of spoken texts. It is a balanced corpus of 70 million words. For copyright reasons, however, the only data available is a 7.5-million-word sampler (the demonstration version of the online corpus).
Language(s) : Polish (Poland)


ELRA-U-W 0068
Christine Corpus 


Christine is a treebank of 100,000 words covering mostly spontaneous, informal spoken English of the 1990s. It offers structural analyses of a cross-section of speech from all British regions, social classes, etc.
Language(s) : English (United Kingdom)


ELRA-U-W 0069
Lucy Corpus 


Lucy represents written English in modern Britain, ranging from published prose to the less skilled writing of young adults and children aged nine to twelve. It is a treebank of 165,000 'words' (a compound word is not counted as a single word), compiled between 2000 and 2003.
Language(s) : English (United Kingdom)


ELRA-U-W 0070
The International Corpus of Learner English (ICLE)


This corpus of argumentative essays contains over 3.7 million words written by advanced learners of English from 19 different mother-tongue backgrounds. It is divided into sub-corpora by mother tongue, with a target size of 200,000 words per sub-corpus.
Language(s) : English


ELRA-U-W 0071
Louvain Corpus of Native English Essays (LOCNESS)


This corpus contains 324,304 words from native English essays. It is divided into three parts: British pupils' A-level essays, British university students' essays and American university students' essays.
Language(s) : English (United Kingdom) - English (USA)



Joint Copyright © 2008 ELRA & ELDA
Universal Catalogue 1.0.4