Universal Catalogue  
Written Corpora
Displaying 281 to 300 (of 730 products)

ELRA-U-W 0273
A Representative Corpus of Historical English Registers (ARCHER)


ARCHER is a socio-historical corpus made up of texts representing eleven written and spoken registers in British and American English. It is divided into ten 50-year periods from 1650 to 1990 and contains approximately 1.7 million words.
Language(s) : English (USA) - English (United Kingdom)



ELRA-U-W 0274
Romanian Corpus of Newspaper Texts 


This Romanian newspaper corpus contains 56 million words with diacritics (Unicode .txt format). It has been parsed with GojolParser with good accuracy, and the output is available in two formats: dependency maps and trees.
Language(s) : Romanian



ELRA-U-W 0275
Melbourne-Surrey Corpus 


The Melbourne-Surrey corpus contains 100,000 words of Australian newspaper texts.
Language(s) : English (Australia)



ELRA-U-W0276
NPS Chat Corpus 


The NPS Chat Corpus contains 10,567 posts collected in 2006 from various online chat services. Posts have been privacy-masked by hand, part-of-speech tagged and dialogue-act tagged.
Language(s) : English (USA)



ELRA-U-W0277
Problem Report Corpus 


The Problem Report Corpus contains problem report summaries from various open source projects (Apache, Eclipse, Firefox, Linux, OpenOffice).
Language(s) : English



ELRA-U-W0278
The Patient Information Leaflet Corpus (PIL)


This is a collection of 471 documents giving instructions to patients about their medication.
Language(s) : English



ELRA-U-W0279
Sentiment polarity datasets 


This is a collection of movie-review documents labeled with respect to their overall sentiment polarity (positive or negative). It contains 1,000 positive and 1,000 negative full-text movie reviews.
Language(s) : English



ELRA-U-W0280
Subjectivity datasets 


This is a collection of sentences labeled with respect to their subjectivity status (subjective or objective). It contains 5,000 subjective and 5,000 objective processed sentences.
Language(s) : English



ELRA-U-W0281
Corpus del Español 


The Corpus del Español contains more than 100 million words in more than 20,000 Spanish texts from the 1200s to the 1900s.
Language(s) : Spanish



ELRA-U-W0282
Italian TimeBank (ITB)


The Italian TimeBank (ITB) contains 171 newspaper articles that have been manually annotated for events. It totals 62,000 words.
Language(s) : Italian



ELRA-U-W0283
SUBTLEXus Corpus 


This is a subtitle corpus for American English. It contains 51 million words from 8,388 US films and sitcoms (from 1900 to 2007).
Language(s) : English (USA)



ELRA-U-W0284
KRYS I Corpus 


This corpus contains about 6,300 PDF documents classified by genre. Each document has been labelled independently by two kinds of annotators, using a set of 70 genres.
Language(s) : English



ELRA-U-W0285
New Amsterdam Corpus of Old French Literary Texts (NCA)


This is a lemmatized, XML-formatted corpus of Old French literary texts containing more than three million words from 200 different texts.
Language(s) : French



ELRA-U-W0286
The Dialogue Diversity Corpus (DDC)


This is a written corpus containing 54 dialogue transcripts collected from different corpora.
Language(s) : English



ELRA-U-W0287
Parallel Italian-Danish Corpus annotated for anaphora 


The Parallel Italian-Danish Corpus annotated for anaphora contains EU texts, literary texts, newspaper articles and dialogue transcriptions.
Language(s) : Italian - Danish



ELRA-U-W0288
NICT Japanese-Chinese parallel corpus 


This is a parallel corpus containing 38,383 sentence pairs collected from Japanese newspapers and translated into Chinese. The corpus is aligned at the word and phrase levels and has been annotated with morphological and syntactic tags.
Language(s) : Japanese - Chinese



ELRA-U-W0289
HMIHY corpus 


The "How May I Help You (SM)?" corpus (or HMIHY corpus) contains 5,690 human-computer dialogues. Each caller turn is annotated with emotional labels.
Language(s) : English



ELRA-U-W0290
BLOGS08 


BLOGS08 is a TREC test collection containing samples of the blogosphere collected once a week over one year. It comprises 28,488,767 blog posts from 1,303,520 blog feeds.
Language(s) : English



ELRA-U-W0291
Help-desk emails dialogues Corpus 


This is a large corpus containing 30,000 request–response email dialogues between customers and operators.
Language(s) : English



ELRA-U-W0292
SYN2005 corpus 


The SYN2005 corpus is a synchronic representative corpus of contemporary written Czech collected between 2000 and 2004. It contains 100 million words (tokens), lemmatised and part-of-speech tagged.
Language(s) : Czech




Joint Copyright © 2008 ELRA & ELDA
Universal Catalogue 1.0.4