Universal Catalogue  
Written Corpora
Displaying 381 to 400 (of 730 products)

ELRA-U-W0374
Chinese Named Entity Tagged Corpus (CNEC1.0)


CNEC 1.0 is a corpus in Mandarin Chinese, annotated with named entity tags for person name, location name, organization name, location as organization and organization as location.
Language(s) : Chinese



ELRA-U-W0375
Coreference Corpus for Dutch 


This is a Dutch corpus annotated for coreference including annotation for:
- identity relations between noun phrases,
- bound relations where an anaphor refers to a quantified antecedent,
- bridge relations when the anaphor is a subset of the antecedent,
- predicative relations, indicating extra information about the referent.
Language(s) : Dutch



ELRA-U-W0376
Corpus for cross-document co-reference 


This is a corpus of 257,000 words annotated for co-reference. It contains 18,000 distinct document-level entities and approximately 55,000 entity mentions.
Language(s) : English (USA)



ELRA-U-W0377
Greek corpus for Textual Entailment (GTEC)


This corpus contains 600 pairs of textual units manually annotated for textual entailment.
Language(s) : Greek



ELRA-U-W0378
Corpus of Serbian Language (CSL)


This corpus was compiled from a sample of 11 million words covering the period from the 12th century to the present day. It has been manually tagged for grammatical category, number of graphemes, number of syllables and phonological structure.
Language(s) : Serbian



ELRA-U-W0379
Bulgarian Polish Lithuanian corpus (BG-PL-LT corpus)


This is a trilingual corpus of 3 million words in Bulgarian, Polish and Lithuanian. The BG-PL-LT corpus includes a parallel and a comparable corpus.
Language(s) : Bulgarian - Polish - Lithuanian



ELRA-U-W0380
Bulgarian National Corpus 


The Bulgarian National Corpus consists of about 320,000,000 words from more than 10,000 texts. This corpus reflects the state of the Bulgarian language (mainly in its written form) from the middle of the 20th century (1945) to the present.
Language(s) : Bulgarian



ELRA-U-W0381
INTERREG corpus 


This is a parallel corpus which consists of web documents from the general domain. It contains 4,190,000 tokens in Greek, 3,430,000 tokens in Bulgarian and 3,900,000 tokens in English.
Language(s) : Greek - Bulgarian - English



ELRA-U-W0382
British English 2006 (BE06)


This corpus contains 1,010,996 words of written British English published between 2003 and 2008. A large part of the texts (82%) were published between 2005 and 2007.
Language(s) : English (United Kingdom)



ELRA-U-W0383
FALKO corpus of learner German 


This corpus contains texts produced by learners of German with 49 different native languages. The best-represented learners are from Denmark, England, France, Poland and Russia.
Language(s) : German



ELRA-U-W0384
Intercorp 


This is a multilingual corpus which contains 44 million words. It consists of fiction texts semi-automatically aligned between the Czech version and one of the following languages: English, French, German, Russian and Spanish.
Language(s) : Czech >>>> English - Czech >>>> French - Czech >>>> German - Czech >>>> Russian - Czech >>>> Spanish



ELRA-U-W0385
Thai Annotated Corpus for Text Summarization and Question Answering 


This is a Thai corpus which contains 500 articles of about 120 words each. It was segmented into 8,000 Elementary Discourse Units (EDUs) and annotated for Text Summarization and Question Answering.
Language(s) : Thai



ELRA-U-W0386
Japanese Textbook Corpus 


This corpus contains 1,478 samples of text extracted from 127 textbooks (about 1,000,000 characters). It consists of digitized parts of textbooks from elementary schools, junior high schools and high schools (grades 1 to 13) in Japan.
Language(s) : Japanese



ELRA-U-W0387
Bio-Event Linguistically Annotated Corpus (BELA)


This corpus contains 677 MEDLINE abstracts on the subject of E. coli, annotated for gene regulation events described both by verbs and by nominalised verbs.
Language(s) : English



ELRA-U-W0388
Gene Regulation Event Corpus (GREC)


This corpus contains 240 MEDLINE abstracts annotated for events relating to gene regulation and expression.
Language(s) : English



ELRA-U-W0389
ERREUS corpus of student learner 


This is a corpus of texts written by Basque language learners and native speakers. It is annotated for errors and part of speech (POS).
Language(s) : Basque



ELRA-U-W0390
Sentiment-annotated quotation corpus 


This corpus contains a set of English language quotations manually annotated for sentiment (positive, negative, objective/neutral) expressed towards the entities mentioned inside the quotation.
Language(s) : English



ELRA-U-W0391
KNB Corpus 


This is a Japanese blog corpus. It contains 249 articles and 4,186 sentences. It is fully annotated (part-of-speech, syntax, case, ellipsis, opinion information and review information).
Language(s) : Japanese



ELRA-U-W0392
Maltese National Corpus (MNC)


This is a Maltese corpus which contains legal, news, literary and web texts. It is annotated in XML format.
Language(s) : Maltese



ELRA-U-W0393
Homepage corpus 


This corpus contains homepages of researchers manually annotated for social information. It is in HTML format.
Language(s) : Hungarian




Joint Copyright © 2008 ELRA & ELDA
Universal Catalogue 1.0.4