Universal Catalogue

Written Corpora

Displaying 381 to 400 (of 730 products)

ELRA-U-W0374

Chinese Named Entity Tagged Corpus (CNEC1.0)

CNEC 1.0 is a corpus in Mandarin Chinese, annotated with named entity tags for person names, location names, organization names, locations used as organizations and organizations used as locations.
Language(s) : Chinese

ELRA-U-W0375

Coreference Corpus for Dutch

This Dutch corpus is annotated for coreference, including:
- identity relations between noun phrases,
- bound relations, where an anaphor refers to a quantified antecedent,
- bridge relations, where the anaphor refers to a subset of the antecedent,
- predicative relations, which provide extra information about the referent.
Language(s) : Dutch

ELRA-U-W0376

Corpus for cross-document co-reference

This is a corpus of 257,000 words annotated for co-reference. It contains 18,000 distinct document-level entities and approximately 55,000 entity mentions.
Language(s) : English (USA)

ELRA-U-W0377

Greek corpus for Textual Entailment (GTEC)

This corpus contains 600 pairs of textual units manually annotated for textual entailment.
Language(s) : Greek

ELRA-U-W0378

Corpus of Serbian Language (CSL)

It was compiled from a sample of 11 million words of text dating from the 12th century to the present day. The corpus has been manually tagged for grammatical category, number of graphemes, number of syllables and phonological structure.
Language(s) : Serbian

ELRA-U-W0379

Bulgarian Polish Lithuanian corpus (BG-PL-LT corpus)

This is a trilingual corpus of 3 million words in Bulgarian, Polish and Lithuanian. The BG-PL-LT corpus comprises a parallel corpus and a comparable corpus.
Language(s) : Bulgarian - Polish - Lithuanian

ELRA-U-W0380

Bulgarian National Corpus

The Bulgarian National Corpus consists of about 320,000,000 words from more than 10,000 texts. It reflects the state of the Bulgarian language (mainly in its written form) from the middle of the 20th century (1945) to the present.
Language(s) : Bulgarian

ELRA-U-W0381

INTERREG corpus

This is a parallel corpus which consists of web documents from the general domain. It contains 4,190,000 tokens in Greek, 3,430,000 tokens in Bulgarian and 3,900,000 tokens in English.
Language(s) : Greek - Bulgarian - English

ELRA-U-W0382

British English 2006 (BE06)

This corpus contains 1,010,996 words of written British English published between 2003 and 2008. Most of the texts (82%) were published between 2005 and 2007.
Language(s) : English (United Kingdom)

ELRA-U-W0383

FALKO corpus of learner German

This corpus contains texts produced by learners of German with 49 different native languages. The most represented learners are from Denmark, England, France, Poland and Russia.
Language(s) : German

ELRA-U-W0384

Intercorp

This is a multilingual corpus of 44 million words. It consists of fiction texts, semi-automatically aligned between the Czech version and a version in one of the following languages: English, French, German, Russian or Spanish.
Language(s) : Czech >>>> English - Czech >>>> French - Czech >>>> German - Czech >>>> Russian - Czech >>>> Spanish

ELRA-U-W0385

Thai Annotated Corpus for Text Summarization and Question Answering

This Thai corpus contains 500 articles of about 120 words each. It is segmented into 8,000 Elementary Discourse Units (EDUs) and annotated for text summarization and question answering.
Language(s) : Thai

ELRA-U-W0386

Japanese Textbook Corpus

It contains 1,478 text samples (about 1,000,000 characters) extracted from 127 textbooks. It consists of digitized parts of textbooks used in elementary schools, junior high schools and high schools (grades 1 to 13) in Japan.
Language(s) : Japanese

ELRA-U-W0387

Bio-Event Linguistically Annotated Corpus (BELA)

It contains 677 abstracts annotated for gene regulation events, described by both verbs and nominalised verbs. The corpus consists of MEDLINE abstracts on the subject of E. coli.
Language(s) : English

ELRA-U-W0388

Gene Regulation Event Corpus (GREC)

It contains 240 MEDLINE abstracts annotated for events relating to gene regulation and expression.
Language(s) : English

ELRA-U-W0389

ERREUS corpus of student learner

This is a corpus of texts written by learners of Basque and by native speakers. It is annotated for errors and part of speech (POS).
Language(s) : Basque

ELRA-U-W0390

Sentiment-annotated quotation corpus

This corpus contains a set of English-language quotations manually annotated for the sentiment (positive, negative, objective/neutral) expressed towards the entities mentioned in each quotation.
Language(s) : English

ELRA-U-W0391

KNB Corpus

This is a Japanese blog corpus. It contains 249 articles (4,186 sentences) and is fully annotated for part of speech, syntax, case, ellipsis, opinion information and review information.
Language(s) : Japanese

ELRA-U-W0392

Maltese National Corpus (MNC)

This Maltese corpus contains legal, news, literary and web texts. Its annotation is provided in XML format.
Language(s) : Maltese

ELRA-U-W0393

Homepage corpus

This corpus contains researchers' homepages, manually annotated for social information. It is provided in HTML format.
Language(s) : Hungarian
