Language Resources
Displaying 641 to 660 (of 730 products)
Result Pages: 33
This data collection comprises 395 why-questions, their source documents and one or two answers per question. Paraphrases were also created for a subset of the questions.
Language(s) : English
This is a collection of 2,824,179 Q/A pairs downloaded from the web.
Language(s) : English
This encyclopedic corpus was built by extracting and organizing data from 30 million web pages.
Language(s) : Japanese
The database consists of a verb lexicon of about 3,600 verbs and a semantically annotated corpus, the Penn Wall Street Journal Treebank II, in which more than 110,000 PropBank instances are annotated.
Language(s) : English
This treebank consists of 1984 sentences or 30,000 words which were manually annotated.
Language(s) : Slovenian
The Talbanken05 is a modernized version of Talbanken76, a syntactically annotated Swedish corpus.
Language(s) : Swedish
This is a Swedish POS-tagged and syntactically annotated corpus containing a written part (professional prose and high school students’ essays) and a spoken part (interviews, conversations and debates). The whole corpus consists of 300,000 tokens.
Language(s) : Swedish
This is an annotated Chinese chat language corpus, an informal language corpus built for informal language processing research. It covers 12,112 pieces of chat language text containing 92,314 words and 12,983 chat terms.
Language(s) : Chinese
It consists of 33,018 words and 240,093 word pairs made by an association of 10 participants.
Language(s) : Japanese
This corpus contains 267 documents published by a news magazine in the first ten weeks of 2002, covering five domains: economic, political, scientific, cultural and social news. The documents are annotated for coreference.
Language(s) : Dutch
This is a Portuguese corpus of one million tokens. One third of the corpus consists of transcribed spoken material.
Language(s) : Portuguese
It consists of Nobel Lectures since the inception of the prize in Physics (1902-2004, 969,515 tokens in 157 texts) and in Economics (1969-2004, 727,658 tokens in 55 texts).
Language(s) : English
This is a 12.6 MB corpus of specialised texts from books and works in the field of ceramics, amounting to 2.8 million words.
Language(s) : Spanish
This is a 1.16 MB corpus of specialised texts from books and works in the field of ceramics, amounting to 250,000 words.
Language(s) : English
This balanced collection of written modern Russian, which is representative of various genres, consists of 50 million words.
Language(s) : Russian
It contains 78 million words drawn from several major Russian newspapers from 2001 to 2004.
Language(s) : Russian
It contains 160 million words. This is a snapshot of the modern Russian language as used on the Internet; it is still a work in progress.
Language(s) : Russian
It contains 1.5 million words; its morphosyntactic features have been manually disambiguated.
Language(s) : Russian
These data include full issues of 13 newspapers published in 1994-1997. The newspapers are daily and weekly, central and regional, rightist, centrist and leftist. In total, the corpus contains 11,401,479 running words and 15,004 different lexemes in 23,109 texts of varying length.
Language(s) : Russian
The corpus follows the structure of the Brown University Corpus and comprises 1,000,805 words extracted mainly from electronic texts. Only original Bulgarian texts were included. The corpus consists of 500 text units belonging to 15 categories, each unit being approximately 2,000 words long.
Language(s) : Bulgarian