Universal Catalogue  
Written Corpora
Displaying 641 to 660 (of 730 products)

ELRA-WC319
Why-questions Corpus 


This data collection comprises 395 why-questions, their source documents, and one or two answers for each question. Paraphrases were also created for a subset of the questions.
Language(s) : English



ELRA-WC320
FAQ (Frequently Asked Questions) Corpus 


This is a collection of 2,824,179 Q/A pairs downloaded from the web.
Language(s) : English



ELRA-WC321
Cyclone Corpus 


This encyclopedic corpus was built by extracting and organizing data from 30 million web pages.
Language(s) : Japanese



ELRA-WC322
PropBank Database 


The database is formed by a verb lexicon of about 3,600 verbs and a semantically annotated corpus, the Penn Wall Street Journal Treebank II, where more than 110,000 PropBank instances are annotated.
Language(s) : English



ELRA-WC323
Slovene Dependency Treebank 


This treebank consists of 1,984 manually annotated sentences (30,000 words).
Language(s) : Slovenian



ELRA-WC324
Talbanken05 Swedish Treebank 


Talbanken05 is a modernized version of Talbanken76, a syntactically annotated Swedish corpus.
Language(s) : Swedish



ELRA-WC325
Talbanken76 Swedish Treebank 


This is a Swedish POS-tagged and syntactically annotated corpus containing a written part (professional prose and high school students’ essays) and a spoken part (interviews, conversations and debates). The whole corpus consists of 300,000 tokens.
Language(s) : Swedish



ELRA-WC326
NIL Corpus 


This is an annotated Chinese chat language corpus, an informal language corpus built for informal language processing research. It covers 12,112 pieces of chat language text containing 92,314 words and 12,983 chat terms.
Language(s) : Chinese



ELRA-WC327
Japanese Associative Concept Dictionary 


It consists of 33,018 words and 240,093 word pairs obtained from the associations of 10 participants.
Language(s) : Japanese



ELRA-WC328
Annotated KNACK-2002 Corpus of Dutch Written Text 


This corpus contains 267 documents published in a news magazine during the first ten weeks of 2002, covering five domains: economic, political, scientific, cultural and social news. The documents are annotated for coreference.
Language(s) : Dutch



ELRA-WC329
TagShare Corpus 


This is a Portuguese corpus of one million tokens. One third of the corpus consists of transcribed spoken material.
Language(s) : Portuguese



ELRA-WC330
Nobel Prize Winners in Physics and Economics Corpus 


It consists of Nobel Lectures since the inception of the prize in Physics (1902-2004; 969,515 tokens in 157 texts) and in Economics (1969-2004; 727,658 tokens in 55 texts).
Language(s) : English



ELRA-WC331
Spanish TextCeram Tagged Domain Corpus 


This is a 12.6 MB corpus of specialised texts from books and other works in the field of ceramics, amounting to 2.8 million words.
Language(s) : Spanish



ELRA-WC332
English TextCeram Tagged Domain Corpus 


This is a 1.16 MB corpus of specialised texts from books and other works in the field of ceramics, amounting to 250,000 words.
Language(s) : English



ELRA-WC333
Pilot version of Russian Reference Corpus 


This balanced collection of modern written Russian, representative of various genres, consists of 50 million words.
Language(s) : Russian



ELRA-WC334
Corpus of Russian Newspapers 


It contains 78 million words from several major Russian newspapers published from 2001 to 2004.
Language(s) : Russian



ELRA-WC335
Russian Internet Corpus 


It contains 160 million words. It is a snapshot of the modern Russian language as used on the Internet and is still a work in progress.
Language(s) : Russian



ELRA-WC336
Corpus of Russian Fiction 


It contains 1.5 million words; its morphosyntactic features have been manually disambiguated.
Language(s) : Russian



ELRA-WC337
Computer Corpus of Russian Newspapers Texts of the End of the XX-th Century 


These data include full issues of 13 newspapers published in 1994-1997. The newspapers are daily and weekly, central and regional, rightist, centrist and leftist. In total, the corpus contains 11,401,479 running words and 15,004 different lexemes in 23,109 texts of varying length.
Language(s) : Russian



ELRA-WC338
Brown Corpus of Bulgarian (BCB)


The corpus is structured along the standards of the Brown University Corpus and comprises 1,000,805 words extracted mainly from electronic texts. Only original Bulgarian texts were included. The corpus consists of 500 text units belonging to 15 categories, each unit being approximately 2,000 words long.
Language(s) : Bulgarian




Joint Copyright © 2008 ELRA & ELDA
Universal Catalogue 1.0.4