|
Language Resources |
Displaying 181 to 200 (of 730 products) |
This resource contains the syntactically analyzed sentences of RWC-DB-TEXT-95-2. It was manually post-edited.
Language(s) : Japanese
|
|
|
|
The linguistic data of the EDR Japanese Corpus has been analyzed on morphological, syntactic, and semantic levels. It contains approximately 200,000 sentences.
Language(s) : Japanese
|
|
|
|
The linguistic data of the EDR English Corpus has been analyzed on morphological, syntactic, and semantic levels. It contains approximately 120,000 sentences.
Language(s) : English
|
|
|
|
The MPC is a multilingual corpus that contains extracts from 10 novels in Swedish with translations into English, German, French and Finnish. The total number of words is currently around 250,000 for Swedish.
Language(s) : Swedish >>>> English - Swedish >>>> French - Swedish >>>> German - Swedish >>>> Finnish
|
|
|
|
The ESPC is a bidirectional translation corpus consisting of comparable English and Swedish original texts and their translations into the other language (fiction and non-fiction). The total size of the corpus is approximately 2.8 million words.
Texts are aligned at sentence level.
Language(s) : English >>>> Swedish - Swedish >>>> English
|
|
|
|
This corpus consists of original texts in British and American English and their translations into European Spanish. Excerpts are taken from fiction and non-fiction books, newspapers, magazines, and miscellanea.
The corpus contains approximately 250,000 words per language.
Language(s) : English >>>> Spanish (Spain)
|
|
|
|
This corpus is composed of written and spoken data collected from Spanish learners of English at all levels (elementary, intermediate, and advanced). It contains 500,000 words, with the aim of reaching 1 million words.
Language(s) : English
|
|
|
|
The HNC currently contains 187.6 million words. It can be divided into five subcorpora by regional language variants (Hungary, Slovakia, Subcarpathia, Transylvania, Vojvodina) or by textual genres (press, literature, science, official, personal).
It has been annotated at morphosyntactic level (stem, part of speech and inflectional information).
Language(s) : Hungarian
|
|
|
|
The Potsdam Commentary Corpus consists of German newspaper commentaries annotated, currently to different degrees, with several kinds of information: part-of-speech, syntax (TIGER guidelines), rhetorical structure (according to Rhetorical Structure Theory, with RSTTool), connectives, co-reference (with MMAX), and information structure.
Language(s) : German
|
|
|
|
The RST Discourse Treebank contains 385 Wall Street Journal articles from the Penn Treebank. They have been annotated with discourse structure in the framework of Rhetorical Structure Theory (RST). The corpus also includes human-generated extracts and abstracts associated with the original documents.
Language(s) : English (USA)
|
|
|
|
This French-Swedish parallel corpus is a bidirectional translation corpus consisting of comparable French and Swedish original texts and their translations into the other language (10 original texts in each language). The total size of the corpus is approximately 2.8 million words.
Texts are aligned at sentence level.
Language(s) : French >>>> Swedish - Swedish >>>> French
|
|
|
|
The Taiwanese Learner Corpus of English is a large learner corpus of English collected in Taiwan. It contains 2105 pieces of English writing (around 730,000 words) from Taiwanese college students majoring in English.
The data have been lemmatised and POS tagged.
Language(s) : English (Taiwan)
|
|
|
|
The JEFFL is an interlanguage corpus of Japanese learners of English. It covers all proficiency levels: beginner, intermediate, and advanced.
Language(s) : English (Japan)
|
|
|
|
This corpus is composed of essays in Spanish produced by Japanese learners of Spanish (Japanese as L1, English as L2 and Spanish as L3). Six universities and 264 students in Japan participated in this project. The total number of words collected is 83,400. The criteria used to build the corpus follow those of the International Corpus of Learner English (ICLE).
Language(s) : Spanish (Japan)
|
|
|
|
The Cambridge and Nottingham Corpus of Discourse in English (CANCODE) forms a part of the Cambridge International Corpus (CIC).
It contains transcriptions of spontaneous speech recorded at hundreds of locations across the British Isles between 1995 and 2000, totalling 5 million words.
Language(s) : English (United Kingdom)
|
|
|
|
The Cambridge International Corpus (CIC) is a very large collection of English texts from newspapers, best-selling novels, non-fiction books, websites, magazines, junk mail, TV and radio programmes, recordings of people's everyday conversations, etc.
It covers different domains: business, finance, law, academic, general, etc.
The whole collection contains nearly 200 million words.
Language(s) : English (United Kingdom) - English (USA)
|
|
|
|
The Cambridge Learner Corpus (CLC) is a large collection of exam scripts written by learners of English taking Cambridge ESOL exams (25 million words). It currently contains scripts from 85,000 students from 180 different countries and 100 different first languages.
Language(s) : English
|
|
|
|
This corpus is a morphologically annotated corpus of Lithuanian (1 million running words). Each wordform is associated with a lemma and a set of morphological features. Disambiguation of the homoforms has been performed manually.
It covers a wide range of textual genres: scientific texts, fiction, parliamentary debates, administrative texts, etc.
Language(s) : Lithuanian
|
|
|
|
This collection of Portuguese (European and Brazilian) corpora contains a total of 371,229,589 words and 15,280,783 sentences. The corpora have been annotated at morphosyntactic level with PALAVRAS (E. Bick, 2000).
Language(s) : Portuguese (Portugal) - Portuguese (Brazil)
|
|
|
|
CHAVE is a Portuguese collection for Information Retrieval and Question Answering, created for CLEF in 2004 and updated every year. The text material is composed of the complete texts of the newspapers PÚBLICO (54,947,072) and Folha de São Paulo (35,699,765) from 1994 and 1995.
Since April 2007, a version syntactically annotated by PALAVRAS (Bick, 2000) has also been available.
Language(s) : Portuguese (Portugal) - Portuguese (Brazil)
|
|
|
|