Universal Catalogue  
Written Corpora
Displaying 181 to 200 (of 730 products)

ELRA-U-W 0173
CRL-DB-TEXT-97-1 


This resource contains the syntactically analyzed sentences of RWC-DB-TEXT-95-2. It was manually post-edited.
Language(s) : Japanese



ELRA-U-W 0174
EDR Japanese Corpus 


The linguistic data of the EDR Japanese Corpus has been analyzed at the morphological, syntactic, and semantic levels. It contains approximately 200,000 sentences.
Language(s) : Japanese



ELRA-U-W 0175
EDR English Corpus 


The linguistic data of the EDR English Corpus has been analyzed at the morphological, syntactic, and semantic levels. It contains approximately 120,000 sentences.
Language(s) : English



ELRA-U-W 0176
Multilingual Pilot Corpus (MPC)


The MPC is a multilingual corpus that contains extracts from 10 novels in Swedish with translations into English, German, French and Finnish. The total number of words is currently around 250,000 for Swedish.
Language(s) : Swedish-English - Swedish-French - Swedish-German - Swedish-Finnish



ELRA-U-W 0177
English-Swedish Parallel Corpus (ESPC)


The ESPC is a bidirectional translation corpus consisting of comparable English and Swedish original texts (fiction and non-fiction) and their translations into the other language. The total size of the corpus is approximately 2.8 million words.
Texts are aligned at sentence level.
Language(s) : English-Swedish - Swedish-English



ELRA-U-W 0178
English-Spanish Parallel Corpus (P-ACTRES)


This corpus consists of original texts in British and American English and their translations into European Spanish. Excerpts are taken from fiction books, non-fiction books, newspapers, magazines, and miscellanea.
The corpus contains approximately 250,000 words per language.
Language(s) : English-Spanish (Spain)



ELRA-U-W 0179
Santiago University Learner of English Corpus (SULEC)


This corpus is composed of written and spoken data collected from Spanish learners of English at all levels (elementary, intermediate, and advanced). It contains 500,000 words, with a target of 1 million words.
Language(s) : English



ELRA-U-W 0180
Hungarian National Corpus (HNC)


The HNC currently contains 187.6 million words. It can be divided into five subcorpora by regional language variants (Hungary, Slovakia, Subcarpathia, Transylvania, Vojvodina) or by textual genres (press, literature, science, official, personal).
It has been annotated at the morphosyntactic level (stem, part of speech, and inflectional information).
Language(s) : Hungarian



ELRA-U-W 0181
Potsdam Commentary Corpus (PCC)


The Potsdam Commentary Corpus consists of German newspaper commentaries annotated, currently to varying degrees, with several layers of information: part of speech, syntax (TIGER guidelines), rhetorical structure (according to Rhetorical Structure Theory, with RSTTool), connectives, co-reference (with MMAX), and information structure.
Language(s) : German



ELRA-U-W 0182
RST Discourse Treebank 


The RST Discourse Treebank contains 385 Wall Street Journal articles from the Penn Treebank. They have been annotated with discourse structure in the framework of Rhetorical Structure Theory (RST). The corpus also includes human-generated extracts and abstracts associated with the original documents.
Language(s) : English (USA)



ELRA-U-W 0183
French-Swedish Parallel Corpus (CPSF)


This French-Swedish parallel corpus is a bidirectional translation corpus consisting of comparable French and Swedish original texts and their translations into the other language (10 original texts in each language). The total size of the corpus is approximately 2.8 million words.
Texts are aligned at sentence level.
Language(s) : French-Swedish - Swedish-French



ELRA-U-W 0184
Taiwanese Learner Corpus of English (TLCE)


The Taiwanese Learner Corpus of English is a large learner corpus of English collected in Taiwan. It contains 2105 pieces of English writing (around 730,000 words) from Taiwanese college students majoring in English.
The data have been lemmatised and POS tagged.
Language(s) : English (Taiwan)



ELRA-U-W 0185
Japanese EFL Learner Corpus (JEFLL)


The JEFLL is an interlanguage corpus of Japanese learners of English. It covers all proficiency levels: beginner, intermediate, and advanced.
Language(s) : English (Japan)



ELRA-U-W 0186
Japanese Learner Corpus of Spanish 


This corpus is composed of essays in Spanish produced by Japanese learners of Spanish (Japanese as L1, English as L2 and Spanish as L3). Six universities and 264 students in Japan participated in this project. The total number of words collected is 83,400. The criteria used to build the corpus follow those of the International Corpus of Learner English (ICLE).
Language(s) : Spanish (Japan)



ELRA-U-W 0187
Cambridge and Nottingham Corpus of Discourse in English (CANCODE)


The Cambridge and Nottingham Corpus of Discourse in English (CANCODE) forms a part of the Cambridge International Corpus (CIC).
It contains transcriptions of spontaneous speech recorded at hundreds of locations across the British Isles between 1995 and 2000, for a total of 5 million words.
Language(s) : English (United Kingdom)



ELRA-U-W 0188
Cambridge International Corpus (CIC)


The Cambridge International Corpus (CIC) is a very large collection of English texts from newspapers, best-selling novels, non-fiction books, websites, magazines, junk mail, TV and radio programmes, recordings of people's everyday conversations, etc.
It covers a range of domains: business, finance, law, academic, general, etc.
The whole collection contains nearly 200 million words.
Language(s) : English (United Kingdom) - English (USA)



ELRA-U-W 0189
Cambridge Learner Corpus (CLC)


The Cambridge Learner Corpus (CLC) is a large collection (25 million words) of exam scripts written by learners of English taking Cambridge ESOL exams. It currently contains scripts from 85,000 students in 180 countries, representing 100 first languages.
Language(s) : English



ELRA-U-W 0190
Lithuanian Annotated Corpus (LAC)


This corpus is a morphologically annotated corpus of Lithuanian (1 million running words). Each wordform is associated with a lemma and a set of morphological features. Disambiguation of the homoforms has been performed manually.
It covers a wide range of textual genres: scientific texts, fiction, parliamentary debates, administrative texts, etc.
Language(s) : Lithuanian



ELRA-U-W 0191
AC/DC corpora for Portuguese 


This collection of Portuguese corpora (European and Brazilian) totals 371,229,589 words and 15,280,783 sentences. The corpora have been annotated at the morphosyntactic level with PALAVRAS (E. Bick, 2000).
Language(s) : Portuguese (Portugal) - Portuguese (Brazil)



ELRA-U-W 0192
CHAVE Collection 


CHAVE is a Portuguese collection for Information Retrieval and Question Answering, created for CLEF in 2004 and updated every year. The text material is composed of the complete texts of the newspapers PÚBLICO (54,947,072) and Folha de São Paulo (35,699,765) from 1994 and 1995.
Since April 2007, a version syntactically annotated by PALAVRAS (Bick, 2000) has also been available.
Language(s) : Portuguese (Portugal) - Portuguese (Brazil)




Joint Copyright © 2008 ELRA & ELDA
Universal Catalogue 1.0.4