Universal Catalogue  
Written Corpora
Displaying 141 to 160 (of 730 products)

ELRA-U-W 0133
English-Latvian Legal Parallel Corpus 


A pilot corpus of English and Latvian legal texts aligned at sentence level was compiled in 2001. It contains approximately 100,000 words per language.
Language(s) : English - Latvian



ELRA-U-W 0134
Corpus of Early Written Latvian Texts (SENIE)


The Corpus of Early Written Latvian Texts contains approximately 1 million running words and is composed of structurally annotated ecclesiastical texts from the 16th to the 18th century.
Language(s) : Latvian



ELRA-U-W 0135
Croatian-Slovenian Parallel Corpus 


The Croatian-Slovenian parallel corpus contains 3.5 million words and is aligned at sentence level.
Language(s) : Croatian - Slovenian



ELRA-U-W 0136
Slovak National Corpus 


The Slovak National Corpus (SNK) is a database of contemporary Slovak language texts. It contains 526,082,640 tokens and covers a broad range of textual genres.
Language(s) : Slovak



ELRA-U-W 0137
"East meets West" Corpus 


The first part of the "East meets West" resources is composed of Plato's Republic translated into 17 Western and Eastern European languages. It is fully SGML encoded according to the TEI Guidelines and is annotated with POS tags.

Languages: Ancient Greek, Bulgarian, Czech, English, French, German, Hungarian, Latvian, Lithuanian, Polish, Romanian, Russian, Serbian, Slovene, Slovak, Swedish and Chinese.

The second part is composed of comparable and parallel corpora (including the '1984' novel by G. Orwell) for 6 languages.

Languages: Bulgarian, Czech, Estonian, Hungarian, Romanian, Slovene.
Language(s) : Greek - Bulgarian - Czech - English - French - German - Hungarian - Latvian - Lithuanian - Polish - Romanian - Russian - Serbian - Slovene - Slovak - Swedish - Chinese - Estonian



ELRA-U-W 0138
Plato's Republic Parallel Corpus 


This corpus is composed of translations of Plato's Republic into 11 languages. An alignment at sentence level is provided for all pairs of languages.

Languages: Bulgarian, Czech, German, English, French, Croatian, Lithuanian, Polish, Serbo-Croatian, Slovak and Slovene.
Language(s) : English - French - German - Croatian - Czech - Slovene - Bulgarian - Lithuanian - Polish - Serbo-Croatian - Slovak



ELRA-U-W 0139
Evrokorpus Parallel Corpus 


The Evrokorpus is composed of five aligned corpora of legal texts:
- an English-Slovene corpus (about 67 million words),
- a German-Slovene corpus (about 13 million words),
- a French-Slovene corpus (about 25 million words),
- a Spanish-Slovene corpus (about 10 million words),
- an Italian-Slovene corpus (about 11 million words).

It also contains multilingual EU Commission data (98 million words), Slovene-English data from the Trans corpus (700,000 words) and English-Slovene data from the EMEA corpus (7 million words).
Language(s) : Slovene <<< >>> English - Slovene <<< >>> German - Slovene <<< >>> French - Slovene <<< >>> Spanish - Slovene <<< >>> Italian



ELRA-U-W 0140
The SVEZ-IJS English-Slovene Acquis Corpus (SVEZ-IJS)


The SVEZ-IJS is a large English-Slovene parallel corpus annotated at sentence level. It contains translated legal texts of the European Union (the Acquis Communautaire), for a total of approximately 5 million words per language.
Both texts are linguistically annotated.
Language(s) : English (United Kingdom) - Slovenian (Slovenia)



ELRA-U-W 0141
Corpus of the Contemporary Lithuanian Language (CCLL)


This is a 120 million word collection of texts designed to represent modern Lithuanian. It is a balanced corpus of various genres.
Language(s) : Lithuanian



ELRA-U-W 0142
English-Lithuanian Parallel Corpus 


This English-Lithuanian parallel corpus contains 35,505 aligned sentences.
Language(s) : English (United Kingdom) <<< >>> Lithuanian



ELRA-U-W 0143
German-Lithuanian Parallel Corpus 


This is a German-Lithuanian parallel corpus of one million words extracted from EU documents.
Language(s) : German - Lithuanian



ELRA-U-W 0144
Czech-Lithuanian Parallel Corpus 


This is a Czech-Lithuanian parallel corpus of fiction texts.
Language(s) : Czech - Lithuanian



ELRA-U-W 0145
The Oslo Corpus of Tagged Norwegian Texts 


The Oslo corpus of tagged Norwegian texts contains more than 20 million words of different genres (fiction, newspapers/magazines and factual prose). It is divided into two subcorpora, the bokmål (18.5 million words) and the nynorsk (3.8 million words).
The texts are grammatically annotated.
Language(s) : Norwegian (Norway)



ELRA-U-W 0146
The LOGON parallel tourist corpus of Norwegian-English texts 


The LOGON corpus is a collection of Norwegian-English parallel texts from the domain of tourism. It contains approximately 255,000 words per language. Texts have been aligned and POS tagged.
Language(s) : Norwegian - English



ELRA-U-W 0147
Multieight-04 Corpus 


The Multieight-04 corpus is a collection of 700 questions in several European languages and their manually retrieved answers.

Languages: German, English, Spanish, French, Italian, Dutch and Portuguese, plus Bulgarian and Finnish exclusively as source languages.
Language(s) : French - English - Finnish - Bulgarian - Portuguese - Spanish - German - Dutch - Italian



ELRA-U-W 0148
DISEQuA Corpus 


The DISEQuA corpus is composed of 450 questions formulated in four languages (Dutch, Italian, Spanish and English), with their manually retrieved answers.
Language(s) : Dutch - English - Italian - Spanish



ELRA-U-W 0149
Multisix Corpus 


The Multisix corpus is a collection of 200 English questions with their manually retrieved answers. These 200 questions have been translated into five languages: Dutch, French, German, Italian and Spanish.
Language(s) : English - French - Spanish - Dutch - German - Italian



ELRA-U-W 0150
Italian Translation of the TREC Questions 


This resource comprises 1000 questions released for the QA track at TREC-2002 and 2003. They have been translated into Italian. In most cases, the correct answer is also provided.
Language(s) : Italian



ELRA-U-W 0151
Test Set for Italian Named-Entities Recognition 


This resource contains transcriptions of Italian broadcasts. Information about locations, persons and organizations has been marked with tags.
Language(s) : Italian



ELRA-U-W 0152
1893 questions of TREC in French 


This resource contains 1893 questions drawn from the TREC QA evaluation exercises and translated into French.
Language(s) : French




Joint Copyright © 2008 ELRA & ELDA
Universal Catalogue 1.0.4