Language Resources
Displaying 141 to 160 (of 730 products)
Result Pages: 8
A pilot corpus of English and Latvian legal texts aligned at sentence level was compiled in 2001. It contains approximately 100,000 words per language.
Language(s) : English - Latvian
The Corpus of Early Written Latvian Texts contains approximately 1 million running words and is composed of ecclesiastical texts from the 16th to the 18th century (with a structural annotation).
Language(s) : Latvian
The Croatian-Slovenian parallel corpus contains 3.5 million words and is aligned at sentence level.
Language(s) : Croatian - Slovenian
The Slovak National Corpus (SNK) is a database of contemporary Slovak language texts. It contains 526,082,640 tokens and covers a broad range of textual genres.
Language(s) : Slovak
The first part of the "East meets West" resources is composed of Plato's Republic translated into 17 Western and Eastern European languages. It is fully SGML encoded according to the TEI Guidelines and is annotated with POS tags.
Languages: Ancient Greek, Bulgarian, Czech, English, French, German, Hungarian, Latvian, Lithuanian, Polish, Romanian, Russian, Serbian, Slovene, Slovak, Swedish and Chinese.
The second part is composed of comparable and parallel corpora (including the '1984' novel by G. Orwell) for 6 languages.
Languages: Bulgarian, Czech, Estonian, Hungarian, Romanian, Slovene.
Language(s) : Greek - Bulgarian - Czech - English - French - German - Hungarian - Latvian - Lithuanian - Polish - Romanian - Russian - Serbian - Slovene - Slovak - Swedish - Chinese - Estonian
This corpus is composed of translations of Plato's Republic into 11 languages. An alignment at sentence level is provided for all pairs of languages.
Languages: Bulgarian, Czech, German, English, French, Croatian, Lithuanian, Polish, Serbo-Croatian, Slovak and Slovene.
Language(s) : English - French - German - Croatian - Czech - Slovene - Bulgarian - Lithuanian - Polish - Serbo-Croatian - Slovak
It is composed of five aligned corpora of legal texts:
- an English-Slovene corpus (about 67 million words),
- a German-Slovene corpus (about 13 million words),
- a French-Slovene corpus (about 25 million words),
- a Spanish-Slovene corpus (about 10 million words),
- an Italian-Slovene corpus (about 11 million words).
It also contains multilingual EU Commission data (98 million words), Slovene-English data from the Trans corpus (700,000 words) and English-Slovene data from the EMEA corpus (7 million words).
Language(s) : Slovene <<< >>> English - Slovene <<< >>> German - Slovene <<< >>> French - Slovene <<< >>> Spanish - Slovene <<< >>> Italian
The SVEZ-IJS is a large English-Slovene parallel corpus aligned at sentence level. It contains translated legal texts of the European Union (the Acquis Communautaire), for a total of approximately 5 million words per language.
Both texts are linguistically annotated.
Language(s) : English (United Kingdom) - Slovenian (Slovenia)
This is a 120 million word collection of texts designed to represent modern Lithuanian. It is a balanced corpus of various genres.
Language(s) : Lithuanian
This English-Lithuanian parallel corpus contains 35,505 aligned sentences.
Language(s) : English (United Kingdom) <<< >>> Lithuanian
This is a German-Lithuanian parallel corpus of one million words extracted from EU documents.
Language(s) : German - Lithuanian
This is a Czech-Lithuanian parallel corpus of fiction texts.
Language(s) : Czech - Lithuanian
The Oslo corpus of tagged Norwegian texts contains more than 20 million words from different genres (fiction, newspapers/magazines and factual prose). It is divided into two subcorpora: Bokmål (18.5 million words) and Nynorsk (3.8 million words).
The texts are grammatically annotated.
Language(s) : Norwegian (Norway)
The LOGON corpus is a collection of Norwegian-English parallel texts from the domain of tourism. It contains approximately 255,000 words per language. Texts have been aligned and POS tagged.
Language(s) : Norwegian - English
The Multieight-04 corpus is a collection of 700 questions in several European languages and their manually retrieved answers.
Languages: German, English, Spanish, French, Italian, Dutch and Portuguese, plus Bulgarian and Finnish exclusively as source languages.
Language(s) : French - English - Finnish - Bulgarian - Portuguese - Spanish - German - Dutch - Italian
The DISEQuA corpus is composed of 450 questions formulated in four languages (Dutch, Italian, Spanish and English), with their manually retrieved answers.
Language(s) : Dutch - English - Italian - Spanish
The Multisix corpus is a collection of 200 English questions with their manually retrieved answers. These 200 questions have been translated into five languages: Dutch, French, German, Italian and Spanish.
Language(s) : English - French - Spanish - Dutch - German - Italian
This resource comprises 1000 questions released for the QA track at TREC-2002 and 2003. They have been translated into Italian. In most cases, the correct answer is also provided.
Language(s) : Italian
This resource contains transcriptions of Italian broadcasts. Mentions of locations, persons and organizations have been marked with tags.
Language(s) : Italian
This resource contains 1893 questions drawn from the TREC QA evaluation exercises and translated into French.
Language(s) : French