Catalog Reference : ELRA-U-W 0039
The Europarl Corpus
The Europarl Corpus is a multilingual collection of texts extracted from the proceedings of the European Parliament. It covers 11 languages:
Danish: 47,305,502 words
German: 44,688,020 words
Greek: 26,306,875 words (in 2007)
English: 50,978,295 words
Spanish: 52,503,808 words
Finnish: 34,106,317 words
French: 55,088,177 words
Italian: 50,161,729 words
Dutch: 50,926,645 words
Portuguese: 51,294,994 words
Swedish: 43,291,692 words
The mark-up records document, speaker and paragraph information.
This corpus was compiled for machine translation. It is available with a sentence aligner.
See also the parallel texts that have already been constructed, each pairing one of the other languages with English. Data for other language pairs can be requested.
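The document, speaker and paragraph mark-up described above can be read with a few lines of code. The following is a minimal sketch, not the corpus's official tooling: it assumes one tag or one sentence per line, and the tag shapes `<CHAPTER ID=...>`, `<SPEAKER ID=... NAME="...">` and `<P>` are assumptions that this catalogue entry does not confirm. The names `Sentence`, `iter_sentences` and `SAMPLE` are hypothetical, and the sample text is invented for illustration.

```python
import re
from typing import Iterable, Iterator, NamedTuple, Optional

# Assumed tag shapes; the catalogue entry only says that document,
# speaker and paragraph information is marked up.
CHAPTER_RE = re.compile(r'<CHAPTER\s+ID="?(\w+)"?\s*>')
SPEAKER_RE = re.compile(r'<SPEAKER\s+ID="?(\w+)"?[^>]*?NAME="([^"]*)"[^>]*>')
PARAGRAPH_RE = re.compile(r'<P>')


class Sentence(NamedTuple):
    chapter: Optional[str]   # ID of the enclosing document/chapter
    speaker: Optional[str]   # name of the current speaker, if any
    paragraph: int           # paragraph index within the current speech
    text: str                # the line of text itself


def iter_sentences(lines: Iterable[str]) -> Iterator[Sentence]:
    """Yield text lines annotated with the mark-up state seen so far."""
    chapter: Optional[str] = None
    speaker: Optional[str] = None
    paragraph = 0
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        match = CHAPTER_RE.match(line)
        if match:
            # A new chapter resets the speaker and paragraph counter.
            chapter, speaker, paragraph = match.group(1), None, 0
            continue
        match = SPEAKER_RE.match(line)
        if match:
            # A new speaker starts a new speech at paragraph 0.
            speaker, paragraph = match.group(2), 0
            continue
        if PARAGRAPH_RE.match(line):
            paragraph += 1
            continue
        if line.startswith("<"):
            continue  # skip any tag this sketch does not know about
        yield Sentence(chapter, speaker, paragraph, line)


# Invented sample input in the assumed format.
SAMPLE = """<CHAPTER ID=1>
Opening of the session
<SPEAKER ID=1 NAME="President">
I declare the session open.
<P>
Thank you.
<SPEAKER ID=2 NAME="Example Member">
A short reply.
"""

if __name__ == "__main__":
    for sentence in iter_sentences(SAMPLE.splitlines()):
        print(sentence)
```

Keeping the speaker and paragraph alongside each line makes it straightforward to filter by speaker or to feed paragraph-sized units to a sentence aligner. The regular expressions should be adjusted to whatever tag format the delivered files actually use.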
Identification
Period of coverage :
Version :
v5, 2010
Version history :
v3, 2007
Production
Creation date :
2007
Applications
Application area :
Research
Contents
written corpus
Number of languages :
Multilingual
Language(s) :
Danish (Denmark) ; German (Germany) ; Greek (Greece) ; English (United Kingdom) ; Spanish (Spain) ; Finnish (Finland) ; French (France) ; Italian (Italy) ; Dutch (Netherlands) ; Swedish (Sweden) ; Portuguese (Portugal)
Character set :
utf8
Saturday 23 November, 2024
Joint Copyright © 2008 ELRA & ELDA
Universal Catalogue 1.0.4