Universal Catalogue  
Catalogue reference: ELRA-U-W 0039
The Europarl Corpus
The Europarl Corpus is a multilingual collection of texts extracted from the proceedings of the European Parliament. It covers 11 languages, with the following word counts:

Danish: 47,305,502 words
German: 44,688,020 words
Greek: 26,306,875 words (2007 figure)
English: 50,978,295 words
Spanish: 52,503,808 words
Finnish: 34,106,317 words
French: 55,088,177 words
Italian: 50,161,729 words
Dutch: 50,926,645 words
Portuguese: 51,294,994 words
Swedish: 43,291,692 words
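
The per-language figures can be combined into a corpus-wide total. The short Python sketch below only re-uses the numbers listed above; note that the Greek count dates from 2007, so the total mixes figures from different versions. The variable names are illustrative.

```python
# Word counts as listed in this catalogue entry (the Greek figure is from 2007).
WORD_COUNTS = {
    "Danish": 47_305_502,
    "German": 44_688_020,
    "Greek": 26_306_875,
    "English": 50_978_295,
    "Spanish": 52_503_808,
    "Finnish": 34_106_317,
    "French": 55_088_177,
    "Italian": 50_161_729,
    "Dutch": 50_926_645,
    "Portuguese": 51_294_994,
    "Swedish": 43_291_692,
}

total = sum(WORD_COUNTS.values())
largest = max(WORD_COUNTS, key=WORD_COUNTS.get)
smallest = min(WORD_COUNTS, key=WORD_COUNTS.get)

print(f"{len(WORD_COUNTS)} languages, {total:,} words in total")
print(f"largest: {largest}, smallest: {smallest}")
```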

The mark-up records document, speaker and paragraph information.
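
This entry does not spell out the mark-up syntax. The sketch below assumes SGML-style tag lines such as `<CHAPTER ID=1>`, `<SPEAKER ID=1 NAME="...">` and `<P>` to mark document, speaker and paragraph boundaries; these tag and attribute names, and the helper names `parse` and `parse_attrs`, are assumptions to be checked against the distributed files. It attaches the current document, speaker and paragraph to every text line.

```python
import re
from typing import Dict, List, Optional, Tuple

# Assumed tag names; check them against the distributed files before use.
DOC_TAG = "CHAPTER"
SPEAKER_TAG = "SPEAKER"
PARA_TAG = "P"

TAG_RE = re.compile(r"^<(?P<name>[A-Z]+)(?P<attrs>[^>]*)>$")
ATTR_RE = re.compile(r'(\w+)=(?:"([^"]*)"|(\S+))')

# (document id, speaker name, paragraph index within the turn, text)
Record = Tuple[Optional[str], Optional[str], int, str]


def parse_attrs(text: str) -> Dict[str, str]:
    """Parse KEY=value and KEY="quoted value" pairs from a tag."""
    return {key: quoted or bare for key, quoted, bare in ATTR_RE.findall(text)}


def parse(lines: List[str]) -> List[Record]:
    """Attach document, speaker and paragraph context to every text line."""
    doc: Optional[str] = None
    speaker: Optional[str] = None
    para = -1
    records: List[Record] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        match = TAG_RE.match(line)
        if match:
            name = match.group("name")
            attrs = parse_attrs(match.group("attrs"))
            if name == DOC_TAG:
                doc, speaker, para = attrs.get("ID"), None, -1
            elif name == SPEAKER_TAG:
                speaker, para = attrs.get("NAME"), -1
            elif name == PARA_TAG:
                para += 1
            continue  # any other tag is ignored
        if para < 0:
            para = 0  # text before any paragraph tag opens paragraph 0
        records.append((doc, speaker, para, line))
    return records
```

Keeping the tag names in constants means only three lines change if the real files use different names.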

The corpus was compiled for machine translation and is distributed together with a sentence aligner.
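
The aligner shipped with the corpus is not documented on this page. As an illustration only, the sketch below implements the length-based method of Gale and Church (1993), a common approach to aligning sentences in parallel text; it is not necessarily the algorithm of the distributed tool. The length parameters (c = 1, s² = 6.8) follow that method, the bead priors are adapted from it (split evenly between directions, with 2-2 beads omitted), and the names `bead_cost` and `align` are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# Gale & Church length model: expected target/source character ratio
# and variance per character.
C = 1.0
S2 = 6.8

# Bead types as (source sentences, target sentences) with prior probabilities.
PRIORS: Dict[Tuple[int, int], float] = {
    (1, 1): 0.89,
    (1, 0): 0.0099 / 2,
    (0, 1): 0.0099 / 2,
    (2, 1): 0.089 / 2,
    (1, 2): 0.089 / 2,
}

Bead = Tuple[List[int], List[int]]


def bead_cost(len1: int, len2: int, bead: Tuple[int, int]) -> float:
    """Negative log probability of a bead with the given character lengths."""
    prior_cost = -math.log(PRIORS[bead])
    if len1 == 0 and len2 == 0:
        return prior_cost
    mean = (len1 + len2 / C) / 2
    z = (C * len1 - len2) / math.sqrt(S2 * mean)
    # Two-tailed probability of a standard normal deviate at least |z|.
    p_delta = math.erfc(abs(z) / math.sqrt(2))
    return prior_cost - math.log(max(p_delta, 1e-300))


def align(src: List[str], tgt: List[str]) -> List[Bead]:
    """Return a minimum-cost sequence of beads of sentence indices."""
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[(0, 0)] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    # Row-major order visits every cell after all of its predecessors.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for di, dj in PRIORS:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                len1 = sum(len(s) for s in src[i:ni])
                len2 = sum(len(t) for t in tgt[j:nj])
                candidate = cost[i][j] + bead_cost(len1, len2, (di, dj))
                if candidate < cost[ni][nj]:
                    cost[ni][nj] = candidate
                    back[ni][nj] = (i, j)
    # Follow back-pointers from the end to recover the beads.
    beads: List[Bead] = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    beads.reverse()
    return beads
```

For example, `align(["Short.", "Also short."], ["Short and also short."])` groups both source sentences with the single target sentence, because a 2-1 bead with similar total lengths is far cheaper than a 1-1 bead plus a deletion.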

Ready-made parallel texts from each language into English are also available. Data for other language pairs can be provided on request.
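
To see which pairs that leaves, the sketch below enumerates them using the ISO 639-1 codes of the 11 languages listed above: the 10 English-target pairs correspond to the ready-made texts, and the remaining unordered pairs are the ones that would have to be requested. The variable names are illustrative.

```python
from itertools import combinations

# ISO 639-1 codes for the 11 languages listed in this entry.
LANGUAGES = ["da", "de", "el", "en", "es", "fi", "fr", "it", "nl", "pt", "sv"]

others = [lang for lang in LANGUAGES if lang != "en"]

# Ready-made: each language paired with English as the target.
prebuilt = [(lang, "en") for lang in others]

# Every other unordered pair, available on request.
on_request = list(combinations(others, 2))

print(len(prebuilt), len(on_request))
```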
Identification
Period of coverage:
Version: v5, 2010
Version history: v3, 2007
Production
Creation date: 2007
Applications
Application area: Research
Contents
Written corpus

Joint Copyright © 2008 ELRA & ELDA