Universal Catalogue  
Languages
English
Catalog Reference: ELRA-SD192
European Parliament Interpreting Corpus (EPIC)
This 177,295-word corpus contains speech and transcripts from the European Parliament. It is made up of nine sub-corpora: three of source speeches (English, Italian and Spanish) and six of target speeches produced by interpreters, covering all possible combinations of the three languages. Speakers address the audience without expecting any direct reply. The corpus has been orthographically transcribed, enriched with metadata (a header at the beginning of each transcript and information about the speaker and the speech), and then tagged and lemmatised with different taggers.

Size of the nine sub-corpora:
sub-corpus / number of speeches / total word count / % of EPIC
ORG-EN (source) / 81 / 42,705 / 25
INT-EN-IT (interpretation) / 81 / 35,765 / 20
INT-EN-ES (interpretation) / 81 / 38,066 / 21
ORG-IT (source) / 17 / 6,765 / 4
INT-IT-EN (interpretation) / 17 / 6,708 / 4
INT-IT-ES (interpretation) / 17 / 7,052 / 4
ORG-ES (source) / 21 / 14,406 / 8
INT-ES-IT (interpretation) / 21 / 12,833 / 7
INT-ES-EN (interpretation) / 21 / 12,995 / 7
TOTAL / 357 / 177,295 / 100
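
The totals and shares in the table can be rechecked with a few lines of arithmetic. The following is a minimal sketch, not part of the catalogue entry: the figures are copied from the table above, and the names `SUB_CORPORA` and `summarise` are illustrative only. Shares are computed directly from the word counts, so they may differ slightly from the rounded percentages published in the table (for example, ORG-EN works out to roughly 24.1% against the listed 25).

```python
# Figures copied from the sub-corpus table: name -> (speeches, word count).
# The dictionary and function names are illustrative assumptions.
SUB_CORPORA = {
    "ORG-EN": (81, 42_705),
    "INT-EN-IT": (81, 35_765),
    "INT-EN-ES": (81, 38_066),
    "ORG-IT": (17, 6_765),
    "INT-IT-EN": (17, 6_708),
    "INT-IT-ES": (17, 7_052),
    "ORG-ES": (21, 14_406),
    "INT-ES-IT": (21, 12_833),
    "INT-ES-EN": (21, 12_995),
}


def summarise(sub_corpora):
    """Return total speeches, total words, and each sub-corpus share in percent."""
    total_speeches = sum(speeches for speeches, _ in sub_corpora.values())
    total_words = sum(words for _, words in sub_corpora.values())
    shares = {
        name: 100 * words / total_words
        for name, (_, words) in sub_corpora.items()
    }
    return total_speeches, total_words, shares


if __name__ == "__main__":
    speeches, words, shares = summarise(SUB_CORPORA)
    print(f"TOTAL / {speeches} / {words:,}")
    for name, share in shares.items():
        print(f"{name}: {share:.1f}% of EPIC")
```

The computed totals agree with the TOTAL row (357 speeches, 177,295 words).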


A web search interface has been developed.
The corpus is now available through the ELRA catalogue (http://catalog.elra.info/index.php) under the reference ELRA-S0323.
Contents: speech corpus
 

Joint Copyright © 2008 ELRA & ELDA