ELRA - ELRA-SD192 : European Parliament Interpreting Corpus (EPIC)

Catalog Reference : ELRA-SD192

European Parliament Interpreting Corpus (EPIC)

This 177,295-word corpus comprises speech recordings and transcripts from the European Parliament. It is made up of nine sub-corpora: three of source speeches (English, Italian and Spanish) and six of target speeches produced by interpreters, covering every possible combination of the three languages. Speakers address the audience without expecting any direct reply. The corpus has been orthographically transcribed, includes metadata (a header at the beginning of each transcript, with information about the speaker and the speech), and has been tagged and lemmatised with different taggers.
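
As an illustration of how three languages yield nine sub-corpora, the short Python sketch below enumerates the names. The ORG-/INT- labels follow the naming used in the size table of this record; the variable names are only illustrative and are not part of the corpus distribution.

```python
from itertools import permutations

# The three languages of the corpus, using the codes from the size table.
LANGS = ["EN", "IT", "ES"]

# One source sub-corpus per language.
source = [f"ORG-{lang}" for lang in LANGS]

# One interpreted sub-corpus per ordered (source, target) language pair.
target = [f"INT-{src}-{tgt}" for src, tgt in permutations(LANGS, 2)]

sub_corpora = source + target
print(len(sub_corpora), "sub-corpora:", ", ".join(sub_corpora))
```

Ordered pairs are used because interpreting direction matters: INT-EN-IT (English into Italian) and INT-IT-EN (Italian into English) are distinct sub-corpora.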

Size of the nine sub-corpora:
sub-corpus / number of speeches / total word count / % of EPIC
ORG-EN (source) / 81 / 42,705 / 25
INT-EN-IT (interpretation) / 81 / 35,765 / 20
INT-EN-ES (interpretation) / 81 / 38,066 / 21
ORG-IT (source) / 17 / 6,765 / 4
INT-IT-EN (interpretation) / 17 / 6,708 / 4
INT-IT-ES (interpretation) / 17 / 7,052 / 4
ORG-ES (source) / 21 / 14,406 / 8
INT-ES-IT (interpretation) / 21 / 12,833 / 7
INT-ES-EN (interpretation) / 21 / 12,995 / 7
TOTAL / 357 / 177,295 / 100
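
The totals in the table can be checked with a few lines of Python. This is a minimal sketch that assumes the rows are available as slash-separated text, as shown above; the function and variable names are hypothetical. Shares recomputed from the word counts can differ slightly from the rounded percentages listed (ORG-EN, for instance, works out to about 24.1% against the listed 25).

```python
# Rows copied from the size table: name / speeches / words / % of EPIC.
TABLE = """\
ORG-EN (source) / 81 / 42,705 / 25
INT-EN-IT (interpretation) / 81 / 35,765 / 20
INT-EN-ES (interpretation) / 81 / 38,066 / 21
ORG-IT (source) / 17 / 6,765 / 4
INT-IT-EN (interpretation) / 17 / 6,708 / 4
INT-IT-ES (interpretation) / 17 / 7,052 / 4
ORG-ES (source) / 21 / 14,406 / 8
INT-ES-IT (interpretation) / 21 / 12,833 / 7
INT-ES-EN (interpretation) / 21 / 12,995 / 7
"""


def parse_rows(text):
    """Parse slash-separated rows into (name, speeches, words, listed_pct)."""
    rows = []
    for line in text.strip().splitlines():
        name, speeches, words, pct = (field.strip() for field in line.split("/"))
        # Word counts use a comma as thousands separator.
        rows.append((name, int(speeches), int(words.replace(",", "")), int(pct)))
    return rows


rows = parse_rows(TABLE)
total_speeches = sum(row[1] for row in rows)
total_words = sum(row[2] for row in rows)
print(f"speeches: {total_speeches}, words: {total_words}")

# Recompute each sub-corpus share from the word counts.
for name, _, words, listed in rows:
    share = 100 * words / total_words
    print(f"{name:<28}{share:5.1f}% (listed: {listed}%)")
```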

A web search interface has been developed.
The corpus is now available through the ELRA catalogue (http://catalog.elra.info/index.php) under the reference ELRA-S0323.

Contents

speech corpus
Language(s) : English ; Italian ; Spanish
Source Channel : Microphone
Transcription Entries : Orthographic