Catalog reference: ELRA-SD192
European Parliament Interpreting Corpus (EPIC)
This 177,295-word corpus contains speeches and transcripts from the European Parliament. It is made up of nine sub-corpora: three of source speeches (English, Italian and Spanish) and six of target speeches produced by interpreters, covering all possible combinations of the three languages. Speakers address the audience without expecting any direct reply. The corpus has been orthographically transcribed and includes metadata (a header at the beginning of each transcript, plus information about the speaker and the speech). It has then been tagged and lemmatised with different taggers.
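As a small illustration of how the nine sub-corpora arise, the Python sketch below builds three source sub-corpora plus one interpretation sub-corpus for each ordered pair of distinct languages (3 × 2 = 6). The labels follow the ORG-XX / INT-XX-YY names used in the catalogue table; reading INT-EN-IT as "English interpreted into Italian" is an assumption, although it matches the speech counts. The helper names `subcorpus_labels` and `parse_label` are hypothetical and not part of any EPIC tooling.

```python
from itertools import permutations

# The three EPIC languages, using the two-letter codes that appear in the sub-corpus names.
LANGUAGES = ["EN", "IT", "ES"]


def subcorpus_labels(languages):
    """Hypothetical helper: return ORG-* labels for source speeches and INT-*-* labels for interpretations."""
    sources = [f"ORG-{lang}" for lang in languages]
    # Every ordered (source, target) pair of distinct languages is one interpreting direction.
    interpretations = [f"INT-{src}-{tgt}" for src, tgt in permutations(languages, 2)]
    return sources, interpretations


def parse_label(label):
    """Hypothetical helper: split a label into (kind, source language, target language or None)."""
    parts = label.split("-")
    if parts[0] == "ORG" and len(parts) == 2:
        return ("source", parts[1], None)
    if parts[0] == "INT" and len(parts) == 3:
        return ("interpretation", parts[1], parts[2])
    raise ValueError(f"unrecognised sub-corpus label: {label}")


sources, interpretations = subcorpus_labels(LANGUAGES)
print(len(sources), len(interpretations), len(sources) + len(interpretations))  # 3 6 9
print(parse_label("INT-EN-IT"))  # ('interpretation', 'EN', 'IT')
```

Using `itertools.permutations` rather than combinations is what yields six directions: English into Italian and Italian into English are distinct sub-corpora.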
Size of the nine sub-corpora:

| Sub-corpus | Number of speeches | Total word count | % of EPIC |
|---|---|---|---|
| ORG-EN (source) | 81 | 42,705 | 25 |
| INT-EN-IT (interpretation) | 81 | 35,765 | 20 |
| INT-EN-ES (interpretation) | 81 | 38,066 | 21 |
| ORG-IT (source) | 17 | 6,765 | 4 |
| INT-IT-EN (interpretation) | 17 | 6,708 | 4 |
| INT-IT-ES (interpretation) | 17 | 7,052 | 4 |
| ORG-ES (source) | 21 | 14,406 | 8 |
| INT-ES-IT (interpretation) | 21 | 12,833 | 7 |
| INT-ES-EN (interpretation) | 21 | 12,995 | 7 |
| TOTAL | 357 | 177,295 | 100 |
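The figures in the sub-corpus table can be cross-checked with a short Python sketch. It sums speeches and words, recomputes each sub-corpus share of the total to one decimal place, and checks that every interpretation sub-corpus has as many speeches as the source sub-corpus it was interpreted from. The numbers are copied from the table; the function names are hypothetical, and the parallel-count check relies on the assumed INT-source-target reading of the labels.

```python
# Figures copied from the sub-corpus table: (label, number of speeches, word count).
SUBCORPORA = [
    ("ORG-EN", 81, 42705),
    ("INT-EN-IT", 81, 35765),
    ("INT-EN-ES", 81, 38066),
    ("ORG-IT", 17, 6765),
    ("INT-IT-EN", 17, 6708),
    ("INT-IT-ES", 17, 7052),
    ("ORG-ES", 21, 14406),
    ("INT-ES-IT", 21, 12833),
    ("INT-ES-EN", 21, 12995),
]


def summarise(rows):
    """Return total speeches, total words and each sub-corpus share (%) rounded to 0.1."""
    total_speeches = sum(speeches for _, speeches, _ in rows)
    total_words = sum(words for _, _, words in rows)
    shares = {label: round(100 * words / total_words, 1) for label, _, words in rows}
    return total_speeches, total_words, shares


def check_parallel(rows):
    """Assumes INT-<src>-<tgt>: each interpretation should match ORG-<src> in speech count."""
    source_counts = {label.split("-")[1]: n for label, n, _ in rows if label.startswith("ORG-")}
    return all(
        n == source_counts[label.split("-")[1]]
        for label, n, _ in rows
        if label.startswith("INT-")
    )


speeches, words, shares = summarise(SUBCORPORA)
print(speeches, words)  # 357 177295
print(shares["ORG-EN"])  # 24.1
print(check_parallel(SUBCORPORA))  # True
```

Both totals match the TOTAL row. The recomputed shares can differ by about a point from the whole-number percentages listed (ORG-EN comes out at roughly 24.1%), presumably because those were rounded so that the column sums to 100.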
A web search interface has been developed for the corpus.
The corpus is now available through the ELRA catalogue (http://catalog.elra.info/index.php) under the reference ELRA-S0323.
Contents
Speech corpus
Language(s): English; Italian; Spanish
Source channel: Microphone
Transcription entries: Orthographic
Thursday 21 November, 2024
Joint Copyright © 2008 ELRA & ELDA
Universal Catalogue 1.0.4