It consists of prompted sentences and answers to questions, recorded in a number of regions of Brazil. The speech data (8080 utterances) from 477 speakers were recorded at 44.1 kHz, and there are 2572 orthographic transcriptions and 5507 time-aligned phoneme-level transcriptions. The acoustic environment was not controlled, in order to provide realistic background conditions.
This corpus comprises 42 recordings of narratives by Basque speakers, who retell, to a friend who has not seen them, the silent films they have just watched: The Pear Movie and a short collage of scenes from Charlie Chaplin’s Modern Times.
The Finnish Speecon database comprises the recordings of 550 adult Finnish speakers and 50 child Finnish speakers, who uttered over 290 items and 210 items, respectively (read and spontaneous).
The Russian Speecon database comprises the recordings of 550 adult Russian speakers and 50 child Russian speakers, who uttered over 290 items and 210 items, respectively (read and spontaneous).
The French Speecon database comprises the recordings of 550 adult French speakers and 50 child French speakers, who uttered over 290 items and 210 items, respectively (read and spontaneous).
The Hebrew Speecon database comprises the recordings of 550 adult Hebrew speakers and 50 child Hebrew speakers, who uttered over 290 items and 210 items, respectively (read and spontaneous).
The Spanish Speecon database comprises the recordings of 561 adult Spanish speakers and 55 child Spanish speakers, who uttered over 290 items and 210 items, respectively (read and spontaneous).
The MARSEC Corpus (MAchine Readable Spoken English Corpus) contains 54,083 words. It consists of five and a half hours of continuous speech from 53 speakers.
It contains 120 tagged texts and approximately 260,000 words of spoken British English. The texts are taken from the British National Corpus (60 files of spoken demographic data) and the Centre for North West Regional Studies (60 files of oral history interviews).