Universal Catalogue

Broadcast Resources

Displaying 21 to 40 (of 45 products)

ELRA-SDifft8

LBC Lebanese Broadcasting

Video and written news broadcasting from Lebanon.
Language(s) : Arabic

ELRA-SDifft9

Press Association

It contains general written and video news.
Language(s) : English

ELRA-U-MM0005

Twente News Corpus (TwNC)

This is a multifaceted corpus for Dutch. It contains material from different sources: newspapers, television subtitles, teleprompter files and broadcast news transcripts with the audio file. It consists of 530 million words and about 800 files of broadcast news audio.
Language(s) : Dutch

ELRA-U-MM0008

Belfast Naturalistic Database

The Belfast Naturalistic database contains recordings of discussions on emotive subjects and recorded extracts from television programs. Recordings were chosen to be as spontaneous as possible (interactive unscripted discourse), to sample genuine emotional states.
Language(s) : English

ELRA-U-MM0009

Castaway Reality Television Database

The Castaway Database is an English collection of extracts from recordings of a group of people taking part competitively in a range of testing activities on a remote island.
Language(s) : English

ELRA-U-MM0010

EmoTV Database

The EmoTV database contains extracts from emotional TV audiovisual interviews in French. This is naturalistic data, covering a wide range of positive and negative emotions of various intensities. It consists of 51 videos (of 48 people).
Language(s) : French

ELRA-U-MM0020

Humaine Database

Humaine is a labelled multimodal database containing natural speech. It was designed to cover material showing a wide range of emotions in action and interaction, and in different contexts (static, dynamic, outdoor, ...).
Language(s) : English - German - French - Hebrew

ELRA-U-MM0025

Vera am Mittag German Audio-Visual Emotional Speech Database (VAM corpus)

The VAM corpus is an emotional speech database. It contains 12 hours of recordings of the German TV talk-show “Vera am Mittag” (Vera at noon).
Language(s) : German

ELRA-U-MM0027

Canal 9 political debate corpus

This corpus contains about 42 hours of political debates in French, recorded by the Canal 9 local TV station and broadcast in Switzerland.
Language(s) : French (Switzerland)

ELRA-U-MM0049

VNTV speech database

This is a Slovenian speech corpus of 178 weather reports captured between October 1999 and February 2000 on the national TV programme (TVSLO1).
Language(s) : Slovene (Slovenia)

ELRA-U-S 0001

LABLITA Corpus

This corpus represents Italian spontaneous speech events collected from 1965 onwards to support studies on the intonation of Italian. It is divided into two sub-corpora illustrating adult speech and early acquisition.
Language(s) : Italian (Italy)

ELRA-U-S 0007

Spoken Arabic Corpora

This is a 320,000-word corpus of spoken Modern Arabic. It has the following characteristics:
- comparable material from the year 1990.
- radio newscasts.
- data from three countries in which language use seems to differ (Saudi Arabia, Egypt and Algeria).
- transcription and tagging.
Language(s) : Modern Standard Arabic (Egypt) - Modern Standard Arabic

ELRA-U-S 0012

Finnish Broadcast Corpus (FBC)

The Finnish Broadcast Corpus contains speech recordings from the Finnish Broadcasting Company. The material is divided into four categories: radio monologues, radio dialogues, TV monologues and TV dialogues.
In addition to these primary data, the corpus contains annotations giving information on units in speech (phones, words and utterances, which are aligned with the speech and video signals).
Language(s) : Finnish (Finland)

ELRA-U-S 0014

CLIPS Corpus of Spoken Italian (CLIPS)

It gathers 100 hours of recorded speech available in wav format with orthographic transcriptions in txt and phonetic annotations.
Language(s) : Italian (Italy)

ELRA-U-S 0035

Mandarin Chinese Broadcast News Corpus (MATBN)

The MATBN corpus contains 198 one-hour news shows (for a total of approximately 2.3 million Chinese characters). It has been segmented, labeled and transcribed manually.
Language(s) : Mandarin Chinese

ELRA-U-S 0089

Voice of Vietnam Corpus (VOV)

The Voice of Vietnam is a broadcast speech corpus. It contains recordings of 30 broadcasters and speakers reciting stories, news reports and colloquies, for a total of approximately 23,000 utterances and 4,000 distinct syllables. The data have been manually transcribed at syllable level.
Language(s) : Vietnamese

ELRA-U-S 0126

REDIP corpus of Portuguese

It is composed of audio and video recordings of radio and television shows, for a total of 330,000 words covering six different topics: culture, economics, news, opinion, science and sports.
Language(s) : Portuguese (Portugal)

ELRA-U-S 0128

ProGmatica

ProGmatica is a spontaneous speech corpus of broadcast television material in European Portuguese (interviews, political debates, informal conversations). It contains 20 hours of natural verbal interactions recorded between 2003 and 2005 and converted to digital format.
It is a multi-speaker corpus in which linguistic, paralinguistic and extralinguistic information is labelled and cross-referenced.
Language(s) : Portuguese

ELRA-U-S 0140

Hungarian Broadcast News Database (Hungarian BN Database)

This Hungarian database contains 3 hours 30 minutes of recordings, transcribed and annotated using the NIST conventions (22,500 words in total). The data consist of complete news broadcasts from public and private TV stations. It was digitized at 16 kHz in WAV format (16-bit, 16 kHz PCM, 256 kbps bit rate). The video material was compressed in two formats (Indeo and DivX).
Language(s) : Hungarian

ELRA-U-S0232

Malay Speech Corpus

This is a speech database in Malay. It contains both read speech and broadcast news.
Language(s) : Malay
