Displaying 21 to 40 (of 45 products)
Result Pages: 2
Video and written news broadcasting from Lebanon.
Language(s) : Arabic
|
|
|
|
It contains general written and video news.
Language(s) : English
|
|
|
|
This is a multifaceted corpus for Dutch. It contains material from different sources: newspapers, television subtitles, teleprompter files and broadcast news transcripts with the audio file. It consists of 530 million words and about 800 files of broadcast news audio.
Language(s) : Dutch
|
|
|
|
The Belfast Naturalistic database contains recordings of discussions on emotive subjects and recorded extracts from television programs. Recordings were chosen to be as spontaneous as possible (interactive unscripted discourse), to sample genuine emotional states.
Language(s) : English
|
|
|
|
The Castaway Database is an English collection of extracts from recordings of a group of people taking part competitively in a range of testing activities on a remote island.
Language(s) : English
|
|
|
|
The EmoTV database contains extracts from emotional TV audiovisual interviews in French. This is naturalistic data, covering a wide range of positive and negative emotions of various intensities. It consists of 51 videos (of 48 people).
Language(s) : French
|
|
|
|
Humaine is a labelled multimodal database containing natural speech. It was designed to cover material showing a wide range of emotions in action and interaction, and in different contexts (static, dynamic, outdoor, ...).
Language(s) : English - German - French - Hebrew
|
|
|
|
The VAM corpus is an emotional speech database. It contains 12 hours of recordings of the German TV talk-show “Vera am Mittag” (Vera at noon).
Language(s) : German
|
|
|
|
This corpus contains about 42 hours of political debates in French, recorded by the Canal 9 local TV station and broadcast in Switzerland.
Language(s) : French (Switzerland)
|
|
|
|
This is a Slovenian speech corpus of 178 weather reports captured between October 1999 and February 2000 on the national TV programme (TVSLO1).
Language(s) : Slovene (Slovenia)
|
|
|
|
This corpus represents Italian spontaneous speech events collected from 1965 onwards to support studies on the intonation of Italian. It is divided into two sub-corpora illustrating adult speech and early acquisition.
Language(s) : Italian (Italy)
|
|
|
|
This is a 320,000 word corpus of spoken Modern Arabic. It has the following characteristics:
- comparable material from the year 1990.
- radio newscasts.
- data from three countries in which language use seems to differ (Saudi Arabia, Egypt and Algeria).
- transcription and tagging.
Language(s) : Modern Standard Arabic (Egypt) - Modern Standard Arabic
|
|
|
|
The Finnish Broadcast Corpus contains speech recordings from the Finnish Broadcasting Company. The material is divided into four categories: radio monologues, radio dialogues, TV monologues and TV dialogues.
In addition to these primary data, the corpus contains annotations marking units of speech (phones, words and utterances), which are aligned with the speech and video signals.
Language(s) : Finnish (Finland)
|
|
|
|
It comprises 100 hours of recorded speech in WAV format, with orthographic transcriptions in TXT format and phonetic annotations.
Language(s) : Italian (Italy)
|
|
|
|
The MATBN corpus contains 198 one-hour news shows (for a total of approximately 2.3 million Chinese characters). It has been segmented, labeled and transcribed manually.
Language(s) : Mandarin Chinese
|
|
|
|
The Voice of Vietnam is a broadcast speech corpus. It contains recordings of 30 broadcasters and speakers reciting stories, news reports and colloquies, for a total of approximately 23,000 utterances and 4,000 distinct syllables. The data have been manually transcribed at the syllable level.
Language(s) : Vietnamese
|
|
|
|
It is composed of audio and video recordings of radio and television shows, for a total of 330,000 words covering six different topics: culture, economics, news, opinion, science and sports.
Language(s) : Portuguese (Portugal)
|
|
|
|
ProGmatica is a spontaneous speech corpus of broadcast television material in European Portuguese (interviews, political debates, informal conversations). It contains 20 hours of natural verbal interactions recorded between 2003 and 2005 and converted to digital format.
It is a multi-speaker corpus in which linguistic, paralinguistic and extralinguistic information is labelled and cross-referenced.
Language(s) : Portuguese
|
|
|
|
This Hungarian database contains 3 hours 30 minutes of recordings, transcribed and annotated using the NIST conventions (22,500 words in total). The data consist of complete news broadcasts from public and private TV stations. The recordings were digitized in WAV format (16-bit, 16 kHz PCM, 256 kbps bit rate). The video material was compressed in two formats (Indeo and DivX).
Language(s) : Hungarian
|
|
|
|
This is a speech database in Malay. It contains both read speech and broadcast news.
Language(s) : Malay
|
|
|
|