Language Resources
Displaying 81 to 100 (of 148 products)
Result Pages: 5
This database is composed of various types of recordings. 600 recording sheets containing 80 sentences each were prepared with long English sentences, short English sentences, English words and mixed Chinese-English sentences. They were read by male and female speakers from both inside and outside the English Department, and recorded using either a hand-held microphone or a wired/wireless telephone (PSTN/GSM).
Language(s) : English (Taiwan)
This CASIA corpus contains 257 spontaneous telephone dialogues on given topics. A total of 514 speakers were recorded, for approximately 120 hours of speech.
Language(s) : Chinese
This is a speech database containing 84,000 read sentences. Each speaker was asked to read a set of 210 sentences. Nearly 500 speakers were recorded by telephone or microphone, covering three different age groups, both genders (50% male, 50% female) and four major Western Indonesian accents (Javanese, Sundanese, Batak, Standard Indonesian).
Language(s) : Indonesian
Each of the 1,000 speakers uttered 60 words and sentences.
Data were collected through the mobile network (digital telephone lines).
Language(s) : Korean
More than 2,000 speakers from different demographic profiles, environments and dialects were asked to read isolated words, digit sequences, phonetically rich words and sentences, etc. Data were collected through the mobile network (GSM/CDMA networks).
Language(s) : Hindi
More than 1,800 speakers from different demographic profiles, environments and dialects were asked to read isolated words, digit sequences, phonetically rich words and sentences, application words and phrases, etc.
Data were collected through the mobile network.
Language(s) : Hindi
More than 1,800 speakers from different demographic profiles and environments were asked to read isolated words, digit sequences, phonetically rich words and sentences, application words and phrases, etc.
Data were collected through the telephone network.
Language(s) : English
This multilingual corpus contains speech for the three most frequently used languages in Taiwan: Mandarin, Min-Nan (Taiwanese) and Hakka. The project plans to record more than 1,800 speakers and hundreds of hours.
Language(s) : Mandarin (Taiwan) - Chinese (Taiwan)
This is a database containing conversational telephone speech in Thai.
Language(s) : Thai
This is a telephone speech database for Vietnamese (from the north and the south). It is labeled at the syllable level and also manually labeled at the phonetic level.
Language(s) : Vietnamese
This corpus contains recordings of one female speaker, for a total of 567 utterances with an average length of 15 syllables. It is labeled at the syllable level (segment boundaries).
Language(s) : Vietnamese
The corpus is composed of 1,206 ten-minute natural Mandarin conversations between strangers or friends, for a total of 200 hours. Each conversation focuses on a single topic, and there are 40 topics in total.
Language(s) : Mandarin Chinese
The MTBA is a Hungarian database of telephone speech, in which the major dialectal variants are represented. 500 speakers were recorded from all over the country; they were asked to read a given text over the phone. This material was composed of application words, numbers, dates, spellings and names, and phonetically rich sentences and words.
Language(s) : Hungarian
Tesztel is a Hungarian database of telephone speech recorded in noisy environments. It contains the voices of 100 speakers, recorded over mobile telephones. The material was composed of continuously spoken sentences, command words, spelled forenames, numbers, dates, different currency types, city names, questions with yes/no answers and phonetically rich words. The database contains mostly spontaneous speech.
The measured signal-to-noise ratio varies from 5 dB to 25 dB.
Language(s) : Hungarian
The Jupiter weather corpus is a collection of spontaneous speech data. The Jupiter conversational system in the weather information domain was used to collect speech data over the phone.
By 2000, more than 400,000 utterances (58,000 calls) had been recorded. Some of them have been orthographically transcribed.
Language(s) : English (USA)
This Saudi-accented Arabic telephone speech database was collected during 2002 and 2003.
- Number of speakers: 1033 native speakers (51% male, 49% female).
- Telephone network: 70% mobile, 30% fixed-line network.
- Recording environment for mobile network: quiet (35%), noisy (35%), moving vehicle (30%).
- Recording environment for fixed-line network: quiet (75%), noisy (25%).
- Material: each speaker read 59 prompts with numbers, phonetically rich words, sentences and pronunciation of the Arabic and English alphabets.
- Duration: 96 hours.
- Average duration per speaker: 5.60 minutes.
Language(s) : Arabic
The Mokusei system, a conversational system in the weather information domain, was used to collect spontaneous speech data over the telephone network.
By 2000, 713 calls had been collected from native speakers, resulting in 10,480 utterances.
Language(s) : Japanese
The Muxing system, a conversational system in the weather information domain, was used to collect spontaneous speech data over the telephone network.
By 2000, 1,200 sentences from 235 Chinese speakers (in the Greater Boston area) had been collected.
Language(s) : Chinese
The Mercury system is an air travel reservation system that has been used to collect spontaneous speech data over the telephone network.
By 2002, more than 25,000 utterances had been collected.
Language(s) : English
The LUNA corpus is a multi-domain multilingual dialogue corpus. It is expected to contain 1,000 human-human and 8,100 human-machine dialogues in French, Italian and Polish. The dialogues are collected in the following application domains: travel information and reservation, public transportation information, IT help desk, telecom customer care, and financial information and transactions.
Processing: segmentation (into dialogue turns), standard orthographic transcription, multi-level annotation.
Levels of annotation: syntactic, semantic and discourse information.
Language(s) : French - Italian - Polish