Displaying 81 to 100 (of 423 products) |
Result Pages: 5 |
This corpus is extracted from the autobiographical audio book “Mein Leben” by Marcel Reich-Ranicki, consisting of 2 CDs with extracts of the corresponding book read aloud by the author.
Language(s) : German
The database contains parliamentary statements read by one male speaker. It consists of a selection of 2150 sentences annotated and manually verified, including 100 rare phonemes in words.
It is distributed through the ELRA catalogue http://catalog.elra.info under the reference ELRA-S0339.
Language(s) : Polish
The Mandarin Chinese Speecon database comprises the recordings of 600 Mandarin Chinese speakers.
Language(s) : Mandarin Chinese
This corpus contains 190 hours of academic speech, recorded and transcribed (1.7 million words). It covers several domains: humanities and arts, social sciences and education, biological and health sciences, physical sciences and engineering, and others.
Language(s) : English
The corpus is part of the British National Corpus, and consists of 472,000 words of transcribed text. It is a large English corpus focusing on spontaneous conversations of teenagers. It was collected in 1993 and consists of the spoken language of 13- to 17-year-old teenagers from different boroughs of London and with different social backgrounds. The complete corpus, half a million words, has been orthographically transcribed and word-class tagged.
Language(s) : English
This corpus is composed of about 300,000 words of unmonitored casual speech. It gathers interviews with 40 speakers from Columbus (USA). The recordings have been orthographically transcribed and phonetically labeled.
Language(s) : English (USA)
This is a Danish phonetically annotated spontaneous speech corpus. It consists of monologues, dialogues and word lists, containing about 70,000 words corresponding to 10 hours of speech, recorded by 27 speakers.
Language(s) : Danish
This is a multi-modal corpus consisting of audio, video, driving information and transcripts. Dialogues between a driver and a navigator were recorded in a car with around 800 subjects. It contains about 1.03 million morphemes. 35,000 utterance units of the corpus have been manually tagged.
Language(s) : Japanese
This is a 6-hour non-native speech corpus. 15 non-native French speakers were recorded: 7 native Chinese speakers from China and 8 native Vietnamese speakers from Vietnam. The corpus consists of two parts: dialog phrases and read articles in the tourism domain.
Language(s) : French
116 native Polish speakers read English prompts with rich phonetic contexts. The corpus consists of sentences in 6,032 files, amounting to 3.5 GB and 14 h 37 min 37 s of running speech.
Language(s) : English (Poland)
This corpus represents 8 British English diphthongs in 12 different contexts. 30 speakers were recorded reading 61 sentences (each read 3 times by each subject).
Language(s) : English (United Kingdom)
This is a bilingual parallel corpus of two phonetically similar languages. Valencian is a dialect of Catalan spoken in the Comunitat Valenciana. 20 speakers recorded 120 sentences, 60 per language, that is to say about one hour of speech for each language.
Language(s) : Catalan, Valencian - Spanish (Spain)
This speech corpus has been recorded on a motorbike by two speakers using helmet and throat microphones and a Bluetooth transceiver. It consists of 38 sessions of 12 query blocks, with 6 queries per query block. In total the corpus contains 2835 queries with about 31,900 running words.
Language(s) : German
The Polish Speecon database comprises the recordings of 550 adult Polish speakers and 50 child Polish speakers, who uttered over 290 items and 210 items respectively (read and spontaneous).
Language(s) : Polish
This corpus contains speech data and their transcriptions. 26 speakers (12 female, 14 male) recorded sessions with both neutral and simulated noisy scenarios. The sessions consist of 205 utterances per speaker and scenario, that is to say 10-12 minutes of continuous speech. Each session contains 30 phonetically rich sentences and 470 repeated and isolated digits.
Language(s) : Czech
This corpus aims to be a tool for linguistic research on aphasia. It will include speech representing different types of aphasia (Broca, Wernicke, global, transcortical, anomic, etc.) and various communication settings. For a pilot study for the CoDAS Corpus, speech material from six aphasic patients has been collected. Their average age was 54 and the time post onset was between three and four years. Each patient had to answer questions on five standard topics. Each patient produced at least 300 words and discussed at least three of the five topics. There were also a repetition task, a writing task, a naming task and a comprehension task. The data has been orthographically transcribed, phonetically transcribed and Part-of-Speech tagged.
Language(s) : Dutch (The Netherlands)
This is a database of contemporary standard Dutch spoken by adults in the Netherlands and Flanders. It consists of about 10 million words, that is to say 1,000 hours of speech data, recorded in different communication settings.
Language(s) : Dutch
It consists of 3 dialogues from the CLIPS project and the Ipar project, and 13 annotation levels (orthographic, morphosyntactic, syntactic, lexical, rhythmic levels, etc.). Each annotation level selects its own base unit, depending on linguistic factors.
Language(s) : Italian (Italy)
This is a corpus of task-oriented spoken dialogs. The aim of the dialogs is to create, discuss and evaluate various plans involving freight shipments by train. 34 speakers recorded 98 dialogs, 5,900 turns, involving 20 different tasks. In total, it contains 6 and a half hours of recorded speech with 55,000 transcribed words.
Language(s) : English
This is a corpus of task-oriented spoken dialogs containing 128 dialogs for giving driving directions.
Language(s) : English