Universal Catalogue  
Catalog Reference : ELRA-U-S 0172
Verbmobil Data
The Verbmobil project aimed to develop a mobile system for translating spontaneous speech in face-to-face situations. To that end, spontaneous speech data were collected and transcribed for training and testing Verbmobil systems. The languages covered are German, English and Japanese.

3,200 dialogs were collected from 1,658 speakers using a close microphone, a room microphone and a telephone:
- 1,454 dialogs for German, 726 for English and 1,020 for Japanese.
- 79,562 turns: 41,512 for German, 16,104 for English and 21,946 for Japanese.
- 1,520,000 running words: 670,000 for German, 270,000 for English and 580,000 for Japanese.
- 181.6 hours: 96.1 for German, 37.9 for English and 47.7 for Japanese.
Some dialogs were also annotated using a hierarchy of 32 dialog acts.
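
The per-language figures above can be cross-checked against the stated totals with a short script. The following is a minimal sketch in Python; the names STATS, totals and per_language_ratios are illustrative only and are not part of the catalogue or of the Verbmobil distribution. The derived averages are not stated in the catalogue and are approximate, since the word counts are rounded.

```python
# Per-language figures copied from the catalogue entry.
# The dictionary layout and all names here are illustrative assumptions.
STATS = {
    "German":   {"dialogs": 1454, "turns": 41512, "words": 670_000},
    "English":  {"dialogs": 726,  "turns": 16104, "words": 270_000},
    "Japanese": {"dialogs": 1020, "turns": 21946, "words": 580_000},
}


def totals(stats):
    """Sum each figure across all languages."""
    keys = ("dialogs", "turns", "words")
    return {k: sum(lang[k] for lang in stats.values()) for k in keys}


def per_language_ratios(stats):
    """Derived averages: turns per dialog and running words per turn."""
    return {
        name: {
            "turns_per_dialog": s["turns"] / s["dialogs"],
            "words_per_turn": s["words"] / s["turns"],
        }
        for name, s in stats.items()
    }


if __name__ == "__main__":
    print(totals(STATS))
    for name, r in per_language_ratios(STATS).items():
        print(f"{name}: {r['turns_per_dialog']:.1f} turns/dialog, "
              f"{r['words_per_turn']:.1f} words/turn")
```

The summed dialogs, turns and running words agree with the totals given in the entry (3,200, 79,562 and 1,520,000).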

A treebank with 85,000 entries was also developed (German: 35,000; English: 30,000; Japanese: 20,000). It was annotated with part-of-speech tags, phrasal categories, grammatical functions and root labels. The treebank was used to train the statistical parser and the chunk parser, and to develop semantic construction rules and translation transfer rules.
Production
Project : Verbmobil Project
Applications
Possible applications : Automatic speech recognition
Application area : Research
Contents
written corpus
speech corpus

Joint Copyright © 2008 ELRA & ELDA