ELRA - ELRA-U-S0222 : Speech corpus for Amharic

Catalog Reference : ELRA-U-S0222

Speech corpus for Amharic

This speech corpus contains 20 hours of read speech. Amharic is one of the official languages of Ethiopia and the second most-spoken Semitic language after Arabic.

This corpus was built to develop an automatic speech recognition (ASR) system and is divided into three parts:

- the training speech corpus, which contains 10,850 different sentences read by 100 speakers (56 male and 44 female). Eighty of them are from the Addis Ababa dialect area, while the other 20 speak one of four other dialects (Gojjam, Gonder, Wollo and Menz).

- the development and evaluation set, which contains 38 different sentences read by 24 speakers (20 speakers of the Addis Ababa dialect and 4 speakers from the four other dialect areas).

- the adaptation set, which contains 53 sentences that together cover all Amharic CV syllables and were read by all of the speakers.

Contents

speech corpus
Language(s) : Amharic
Duration : 20 hours
Source Channel : Microphone