ELRA - ELRA-U-S 0023 : Universal Annotated Speech Corpus of Standard Lithuanian

Catalog Reference : ELRA-U-S 0023

This corpus contains one hour of speech recordings in total. Four speakers (two male, two female) were asked to read aloud a set of approximately 740 isolated words. This word set covers all of the most important features of standard Lithuanian speech (275 phonetic units in total). The audio was recorded as linear PCM at a sampling rate of 44,100 Hz with 16-bit quantisation, in mono.
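As a rough illustration of what these recording parameters imply, the sketch below computes the raw PCM payload size for the stated format (44,100 Hz × 2 bytes × 1 channel = 88,200 bytes per second, so 317,520,000 bytes or about 302.8 MiB for one hour, excluding file headers) and checks a file against the same parameters. It assumes the recordings are stored as WAV files, which this entry does not state; the helper names `pcm_size_bytes` and `matches_spec` are hypothetical.

```python
import wave

# Parameters stated in the catalogue entry.
SAMPLE_RATE_HZ = 44_100
SAMPLE_WIDTH_BYTES = 2  # 16-bit quantisation
CHANNELS = 1            # mono
DURATION_S = 3_600      # one hour in total


def pcm_size_bytes(seconds: float, rate: int = SAMPLE_RATE_HZ,
                   width: int = SAMPLE_WIDTH_BYTES,
                   channels: int = CHANNELS) -> int:
    """Raw linear-PCM payload size in bytes, excluding any file headers."""
    return int(seconds * rate * width * channels)


def matches_spec(path: str) -> bool:
    """Check a file against the stated format.

    Assumption: the recordings are WAV files; the catalogue entry only
    says "linear PCM", not which container is used.
    """
    with wave.open(path, "rb") as wav:
        return (wav.getframerate() == SAMPLE_RATE_HZ
                and wav.getsampwidth() == SAMPLE_WIDTH_BYTES
                and wav.getnchannels() == CHANNELS
                and wav.getcomptype() == "NONE")  # "NONE" = uncompressed PCM


if __name__ == "__main__":
    total = pcm_size_bytes(DURATION_S)
    print(total, f"{total / 2**20:.1f} MiB")
```

Running the script prints `317520000 302.8 MiB`. The on-disk size of the actual distribution will differ somewhat because of file headers and because the one-hour figure is a total rounded in the catalogue.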
The corpus is annotated: in addition to the phonetic unit records, it includes phonetic transcriptions and related information. The phone-level and word-level transcriptions are aligned.
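One practical consequence of aligned phone-level and word-level transcriptions is that each phone segment should fall inside exactly one word segment. The sketch below checks that property and groups phones under their word. The in-memory `Segment` representation (start and end in seconds, plus a label), the function names, and the example timings and phone symbols for the Lithuanian word "namas" are all hypothetical; the entry does not specify the corpus's annotation file format.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    """Hypothetical aligned segment: times in seconds, plus a label."""
    start: float
    end: float
    label: str


def _contains(word: Segment, phone: Segment, tol: float) -> bool:
    # A phone is inside a word if both of its boundaries fall within the word.
    return word.start - tol <= phone.start and phone.end <= word.end + tol


def phones_nested_in_words(words: list[Segment], phones: list[Segment],
                           tol: float = 1e-6) -> bool:
    """True if every phone segment lies inside exactly one word segment."""
    return all(sum(_contains(w, p, tol) for w in words) == 1 for p in phones)


def phones_per_word(words: list[Segment], phones: list[Segment],
                    tol: float = 1e-6) -> list[tuple[str, list[str]]]:
    """Pair each word label with its phone labels in temporal order."""
    ordered = sorted(phones, key=lambda p: p.start)
    return [(w.label, [p.label for p in ordered if _contains(w, p, tol)])
            for w in words]


if __name__ == "__main__":
    # Hypothetical timings for one isolated word.
    words = [Segment(0.10, 0.62, "namas")]
    phones = [Segment(0.10, 0.18, "n"), Segment(0.18, 0.30, "a"),
              Segment(0.30, 0.40, "m"), Segment(0.40, 0.50, "a"),
              Segment(0.50, 0.62, "s")]
    print(phones_nested_in_words(words, phones))
    print(phones_per_word(words, phones))
```

The small tolerance `tol` absorbs floating-point rounding at shared boundaries; a stricter check could also require the phones to tile each word without gaps.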

The corpus is intended to let NLP researchers use and test tools such as HTK and MBROLA for Lithuanian spoken language processing.
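HTK reads time-aligned transcriptions as label files in which each line has the form `start end label`, with times expressed as integers in units of 100 ns. Assuming the corpus transcriptions are distributed in, or converted to, that single-file label format (the entry does not say which format is used), a minimal reader could look like this; `parse_htk_label` is a hypothetical helper, not part of HTK.

```python
HTK_UNITS_PER_SECOND = 10_000_000  # HTK label times are in units of 100 ns


def parse_htk_label(text: str) -> list[tuple[float, float, str]]:
    """Parse a single HTK label file into (start_s, end_s, label) tuples.

    Sketch only: optional score and auxiliary fields are ignored, and
    lines without explicit start/end times (also legal in HTK) are
    skipped. Master Label Files (MLF) are not handled.
    """
    segments = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0].isdigit() and fields[1].isdigit():
            start = int(fields[0]) / HTK_UNITS_PER_SECOND
            end = int(fields[1]) / HTK_UNITS_PER_SECOND
            segments.append((start, end, fields[2]))
    return segments


if __name__ == "__main__":
    example = "1000000 1800000 n\n1800000 3000000 a\n"
    print(parse_htk_label(example))
```

Dividing by an integer rather than multiplying by `1e-7` keeps values such as 0.1 s exactly as Python would write them.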

Production

Creation date : 2001

Applications

Possible applications : Speech recognition, Speech synthesis
Application area : Research

Contents

Speech corpus
Language(s) : Lithuanian
Duration : 1 hour
Quantisation : 16 bits
Recording Channels : mono
Signal Encoding : Linear PCM
Source Channel : Microphone
Speech Acquisition Mode : Acoustic
Speech Content : Isolated words
Transcription Entries : Phonetic
Annotation Coverage : Full
Annotation Granularity : Word
Annotation Level : Phonetic