Zambezi Voice: A Multilingual Speech Corpus for Zambian Languages

This work introduces Zambezi Voice, an open-source multilingual speech resource for Zambian languages. It contains two collections of datasets: unlabelled audio recordings of radio news and talk show programs (160 hours) and labelled data (over 80 …

BIG-C: a Multimodal Multi-Purpose Dataset for Bemba

We present BIG-C (Bemba Image Grounded Conversations), a large multimodal dataset for Bemba. While Bemba is the most widely spoken language of Zambia, it exhibits a dearth of resources, which renders the development of language technologies or language …

SD-QA: Spoken Dialectal Question Answering for the Real World

Question answering (QA) systems are now available through numerous commercial applications for a wide variety of domains, serving millions of users that interact with them via speech interfaces. However, current benchmarks in QA research do not …

Phoneme Recognition through Fine Tuning of Phonetic Representations: a Case Study on Luhya Language Varieties

Models pre-trained on multiple languages have shown significant promise for improving speech recognition, particularly for low-resource languages. In this work, we focus on phoneme recognition using Allosaurus, a method for multilingual recognition …

BembaSpeech: A Speech Recognition Corpus for the Bemba Language

We present a preprocessed, ready-to-use automatic speech recognition corpus, BembaSpeech, consisting of over 24 hours of read speech in the Bemba language, a written but low-resourced language spoken by over 30% of the population of Zambia. To assess …

Speech

Most languages of the world are “oral”: they are not traditionally written, and even if an alphabet exists, the community does not usually use it. Hence, building NLP systems that can operate directly on speech input is paramount.

AlloVera: A Multilingual Allophone Database

We introduce a new resource, AlloVera, which provides mappings from 218 allophones to phonemes for 14 languages. Phonemes are contrastive phonological units, and allophones are their various concrete realizations, which are predictable from …

Universal Phone Recognition with a Multilingual Allophone System

Tied Multitask Learning for Neural Speech Translation

We explore multitask models for neural translation of speech, augmenting them in order to reflect two intuitive notions. First, we introduce a model where the second task decoder receives information from the decoder of the first task, since …

Leveraging Translations for Speech Transcription in Low-resource Settings

Recently proposed data collection frameworks for endangered language documentation aim not only to collect speech in the language of interest, but also to collect translations into a high-resource language that will render the collected resource …