Towards more equitable question answering systems: How much more data do you need?

Arnab Debnath, Navid Rajabi, Fardina Fathmiul Alam, Antonios Anastasopoulos

June 2021


Abstract

Question answering (QA) in English has been widely explored, but multilingual datasets are relatively new, with several methods attempting to bridge the gap between high- and low-resourced languages using data augmentation through translation and cross-lingual transfer. In this project, we take a step back and study which approaches allow us to take the most advantage of existing resources in order to produce QA systems in many languages. Specifically, we perform extensive analysis to measure the efficacy of few-shot approaches augmented with automatic translations and permutations of context-question-answer pairs. In addition, we make suggestions for future dataset development efforts that make better use of a fixed annotation budget, with a goal of increasing the language coverage of QA datasets and systems.

Type

Conference paper

Publication

Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics

multilingual NLP fairness

Antonios Anastasopoulos

Assistant Professor

I work on multilingual models, machine translation, speech recognition, and NLP for under-served languages.
