Fairness

Antonios Anastasopoulos

Aug 2, 2020

Advances in natural language processing (NLP) technology now make it possible to perform many tasks through natural language or over natural language data: automatic systems can answer questions, perform web searches, or command our computers to perform specific tasks. However, "language" is not monolithic; people vary in the language they speak, the dialect they use, the relative ease with which they produce language, and the words they choose to express themselves. In the benchmarking of NLP systems, however, this linguistic variety is generally unattested. Most commonly, tasks are formulated using canonical American English, with little regard for whether systems will work on any other variety of language. In this work we ask a simple question: can we measure the extent to which the diversity of the language we use affects the quality of results we can expect from language technology systems? Answering it will allow for the development and deployment of fair accuracy measures for a variety of language technology tasks, encouraging advances in the state of the art to benefit all users, not just a select few. Funded by Amazon and the NSF through the NSF-FAI program.

Antonios Anastasopoulos

Assistant Professor

I work on multilingual models, machine translation, speech recognition, and NLP for under-served languages.

Publications

Are Large Language Models Geospatially Knowledgeable?

Despite the impressive performance of Large Language Models (LLM) for various natural language processing tasks, little is known about …

Prabin Bhandari, Antonios Anastasopoulos, Dieter Pfoser

PDF Code Project

Global Voices, Local Biases: Socio-Cultural Prejudices across Languages

Human biases are ubiquitous but not uniform: disparities exist across linguistic, cultural, and societal borders. As large amounts of …

Anjishnu Mukherjee, Chahat Raj, Ziwei Zhu, Antonios Anastasopoulos

Code Dataset Project

Geographic and Geopolitical Biases of Language Models

Pretrained language models (PLMs) often fail to fairly represent target users from certain world regions because of the …

Fahim Faisal, Antonios Anastasopoulos

PDF Project

Phylogeny-Inspired Adaptation of Multilingual Models to New Languages

Large pretrained multilingual models, trained on dozens of languages, have delivered promising results due to cross-lingual learning …

Fahim Faisal, Antonios Anastasopoulos

PDF Code Dataset Project

The GMU System Submission for the SUMEval 2022 Shared Task

This paper describes the submission of our multilingual NLP model performance evaluation system for the SUMEval 2022 shared task, a …

Syeda Sabrina Akter, Antonios Anastasopoulos

PDF Code Project

Language Generation Models Can Cause Harm: So What Can We Do About It? An Actionable Survey

Recent advances in the capacity of large language models to generate human-like text have resulted in their increased adoption in …

Sachin Kumar, Vidhisha Balachandran, Lucille Njoo, Antonios Anastasopoulos, Yulia Tsvetkov

PDF Project

Systematic Inequalities in Language Technology Performance across the World’s Languages

Natural language processing (NLP) systems have become a central technology in communication, education, medicine, artificial …

Damián Blasi, Antonios Anastasopoulos, Graham Neubig

PDF Code Project

Investigating Post-pretraining Representation Alignment for Cross-Lingual Question Answering

Human knowledge is collectively encoded in the roughly 6500 languages spoken around the world, but it is not distributed equally across …

Fahim Faisal, Antonios Anastasopoulos

PDF Code Dataset Project

SD-QA: Spoken Dialectal Question Answering for the Real World

Question answering (QA) systems are now available through numerous commercial applications for a wide variety of domains, serving …

Fahim Faisal, Sharlina Keshava, Md Mahfuz Ibn Alam, Antonios Anastasopoulos

PDF Code Dataset Project

Machine Translation into Low-resource Language Varieties

State-of-the-art machine translation (MT) systems are typically trained to generate “standard” target language; however, …

Sachin Kumar, Antonios Anastasopoulos, Shuly Wintner, Yulia Tsvetkov

PDF Code Project

Towards more equitable question answering systems: How much more data do you need?

Question answering (QA) in English has been widely explored, but multilingual datasets are relatively new, with several methods …

Arnab Debnath, Navid Rajabi, Fardina Fathmiul Alam, Antonios Anastasopoulos

PDF Code Project