multilingual NLP | George Mason NLP

Phylogeny-Inspired Adaptation of Multilingual Models to New Languages

Large pretrained multilingual models, trained on dozens of languages, have delivered promising results on a variety of language tasks due to their cross-lingual learning capabilities. Further adapting these models to specific languages, especially ones …

The GMU System Submission for the SUMEval 2022 Shared Task

This paper describes our submission to the SUMEval 2022 shared task: a system for evaluating multilingual NLP models that predicts the performance of a model on a set of target languages. The system is based on the LITMUS model …

Dataset Geography: Mapping Language Data to Language Users

As language technologies become more ubiquitous, there are increasing efforts towards expanding the language diversity and coverage of natural language processing (NLP) systems. Arguably, the most important factor influencing the quality of modern …

Revisiting the Effects of Leakage on Dependency Parsing

Recent work by Søgaard (2020) showed that, treebank size aside, overlap between training and test graphs (termed leakage) explains more of the observed variation in dependency parsing performance than other explanations. In this work we revisit this …

Systematic Inequalities in Language Technology Performance across the World’s Languages

Natural language processing (NLP) systems have become a central technology in communication, education, medicine, artificial intelligence, and many other domains of research and development. While the performance of NLP methods has grown enormously …

When is Wall a Pared and when a Muro?: Extracting Rules Governing Lexical Selection

Evaluating the Morphosyntactic Well-formedness of Generated Texts

Text generation systems are ubiquitous in natural language processing applications. However, evaluation of these systems remains a challenge, especially in multilingual settings. In this paper, we propose L'AMBRE, a metric to evaluate the …

Investigating Post-pretraining Representation Alignment for Cross-Lingual Question Answering

Human knowledge is collectively encoded in the roughly 6500 languages spoken around the world, but it is not distributed equally across languages. Hence, for information-seeking question answering (QA) systems to adequately serve speakers of all …

SD-QA: Spoken Dialectal Question Answering for the Real World

Question answering (QA) systems are now available through numerous commercial applications for a wide variety of domains, serving millions of users that interact with them via speech interfaces. However, current benchmarks in QA research do not …

Cross-Lingual Text Classification of Transliterated Hindi and Malayalam

Transliteration is very common on social media, but transliterated text is not adequately handled by modern neural models for various NLP tasks. In this work, we combine data augmentation approaches with a Teacher-Student training scheme to address …