Dataset Maps - MLQA (English)

Relevant Statistics

Percentage in-country: 53.63%
Missing countries: 80 of 243 (32.92%)
Total Variation Distance between observed and population-proportional distribution: 913.093
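
For reference, the textbook total variation distance between two discrete distributions is half the sum of the absolute differences of their probabilities. The sketch below assumes per-country entity counts and per-country population figures as inputs; the function name and the toy numbers are hypothetical and do not come from the dataset. Computed on normalized distributions this way, the distance always lies between 0 and 1, so the value reported above is presumably on a different scale that this page does not specify.

```python
def total_variation_distance(observed_counts, population):
    """Textbook TVD between two distributions over countries.

    Both arguments are dicts mapping a country code to a non-negative
    number (hypothetical inputs, not the page's actual data). Each is
    normalized to sum to 1 before the comparison.
    """
    countries = set(observed_counts) | set(population)
    total_observed = sum(observed_counts.values())
    total_population = sum(population.values())

    distance = 0.0
    for country in countries:
        # Share of dataset entities located in this country.
        p = observed_counts.get(country, 0) / total_observed
        # Share of world population living in this country.
        q = population.get(country, 0) / total_population
        distance += abs(p - q)
    return 0.5 * distance


# Toy example: country "C" has population but no entities in the dataset.
toy_observed = {"A": 6, "B": 2}
toy_population = {"A": 2, "B": 4, "C": 2}
toy_tvd = total_variation_distance(toy_observed, toy_population)
```

In the toy example, country "C" contributes to the distance even though it never appears in the data, which is how missing countries pull the observed distribution away from the population-proportional one.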

We also trained a linear model to find socioeconomic correlates of the dataset:
    Variance explained by population: 0.317
    Variance explained by GDP: 0.561
    Variance explained by geographic distance: 0.040
    Variance explained by all 3 factors: 0.548
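
As a rough illustration of what "variance explained" usually means for a linear model, the sketch below fits a one-variable least-squares line and reports the coefficient of determination, R^2. It assumes the response is a per-country quantity (for example, the number of dataset entities) and the predictor is a single factor such as GDP; the page does not state the exact response variable, the preprocessing, or whether the scores were computed in-sample, so the function name and the toy numbers are hypothetical.

```python
def r_squared(x, y):
    """R^2 of an ordinary least-squares line fitted to (x, y).

    x and y are equal-length lists of numbers (hypothetical data,
    e.g. per-country GDP and per-country entity counts).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Closed-form least-squares slope and intercept for one predictor.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Residual and total sums of squares.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Toy example with a noisy upward trend.
toy_factor = [1, 2, 3, 4]
toy_response = [1, 3, 2, 4]
toy_r2 = r_squared(toy_factor, toy_response)
```

Under this definition, a score of 0.561 would mean that a straight-line fit on that factor accounts for about 56% of the variance in the response. The combined three-factor model would need a multiple regression, which this one-variable sketch does not cover.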

Antonios Anastasopoulos
Assistant Professor

I work on multilingual models, machine translation, speech recognition, and NLP for under-served languages.

Fahim Faisal
PhD Student

My name is Fahim Faisal. My academic interests include computational linguistics and natural language processing (e.g., machine translation). I am currently working on a project on semi-supervised learning of the morphological processes of language.