Dataset Maps - SQuAD (English)


Relevant Statistics

Percentage in-country: 62.73%
Missing countries: 93 of 243 (38.27%)
Total Variation Distance between observed and population-proportional distribution: 5004.444
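For reference, here is a minimal sketch of the textbook total variation distance between an observed per-country distribution and a population-proportional one. The page does not say how its figure is computed or scaled. The standard definition over normalized distributions lies in [0, 1], so the value reported above presumably uses a different scaling. The function name and the toy counts below are hypothetical and are not taken from the dataset.

```python
def total_variation_distance(observed_counts, populations):
    """Textbook TVD: half the L1 distance between two normalized distributions.

    observed_counts: country -> number of dataset entities tied to that country
    populations:     country -> population (any consistent unit)
    Both mappings are hypothetical inputs; missing keys count as zero.
    """
    countries = set(observed_counts) | set(populations)
    obs_total = sum(observed_counts.values())
    pop_total = sum(populations.values())
    if obs_total <= 0 or pop_total <= 0:
        raise ValueError("both distributions need a positive total")
    return 0.5 * sum(
        abs(observed_counts.get(c, 0) / obs_total - populations.get(c, 0) / pop_total)
        for c in countries
    )


# Toy example (made-up numbers): country C is populous but never mentioned.
observed = {"A": 80, "B": 20, "C": 0}
population = {"A": 10, "B": 30, "C": 60}
tvd = total_variation_distance(observed, population)
print(round(tvd, 3))
```

A value of 0 would mean the dataset mentions countries exactly in proportion to their population; a value of 1 would mean the two distributions do not overlap at all.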

We also trained a linear model to find socioeconomic correlates of the dataset:
    Variance explained by population: 0.277
    Variance explained by GDP: 0.517
    Variance explained by geographic distance: 0.062
    Variance explained by all 3 factors: 0.534
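The variance-explained figures above can be read as R² values from linear regressions. The sketch below assumes ordinary least squares with an intercept, fitted once per factor and once with all three together. The page does not specify the exact model, feature transformations, or target variable, so this setup, the helper names, and the toy numbers are all assumptions.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system (collinear predictors?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        rest = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - rest) / m[i][i]
    return x


def r_squared(features, target):
    """R^2 of an OLS fit with intercept; features is one row per country."""
    rows = [[1.0] + list(f) for f in features]  # prepend intercept column
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, target)) for i in range(p)]
    beta = solve_linear_system(xtx, xty)  # normal equations
    preds = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(target) / len(target)
    ss_res = sum((y - p_hat) ** 2 for y, p_hat in zip(target, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in target)
    return 1.0 - ss_res / ss_tot


# Toy per-country data (made up for illustration only).
population = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
gdp = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
distance = [5.0, 3.0, 4.0, 1.0, 2.0, 0.5]
counts = [1.0, 0.5, 3.5, 2.0, 6.5, 4.0]  # hypothetical entity counts

r2_pop = r_squared([[v] for v in population], counts)
r2_gdp = r_squared([[v] for v in gdp], counts)
r2_dist = r_squared([[v] for v in distance], counts)
r2_all = r_squared([list(row) for row in zip(population, gdp, distance)], counts)
print(r2_pop, r2_gdp, r2_dist, r2_all)
```

Under this setup the combined model can never explain less in-sample variance than any single factor, which is consistent with the combined figure (0.534) sitting only slightly above the GDP-only figure (0.517).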







Antonios Anastasopoulos
Assistant Professor

I work on multilingual models, machine translation, speech recognition, and NLP for under-served languages.

Fahim Faisal
PhD Student

My name is Fahim Faisal. My academic interests involve learning different aspects of computational linguistics and natural language processing (e.g., machine translation). Currently, I am working on a project related to semi-supervised learning of the morphological processes of language.
