Dataset Maps - Natural Questions (English)


Relevant Statistics

Percentage in-country: 80.07%
Missing countries: 49 of 243 (20.16%)
Total Variation Distance between observed and population-proportional distribution: 11907.219
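As a rough illustration of the last statistic, total variation distance in its normalized form is half the sum of absolute differences between two probability distributions, which keeps it between 0 and 1. The figure reported above is far larger than 1, so the page evidently uses a different scaling; the sketch below only shows the textbook definition. The function name and the toy country counts are hypothetical and are not taken from the dataset.

```python
def total_variation_distance(observed_counts, population):
    """Half the L1 distance between two normalized distributions over countries.

    Both arguments map a country code to a non-negative number.
    Countries missing from one mapping are treated as zero there.
    """
    countries = set(observed_counts) | set(population)
    obs_total = sum(observed_counts.values())
    pop_total = sum(population.values())
    return 0.5 * sum(
        abs(observed_counts.get(c, 0) / obs_total - population.get(c, 0) / pop_total)
        for c in countries
    )


# Hypothetical toy data (assumption, not from Natural Questions).
observed = {"US": 80, "IN": 10, "NG": 10}    # entity mentions per country
population = {"US": 20, "IN": 60, "NG": 20}  # population per country, arbitrary units
tvd = total_variation_distance(observed, population)
print(round(tvd, 3))
```

A value of 0 would mean entity mentions are spread across countries exactly in proportion to population; a value near 1 would mean the two distributions barely overlap.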

We also trained a linear model to find socioeconomic correlates of the dataset:
    Variance explained by population: 0.395
    Variance explained by GDP: 0.535
    Variance explained by geographic distance: 0.030
    Variance explained by all 3 factors: 0.550
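
"Variance explained" figures like these are commonly read as the R² of a least-squares fit. The sketch below shows one way such numbers could be computed, fitting each factor alone and then all three together. It is an assumption that the page used ordinary least squares on untransformed values; the helper names and the toy numbers are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def r_squared(features, y):
    """R^2 of an ordinary least-squares fit of y on the feature columns plus an intercept."""
    rows = [[1.0] + list(f) for f in zip(*features)]
    k = len(rows[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    pred = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((t - p) ** 2 for t, p in zip(y, pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical per-country toy values (assumption, not from the dataset).
pop = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
gdp = [2.0, 1.0, 4.0, 3.0, 6.0, 8.0]
dist = [5.0, 3.0, 6.0, 2.0, 4.0, 1.0]
counts = [3.0, 2.0, 5.0, 4.0, 8.0, 11.0]  # entity mentions per country

r2_pop = r_squared([pop], counts)
r2_gdp = r_squared([gdp], counts)
r2_dist = r_squared([dist], counts)
r2_all = r_squared([pop, gdp, dist], counts)
print(r2_pop, r2_gdp, r2_dist, r2_all)
```

Because the single-factor models are nested inside the three-factor model, the combined R² can never be lower than any individual one, which matches the pattern in the figures above (0.550 against 0.535 for GDP alone).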







Antonios Anastasopoulos
Assistant Professor

I work on multilingual models, machine translation, speech recognition, and NLP for under-served languages.

Fahim Faisal
PhD Student

My name is Fahim Faisal. My academic interests lie in computational linguistics and natural language processing (e.g., machine translation). Currently, I am working on a project on semi-supervised learning of the morphological processes of language.
