Dataset Maps - X-FACTR (12 languages)

greek

Relevant Statistics

Percentage in-country: 2.94%
Total Variation Distance between observed and population-proportional distribution: 1.623
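
As a rough illustration of the second statistic, the sketch below computes a total variation distance between an observed per-country distribution and a population-proportional one. It assumes the standard definition (half the sum of absolute differences between two normalized distributions), under which the value lies between 0 and 1; this page does not state how its reported figures are scaled or normalized. The function name and the country data are hypothetical, made up for the example.

```python
def total_variation_distance(observed_counts, population):
    """Standard TVD between two distributions given as {country: value} dicts.

    Both inputs are normalized to sum to 1 before comparison, so the
    result is in [0, 1]. This is an assumed definition, not necessarily
    the one used to produce the figures on this page.
    """
    countries = set(observed_counts) | set(population)
    observed_total = sum(observed_counts.values())
    population_total = sum(population.values())
    return 0.5 * sum(
        abs(
            observed_counts.get(c, 0) / observed_total
            - population.get(c, 0) / population_total
        )
        for c in countries
    )


# Hypothetical toy data: entity counts per country vs. speaker population.
observed = {"A": 80, "B": 20}
speakers = {"A": 50, "B": 50}
tvd = total_variation_distance(observed, speakers)
print(f"TVD = {tvd:.3f}")
```

A value of 0 would mean the dataset's entities are spread across countries exactly in proportion to population.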

We also trained a linear model to find socioeconomic correlates of the datasets
    Variance explained by population: 0.282
    Variance explained by GDP: 0.466
    Variance explained by geographic distance: 0.145
    Variance explained by all 3 factors: 0.562
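
The "variance explained" figures can be read as a coefficient of determination from a linear fit. The sketch below assumes ordinary least squares with an intercept and in-sample R^2; the page does not state the exact procedure (for example, whether scores are computed on held-out data), so this is one plausible reading and not the authors' method. The function name and the synthetic per-country data are hypothetical.

```python
import numpy as np


def variance_explained(features, target):
    """Fit OLS with an intercept and return the in-sample R^2."""
    features = np.asarray(features, dtype=float)
    target = np.asarray(target, dtype=float)
    if features.ndim == 1:
        features = features[:, None]
    # Prepend a column of ones so the model includes an intercept.
    design = np.column_stack([np.ones(len(target)), features])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    residuals = target - design @ coef
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((target - target.mean()) ** 2))
    return 1.0 - ss_res / ss_tot


# Synthetic per-country data, invented for illustration only.
rng = np.random.default_rng(0)
n_countries = 50
population = rng.normal(size=n_countries)
gdp = rng.normal(size=n_countries)
distance = rng.normal(size=n_countries)
entity_count = (
    0.5 * population + 1.0 * gdp - 0.2 * distance
    + rng.normal(scale=0.5, size=n_countries)
)

r2_population = variance_explained(population, entity_count)
r2_gdp = variance_explained(gdp, entity_count)
r2_distance = variance_explained(distance, entity_count)
r2_all = variance_explained(
    np.column_stack([population, gdp, distance]), entity_count
)
print(r2_population, r2_gdp, r2_distance, r2_all)
```

Each single-factor score comes from a model fit on that factor alone, and the last score from a model fit on all three together.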


yoruba

Relevant Statistics

Percentage in-country: 1.15%
Total Variation Distance between observed and population-proportional distribution: 0.000

We also trained a linear model to find socioeconomic correlates of the datasets
    Variance explained by population: 0.296
    Variance explained by GDP: 0.495
    Variance explained by geographic distance: 0.058
    Variance explained by all 3 factors: 0.543


french

Relevant Statistics

Percentage in-country: 16.34%
Total Variation Distance between observed and population-proportional distribution: 4078.705

We also trained a linear model to find socioeconomic correlates of the datasets
    Variance explained by population: 0.314
    Variance explained by GDP: 0.480
    Variance explained by geographic distance: 0.129
    Variance explained by all 3 factors: 0.560


bengali

Relevant Statistics

Percentage in-country: 11.46%
Total Variation Distance between observed and population-proportional distribution: 387.175

We also trained a linear model to find socioeconomic correlates of the datasets
    Variance explained by population: 0.414
    Variance explained by GDP: 0.512
    Variance explained by geographic distance: 0.069
    Variance explained by all 3 factors: 0.554


hebrew

Relevant Statistics

Percentage in-country: 2.12%
Total Variation Distance between observed and population-proportional distribution: 20.073

We also trained a linear model to find socioeconomic correlates of the datasets
    Variance explained by population: 0.292
    Variance explained by GDP: 0.489
    Variance explained by geographic distance: 0.142
    Variance explained by all 3 factors: 0.583


hungarian

Relevant Statistics

Percentage in-country: 1.97%
Total Variation Distance between observed and population-proportional distribution: 222.439

We also trained a linear model to find socioeconomic correlates of the datasets
    Variance explained by population: 0.295
    Variance explained by GDP: 0.494
    Variance explained by geographic distance: 0.173
    Variance explained by all 3 factors: 0.606


korean

Relevant Statistics

Percentage in-country: 0.84%
Total Variation Distance between observed and population-proportional distribution: 25.135

We also trained a linear model to find socioeconomic correlates of the datasets
    Variance explained by population: 0.338
    Variance explained by GDP: 0.511
    Variance explained by geographic distance: 0.032
    Variance explained by all 3 factors: 0.492


marathi

Relevant Statistics

Percentage in-country: 11.15%
Total Variation Distance between observed and population-proportional distribution: 0.000

We also trained a linear model to find socioeconomic correlates of the datasets
    Variance explained by population: 0.391
    Variance explained by GDP: 0.536
    Variance explained by geographic distance: 0.074
    Variance explained by all 3 factors: 0.568


russian

Relevant Statistics

Percentage in-country: 4.34%
Total Variation Distance between observed and population-proportional distribution: 208.369

We also trained a linear model to find socioeconomic correlates of the datasets
    Variance explained by population: 0.299
    Variance explained by GDP: 0.470
    Variance explained by geographic distance: 0.192
    Variance explained by all 3 factors: 0.587


spanish

Relevant Statistics

Percentage in-country: 40.71%
Total Variation Distance between observed and population-proportional distribution: 10002.477

We also trained a linear model to find socioeconomic correlates of the datasets
    Variance explained by population: 0.311
    Variance explained by GDP: 0.478
    Variance explained by geographic distance: 0.099
    Variance explained by all 3 factors: 0.539


turkish

Relevant Statistics

Percentage in-country: 7.55%
Total Variation Distance between observed and population-proportional distribution: 780.841

We also trained a linear model to find socioeconomic correlates of the datasets
    Variance explained by population: 0.311
    Variance explained by GDP: 0.484
    Variance explained by geographic distance: 0.171
    Variance explained by all 3 factors: 0.599


vietnamese

Relevant Statistics

Percentage in-country: 17.45%
Total Variation Distance between observed and population-proportional distribution: 1855.066

We also trained a linear model to find socioeconomic correlates of the datasets
    Variance explained by population: 0.356
    Variance explained by GDP: 0.516
    Variance explained by geographic distance: 0.022
    Variance explained by all 3 factors: 0.504







Antonios Anastasopoulos
Assistant Professor

I work on multilingual models, machine translation, speech recognition, and NLP for under-served languages.

Fahim Faisal
PhD Student

My name is Fahim Faisal. My academic interests span different aspects of computational linguistics and natural language processing (e.g., machine translation). Currently, I am working on a project related to semi-supervised learning of morphological processes in language.