Dataset Maps - WikiANN (all)
Afrikaans
Relevant Statistics
Percentage in-country: 34.71%
Missing countries: 85 of 243 (34.98%)
Total Variation Distance between observed and population-proportional distribution: 34.966
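The page does not explain how these statistics are computed. The sketch below shows one straightforward reading. It assumes that each entity in the dataset has already been geolocated to a country code, and that "in-country" means a country where the language is spoken. The function names and inputs are hypothetical. Under the usual definition, total variation distance is half the L1 distance between two probability distributions, so it lies between 0 and 1. Most values reported on this page are larger than 1, which means they were produced on some other, unstated scale. This sketch does not attempt to reproduce that scale.

```python
def in_country_percentage(entity_countries, language_countries):
    """Percent of geolocated entities whose country is in language_countries.

    entity_countries: one country code per entity (hypothetical input format).
    language_countries: set of codes counted as "in-country" (an assumption).
    """
    if not entity_countries:
        return 0.0
    hits = sum(1 for c in entity_countries if c in language_countries)
    return 100.0 * hits / len(entity_countries)


def missing_countries(entity_countries, all_countries):
    """Count and percent of reference countries with no entities at all."""
    present = set(entity_countries)
    missing = [c for c in all_countries if c not in present]
    return len(missing), 100.0 * len(missing) / len(all_countries)


def total_variation_distance(observed_counts, population):
    """Textbook TVD: 0.5 * sum_i |p_i - q_i|, where p is the normalised entity
    counts and q is the normalised population, over the union of countries."""
    countries = set(observed_counts) | set(population)
    n_obs = sum(observed_counts.values())
    n_pop = sum(population.values())
    return 0.5 * sum(
        abs(observed_counts.get(c, 0) / n_obs - population.get(c, 0) / n_pop)
        for c in countries
    )


# Toy usage with made-up data.
entities = ["ZA", "ZA", "NA", "US", "GB"]
print(in_country_percentage(entities, {"ZA", "NA"}))  # 60.0
print(total_variation_distance({"A": 3, "B": 1}, {"A": 1, "B": 1}))  # 0.25
```

As a check on the arithmetic above, 85 missing countries out of a 243-country reference list is 100 × 85 / 243 ≈ 34.98%, which matches the Afrikaans figure.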
We also trained a linear model to find socioeconomic correlates of the datasets.
Variance explained by population: 0.155
Variance explained by GDP: 0.460
Variance explained by geographic distance: 0.002
Variance explained by all 3 factors: 0.502
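The "variance explained" figures are presumably coefficients of determination (R²) of the linear model. The page does not state the target variable, any transformation applied to it (for example, log counts), or whether R² is in-sample, adjusted, or measured on held-out data. The following is therefore only a sketch of plain in-sample R² from an ordinary least-squares fit with an intercept, using NumPy. The feature arrays are synthetic stand-ins for per-country population, GDP, and distance.

```python
import numpy as np


def variance_explained(X, y):
    """In-sample R^2 of an OLS fit of y on the columns of X plus an intercept."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:  # a single predictor given as a 1-D array
        X = X[:, None]
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Synthetic per-country data (243 countries), for illustration only.
    rng = np.random.default_rng(0)
    population, gdp, distance = rng.normal(size=(3, 243))
    target = 0.8 * gdp + 0.3 * population + rng.normal(size=243)
    for name, feat in [("population", population), ("GDP", gdp), ("distance", distance)]:
        print(f"Variance explained by {name}: {variance_explained(feat, target):.3f}")
    all3 = np.column_stack([population, gdp, distance])
    print(f"Variance explained by all 3 factors: {variance_explained(all3, target):.3f}")
```

With this estimator, adding predictors can never lower R², so the all-factor value is always at least as large as each single-factor value.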
Arabic
Relevant Statistics
Percentage in-country: 29.68%
Missing countries: 51 of 243 (20.99%)
Total Variation Distance between observed and population-proportional distribution: 1587.836
We also trained a linear model to find socioeconomic correlates of the datasets.
Variance explained by population: 0.276
Variance explained by GDP: 0.402
Variance explained by geographic distance: 0.201
Variance explained by all 3 factors: 0.555
Azerbaijani
Relevant Statistics
Percentage in-country: 33.94%
Missing countries: 92 of 243 (37.86%)
Total Variation Distance between observed and population-proportional distribution: 1268.119
We also trained a linear model to find socioeconomic correlates of the datasets.
Variance explained by population: 0.220
Variance explained by GDP: 0.407
Variance explained by geographic distance: 0.224
Variance explained by all 3 factors: 0.547
Bulgarian
Relevant Statistics
Percentage in-country: 14.46%
Missing countries: 46 of 243 (18.93%)
Total Variation Distance between observed and population-proportional distribution: 0.000
We also trained a linear model to find socioeconomic correlates of the datasets.
Variance explained by population: 0.239
Variance explained by GDP: 0.396
Variance explained by geographic distance: 0.161
Variance explained by all 3 factors: 0.501
Bengali
Relevant Statistics
Percentage in-country: 26.91%
Missing countries: 82 of 243 (33.74%)
Total Variation Distance between observed and population-proportional distribution: 662.371
We also trained a linear model to find socioeconomic correlates of the datasets.
Variance explained by population: 0.325
Variance explained by GDP: 0.433
Variance explained by geographic distance: 0.027
Variance explained by all 3 factors: 0.445
German
Relevant Statistics
Percentage in-country: 27.03%
Missing countries: 41 of 243 (16.87%)
Total Variation Distance between observed and population-proportional distribution: 430.165
We also trained a linear model to find socioeconomic correlates of the datasets.
Variance explained by population: 0.244
Variance explained by GDP: 0.484
Variance explained by geographic distance: 0.181
Variance explained by all 3 factors: 0.589
Greek
Relevant Statistics
Percentage in-country: 27.68%
Missing countries: 42 of 243 (17.28%)
Total Variation Distance between observed and population-proportional distribution: 5.759
We also trained a linear model to find socioeconomic correlates of the datasets.
Variance explained by population: 0.112
Variance explained by GDP: 0.337
Variance explained by geographic distance: 0.176
Variance explained by all 3 factors: 0.472
Spanish
Relevant Statistics
Percentage in-country: 54.58%
Missing countries: 44 of 243 (18.11%)
Total Variation Distance between observed and population-proportional distribution: 7674.883
We also trained a linear model to find socioeconomic correlates of the datasets.
Variance explained by population: 0.240
Variance explained by GDP: 0.435
Variance explained by geographic distance: 0.090
Variance explained by all 3 factors: 0.475
Estonian
Relevant Statistics
Percentage in-country: 21.08%
Missing countries: 50 of 243 (20.58%)
Total Variation Distance between observed and population-proportional distribution: 0.000
We also trained a linear model to find socioeconomic correlates of the datasets.
Variance explained by population: 0.200
Variance explained by GDP: 0.405
Variance explained by geographic distance: 0.210
Variance explained by all 3 factors: 0.546
Basque
Relevant Statistics
Percentage in-country: 23.53%
Missing countries: 97 of 243 (39.92%)
Total Variation Distance between observed and population-proportional distribution: 0.000
We also trained a linear model to find socioeconomic correlates of the datasets.
Variance explained by population: 0.212
Variance explained by GDP: 0.489
Variance explained by geographic distance: 0.096
Variance explained by all 3 factors: 0.550
Finnish
Relevant Statistics
Percentage in-country: 17.22%
Missing countries: 46 of 243 (18.93%)
Total Variation Distance between observed and population-proportional distribution: 0.000
We also trained a linear model to find socioeconomic correlates of the datasets.
Variance explained by population: 0.248
Variance explained by GDP: 0.478
Variance explained by geographic distance: 0.172
Variance explained by all 3 factors: 0.572
French
Relevant Statistics
Percentage in-country: 33.50%
Missing countries: 40 of 243 (16.46%)
Total Variation Distance between observed and population-proportional distribution: 5967.332
We also trained a linear model to find socioeconomic correlates of the datasets.
Variance explained by population: 0.254
Variance explained by GDP: 0.487
Variance explained by geographic distance: 0.169
Variance explained by all 3 factors: 0.574
Hebrew
Relevant Statistics
Percentage in-country: 17.49%
Missing countries: 53 of 243 (21.81%)
Total Variation Distance between observed and population-proportional distribution: 23.197
We also trained a linear model to find socioeconomic correlates of the datasets.
Variance explained by population: 0.189
Variance explained by GDP: 0.412
Variance explained by geographic distance: 0.170
Variance explained by all 3 factors: 0.529
Hungarian
Relevant Statistics
Percentage in-country: 20.35%
Missing countries: 52 of 243 (21.40%)
Total Variation Distance between observed and population-proportional distribution: 449.676
We also trained a linear model to find socioeconomic correlates of the datasets.
Variance explained by population: 0.186
Variance explained by GDP: 0.421
Variance explained by geographic distance: 0.185
Variance explained by all 3 factors: 0.532
Indonesian
Relevant Statistics
Percentage in-country: 11.77%
Missing countries: 42 of 243 (17.28%)
Total Variation Distance between observed and population-proportional distribution: 0.000
We also trained a linear model to find socioeconomic correlates of the datasets.
Variance explained by population: 0.357
Variance explained by GDP: 0.490
Variance explained by geographic distance: 0.006
Variance explained by all 3 factors: 0.482
Japanese
Relevant Statistics
Percentage in-country: 66.02%
Missing countries: 69 of 243 (28.40%)
Total Variation Distance between observed and population-proportional distribution: 0.000
We also trained a linear model to find socioeconomic correlates of the datasets.
Variance explained by population: 0.357
Variance explained by GDP: 0.582
Variance explained by geographic distance: 0.022
Variance explained by all 3 factors: 0.555
Korean
Relevant Statistics
Percentage in-country: 25.06%
Missing countries: 45 of 243 (18.52%)
Total Variation Distance between observed and population-proportional distribution: 1378.067
We also trained a linear model to find socioeconomic correlates of the datasets.
Variance explained by population: 0.333
Variance explained by GDP: 0.524
Variance explained by geographic distance: 0.022
Variance explained by all 3 factors: 0.483
Marathi
Relevant Statistics
Percentage in-country: 52.83%
Missing countries: 99 of 243 (40.74%)
Total Variation Distance between observed and population-proportional distribution: 0.000
We also trained a linear model to find socioeconomic correlates of the datasets.
Variance explained by population: 0.359
Variance explained by GDP: 0.510
Variance explained by geographic distance: 0.033
Variance explained by all 3 factors: 0.482
Russian
Relevant Statistics
Percentage in-country: 24.63%
Missing countries: 46 of 243 (18.93%)
Total Variation Distance between observed and population-proportional distribution: 843.995
We also trained a linear model to find socioeconomic correlates of the datasets.
Variance explained by population: 0.234
Variance explained by GDP: 0.424
Variance explained by geographic distance: 0.223
Variance explained by all 3 factors: 0.557
Swahili
Relevant Statistics
Percentage in-country: 11.33%
Missing countries: 134 of 243 (55.14%)
Total Variation Distance between observed and population-proportional distribution: 35.505
We also trained a linear model to find socioeconomic correlates of the datasets.
Variance explained by population: 0.152
Variance explained by GDP: 0.364
Variance explained by geographic distance: 0.032
Variance explained by all 3 factors: 0.395
Telugu
Relevant Statistics
Percentage in-country: 81.58%
Missing countries: 182 of 243 (74.90%)
Total Variation Distance between observed and population-proportional distribution: 0.000
We also trained a linear model to find socioeconomic correlates of the datasets.
Variance explained by population: 0.192
Variance explained by GDP: 0.345
Variance explained by geographic distance: 0.039
Variance explained by all 3 factors: 0.155
Thai
Relevant Statistics
Percentage in-country: 38.26%
Missing countries: 161 of 243 (66.26%)
Total Variation Distance between observed and population-proportional distribution: 21.969
We also trained a linear model to find socioeconomic correlates of the datasets.
Variance explained by population: 0.276
Variance explained by GDP: 0.517
Variance explained by geographic distance: 0.090
Variance explained by all 3 factors: 0.548
Turkish
Relevant Statistics
Percentage in-country: 28.79%
Missing countries: 52 of 243 (21.40%)
Total Variation Distance between observed and population-proportional distribution: 1071.450
We also trained a linear model to find socioeconomic correlates of the datasets.
Variance explained by population: 0.222
Variance explained by GDP: 0.447
Variance explained by geographic distance: 0.227
Variance explained by all 3 factors: 0.597
Vietnamese
Relevant Statistics
Percentage in-country: 33.29%
Missing countries: 57 of 243 (23.46%)
Total Variation Distance between observed and population-proportional distribution: 3702.792
We also trained a linear model to find socioeconomic correlates of the datasets.
Variance explained by population: 0.317
Variance explained by GDP: 0.540
Variance explained by geographic distance: 0.045
Variance explained by all 3 factors: 0.533
Yoruba
Relevant Statistics
Percentage in-country: 20.77%
Missing countries: 215 of 243 (88.48%)
Total Variation Distance between observed and population-proportional distribution: 0.000
We also trained a linear model to find socioeconomic correlates of the datasets.
Variance explained by population: -0.049
Variance explained by GDP: 0.029
Variance explained by geographic distance: 0.016
Variance explained by all 3 factors: 0.122
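In-sample R² from an ordinary least-squares fit with an intercept cannot be negative, and it cannot decrease when predictors are added. The negative population figure for Yoruba therefore suggests that the page reports some other variant of R², such as adjusted R² or R² measured on held-out data. The same explanation would cover the Telugu all-factor value (0.155) being lower than its GDP-only value (0.345). The page does not say which variant is used. The minimal sketch below, with hypothetical helper names, only shows that both of these variants can fall below zero.

```python
import numpy as np


def r2_score(y_true, y_pred):
    """1 - SS_res / SS_tot, measured against the mean of y_true."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = float(((y_true - y_pred) ** 2).sum())
    ss_tot = float(((y_true - y_true.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot


def adjusted_r2(r2, n_samples, n_predictors):
    """R^2 penalised for the number of predictors; can drop below zero."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)


# Predictions (from a model fitted on other data) that are worse than simply
# predicting the mean of the new data: held-out R^2 is negative.
print(r2_score([1.0, 2.0, 3.0], [3.0, 3.0, 3.0]))  # -1.5
# A weak in-sample fit on few data points: adjusted R^2 is negative.
print(adjusted_r2(0.0, n_samples=11, n_predictors=2))  # -0.25
```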
Chinese
Relevant Statistics
Percentage in-country: 49.11%
Missing countries: 49 of 243 (20.16%)
Total Variation Distance between observed and population-proportional distribution: 4044.378
We also trained a linear model to find socioeconomic correlates of the datasets.
Variance explained by population: 0.387
Variance explained by GDP: 0.575
Variance explained by geographic distance: 0.070
Variance explained by all 3 factors: 0.573