# r/place Color Propensity by Country

This notebook shows how to preprocess the [r/Place dataset](https://www.reddit.com/r/redditdata/comments/6640ru/place_datasets_april_fools_2017/) published by Reddit to create this [map showing color propensity by country](https://ramiro.org/map/world/rplace-country-color-propensity/). The result is a CSV file that pairs each country's ISO 3 code with the color used most often from that country.

In [1]:
%load_ext signature

import os

import pandas as pd
import geonamescache

data_dir = os.path.expanduser('~/data')
gc = geonamescache.GeonamesCache()
df = pd.read_csv(os.path.join(data_dir, 'reddit', 'rplace-country-color-propensity.csv'))

df.head()

Unnamed: 0,iso_country_code,color_0,color_1,color_2,color_3,color_4,color_5,color_6,color_7,color_8,color_9,color_10,color_11,color_12,color_13,color_14,color_15
0,AD,21,13,6,71,0,0,0,0,0,0,0,0,0,5,0,0
1,AF,5,0,0,7,0,0,0,0,6,0,0,0,0,0,0,0
2,AG,16,0,0,25,0,16,22,0,0,0,0,0,0,0,0,0
3,AI,0,0,0,21,0,0,0,0,0,0,0,0,0,0,0,0
4,AO,11,0,0,7,0,0,0,0,0,0,0,0,0,23,0,0
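The `Unnamed: 0` column in the output above is what pandas typically produces when a CSV that was saved with its index is read back without `index_col`. A related pitfall, assuming the file contains a row for Namibia: the default parser treats the string `NA` as a missing value, so that row would be removed by a later `dropna()`. The following is a minimal sketch on made-up sample data (`csv_text`, `df_default` and `df_clean` are illustrative names, not part of the notebook) showing both effects and one possible way to avoid them.

```python
import io

import pandas as pd

# Hypothetical sample in the same layout as the real file (values are made up)
csv_text = """,iso_country_code,color_0,color_1
0,AD,21,13
1,NA,4,9
"""

# Default parsing: the saved index becomes 'Unnamed: 0' and 'NA' is read as missing
df_default = pd.read_csv(io.StringIO(csv_text))

# index_col=0 restores the index; keep_default_na=False keeps 'NA' as a plain string
df_clean = pd.read_csv(io.StringIO(csv_text), index_col=0, keep_default_na=False)

print(df_default.columns.tolist())
print(df_clean['iso_country_code'].tolist())
```

Whether this matters for the real file depends on its contents, so it is worth checking for a missing `iso_country_code` before dropping rows.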


## Add ISO 3 country codes

In [2]:
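# Work on a copy with incomplete rows removed
df_map = df.dropna().copy()
# geonamescache keys its country records by ISO 2 code
names = gc.get_countries()
# Codes without a geonamescache entry get None
df_map['iso3'] = df_map['iso_country_code'].apply(lambda x: names[x]['iso3'] if x in names else None)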
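The same lookup can also be written with `Series.map` and a flat dictionary, which leaves unmatched codes as `NaN` without an explicit conditional. The sketch below uses a small hand-written `names` dictionary as a hypothetical stand-in for `gc.get_countries()`, so that it runs without geonamescache installed; `df_demo` and `iso2_to_iso3` are illustrative names.

```python
import pandas as pd

# Hypothetical stand-in for gc.get_countries(): ISO 2 code -> record with an 'iso3' field
names = {
    'AD': {'iso3': 'AND'},
    'AF': {'iso3': 'AFG'},
    'AO': {'iso3': 'AGO'},
}

df_demo = pd.DataFrame({'iso_country_code': ['AD', 'AF', 'EU']})

# Flatten the lookup once, then let Series.map fill unmatched codes with NaN
iso2_to_iso3 = {iso2: rec['iso3'] for iso2, rec in names.items()}
df_demo['iso3'] = df_demo['iso_country_code'].map(iso2_to_iso3)

# Codes that have no country record, here the non-country code 'EU'
unmatched = df_demo.loc[df_demo['iso3'].isnull(), 'iso_country_code'].tolist()
print(unmatched)
```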

## Non-country ISO 2 codes used by Reddit

In [3]:
df_map[df_map['iso3'].isnull()]

Unnamed: 0,iso_country_code,color_0,color_1,color_2,color_3,color_4,color_5,color_6,color_7,color_8,color_9,color_10,color_11,color_12,color_13,color_14,color_15,iso3
88,AP,71,8,0,95,0,44,17,21,50,9,11,12,14,65,0,0,
122,EU,403,28,44,713,124,452,178,46,260,56,140,45,42,541,11,38,
175,A1,1433,54,142,3523,376,1171,2207,84,924,181,467,258,189,879,89,269,


## Drop non-countries and determine most used color

In [4]:
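df_map.dropna(inplace=True)
# Use only the color_* columns; the numeric 'Unnamed: 0' column would otherwise compete in idxmax
color_cols = [c for c in df_map.columns if c.startswith('color_')]
df_map['top_color'] = df_map[color_cols].idxmax(axis='columns').apply(lambda x: x.replace('color_', ''))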
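When the most used color is determined with `idxmax`, it matters which columns take part in the comparison. If every numeric column is included, a leftover index column such as `Unnamed: 0` can exceed all color counts for countries with few placements and be reported as the top "color". The sketch below illustrates this on made-up counts (`df_demo`, `naive` and `top_color` are illustrative names) and restricts the comparison to the `color_*` columns.

```python
import pandas as pd

# Made-up counts; 'Unnamed: 0' mimics the leftover index column from the CSV
df_demo = pd.DataFrame({
    'Unnamed: 0': [150, 151],
    'iso3': ['AAA', 'BBB'],
    'color_0': [21, 5],
    'color_3': [71, 2],
    'color_13': [5, 9],
})

color_cols = [c for c in df_demo.columns if c.startswith('color_')]

# Over all numeric columns the index column holds the largest value in both rows
naive = df_demo.select_dtypes(include='number').idxmax(axis='columns')

# Over the color columns only, idxmax returns the name of the most used color
top_color = (
    df_demo[color_cols]
    .idxmax(axis='columns')
    .str.replace('color_', '', regex=False)
)

print(naive.tolist())
print(top_color.tolist())
```

Note that `idxmax` returns the first column on ties, so a country whose counts are equal across several colors is assigned the lowest-numbered of those colors.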

In [5]:
df_map[['iso3', 'top_color']].to_csv('./data/rplace-country-color-propensity.csv', index=False)

In [6]:
signature