For the app we'll display:
1. Ranking
2. Streamer name
3. Streamer image (as of the date the data was collected)
4. Number of followers (as of the date the data was collected)
5. Primary game/category
6. Secondary game/category

Items 1 and 2 are already taken care of. We have the data for items 3-6, but it still needs to be organized.

## Streamer image
We initially collected the actual images rather than the URLs. The upside of that approach is that a photo can't go missing if it is later removed from the web. The downside is that the image folder is unnecessarily large (~0.3GB). We'll instead take the other approach and collect the URLs.

In [1]:
import requests
import json
from tqdm import tqdm
import pandas as pd
import config  # local module holding the Twitch API credentials

client_id = config.client_id
bearer_token = config.bearer_token

def get_streamer_photo_url(user):
    # build the login from the display name: drop spaces and lowercase
    url = "https://api.twitch.tv/helix/users?login=%s" % user.replace(" ", "").lower()
    response = requests.get(url, headers={"client-id": client_id, "authorization": "Bearer %s" % bearer_token})
    try:
        img_url = json.loads(response.text)["data"][0]["profile_image_url"]
    except (ValueError, KeyError, IndexError):
        # non-JSON body, response without "data", or no matching user
        img_url = "NA"
    return [user, img_url]

def get_all_streamer_photo_urls(usernames):
    data = []
    # iterating over tqdm already advances the bar, so no manual update() is needed
    for user in tqdm(usernames):
        data.append(get_streamer_photo_url(user))
    return data
            
with open("data/usernames_for_app.txt", "r") as f:
    usernames = f.read().split("\n")

In [19]:
d = get_all_streamer_photo_urls(usernames)
d[:10]
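
The loop above sends one request per username. As a possible alternative, the Helix users endpoint can, as far as we know, take several `login` parameters in a single request; the limit of 100 per request used below is an assumption to verify against the Twitch API reference. A minimal sketch of building the batched URLs, where `batched_user_urls` is a hypothetical helper that is not used elsewhere in this notebook:

```python
from urllib.parse import urlencode

def batched_user_urls(usernames, batch_size=100):
    """Build one Helix users URL per batch of logins (batch_size is an assumed limit)."""
    # same normalization as get_streamer_photo_url: drop spaces and lowercase
    logins = [u.replace(" ", "").lower() for u in usernames]
    urls = []
    for i in range(0, len(logins), batch_size):
        batch = logins[i:i + batch_size]
        # repeated login=... parameters, e.g. ?login=a&login=b
        query = urlencode([("login", login) for login in batch])
        urls.append("https://api.twitch.tv/helix/users?" + query)
    return urls

# small batch size just to show the splitting
example_urls = batched_user_urls(["Riot Games", "AustinShow", "tommyinnit"], batch_size=2)
for u in example_urls:
    print(u)
```

With batching, users that are not found are simply absent from the response, so results would have to be matched back to usernames by the returned login rather than by position.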

In [3]:
s = "https://static-cdn.jtvnw.net/jtv_user_pictures/" # all urls begin with this except missing/placeholder images
[x for x in d if x[1][:len(s)] != s][:10]

[['BadBoyHaloIsLive', 'NA'],
 ['TheEret', 'NA'],
 ['FundyLIVE', 'NA'],
 ['RatedEpicz', 'NA'],
 ['skippypoppin',
  'https://static-cdn.jtvnw.net/user-default-pictures-uv/de130ab0-def7-11e9-b668-784f43822e80-profile_image-300x300.png'],
 ['SnaggyMo', 'NA'],
 ['Neytiri', 'NA'],
 ['HappyThoughts0001', 'NA'],
 ['averagejoewo', 'NA'],
 ['Zenon_GP', 'NA']]

In [4]:
# set all missing urls to 1 common placeholder url
placeholder = "https://static-cdn.jtvnw.net/user-default-pictures-uv/de130ab0-def7-11e9-b668-784f43822e80-profile_image-300x300.png"
new_d = [x if x[1][:len(s)]==s else [x[0],placeholder] for x in d]

df_urls = pd.DataFrame(new_d, columns=["name","image_url"])
df_urls.head()

Unnamed: 0,name,image_url
0,Riot Games,https://static-cdn.jtvnw.net/jtv_user_pictures...
1,AustinShow,https://static-cdn.jtvnw.net/jtv_user_pictures...
2,dreamwastaken,https://static-cdn.jtvnw.net/jtv_user_pictures...
3,tommyinnit,https://static-cdn.jtvnw.net/jtv_user_pictures...
4,RocketLeague,https://static-cdn.jtvnw.net/jtv_user_pictures...


## Number of followers

In [5]:
df = pd.read_csv("data/streamer_info_eng.csv")
df = df[df["name"].isin(usernames)]
df.shape

(1904, 27)

In [6]:
df_follows = df[["name","total_followers"]]

## Main & secondary games

In [7]:
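def get_main_games(user, n=2):
    """Return the n games/categories with the largest share of the streamer's airtime."""
    df = pd.read_csv("data/games/%s.csv" % user)
    df = df.dropna().copy()  # drop incomplete rows; copy so the edits below don't hit a view
    if len(df) == 0:
        return ["Unknown"]*n
    else:
        df.loc[df["Game"]=="Unknown", "Game"] = "Unknown"  # currently a no-op: relabels "Unknown" as itself
        df.loc[df["Game"]=="IRL", "Game"] = "Just Chatting"  # treat the IRL category as Just Chatting

        # "Total airtime" has a percentage after "hrs"; keep only that number
        df["pct_airtime"] = df["Total airtime"].apply(lambda x: x.split("hrs")[1].strip("%")).astype('float')
        # sum the share per game and rank from most to least airtime
        df_out = df[["Game","pct_airtime"]].groupby("Game").sum().reset_index().sort_values("pct_airtime", ascending=False)
        list_out = df_out.head(n)["Game"].values.tolist()
        # pad with "Unknown" when the streamer has fewer than n games
        if len(list_out) < n:
            list_out.extend(["Unknown"]*(n-len(list_out)))
        return list_out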
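
To make the ranking step concrete, here is a small sketch of the group, sort, and pad logic on made-up rows. The game names and percentages are invented for illustration, and `top_games` is a hypothetical stand-in that skips the CSV reading and the airtime parsing:

```python
import pandas as pd

def top_games(df, n=2):
    """Rank games by summed airtime share and pad the result with "Unknown"."""
    ranked = df.groupby("Game")["pct_airtime"].sum().sort_values(ascending=False)
    games = ranked.head(n).index.tolist()
    # head(n) never returns more than n games, so the padding count is >= 0
    return games + ["Unknown"] * (n - len(games))

rows = pd.DataFrame({
    "Game": ["Minecraft", "Just Chatting", "Minecraft"],
    "pct_airtime": [40.0, 35.0, 20.0],
})

two_games = top_games(rows, n=2)          # both slots filled
one_game = top_games(rows.head(1), n=2)   # only one game, so the second slot is padded
print(two_games)
print(one_game)
```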

In [8]:
game_data = []
for user in usernames:
    game_data.append([user] + get_main_games(user, n=2))

In [9]:
df_games = pd.DataFrame(game_data, columns=["name","game1", "game2"])
df_games.head()

Unnamed: 0,name,game1,game2
0,Riot Games,League of Legends,Teamfight Tactics
1,AustinShow,Just Chatting,Arma 3
2,dreamwastaken,Minecraft,Just Chatting
3,tommyinnit,Minecraft,Just Chatting
4,RocketLeague,Rocket League,Unknown


# Putting it all together

In [10]:
df_urls.head()

Unnamed: 0,name,image_url
0,Riot Games,https://static-cdn.jtvnw.net/jtv_user_pictures...
1,AustinShow,https://static-cdn.jtvnw.net/jtv_user_pictures...
2,dreamwastaken,https://static-cdn.jtvnw.net/jtv_user_pictures...
3,tommyinnit,https://static-cdn.jtvnw.net/jtv_user_pictures...
4,RocketLeague,https://static-cdn.jtvnw.net/jtv_user_pictures...


In [11]:
df_follows.head()

Unnamed: 0,name,total_followers
1,2BCSuperb,53531
2,360Chrism,116402
3,39daph,653858
4,4Conner,35469
6,5uppp,238878


In [12]:
df_games.head()

Unnamed: 0,name,game1,game2
0,Riot Games,League of Legends,Teamfight Tactics
1,AustinShow,Just Chatting,Arma 3
2,dreamwastaken,Minecraft,Just Chatting
3,tommyinnit,Minecraft,Just Chatting
4,RocketLeague,Rocket League,Unknown


In [13]:
df_merged = df_urls.merge(df_follows, how="inner", on="name").merge(df_games, how="inner", on="name")
df_merged.head()

Unnamed: 0,name,image_url,total_followers,game1,game2
0,Riot Games,https://static-cdn.jtvnw.net/jtv_user_pictures...,5003571,League of Legends,Teamfight Tactics
1,AustinShow,https://static-cdn.jtvnw.net/jtv_user_pictures...,1059609,Just Chatting,Arma 3
2,dreamwastaken,https://static-cdn.jtvnw.net/jtv_user_pictures...,1783388,Minecraft,Just Chatting
3,tommyinnit,https://static-cdn.jtvnw.net/jtv_user_pictures...,1281150,Minecraft,Just Chatting
4,RocketLeague,https://static-cdn.jtvnw.net/jtv_user_pictures...,1930966,Rocket League,Unknown
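
Because both merges are inner joins, any name missing from `df_follows` or `df_games` silently drops out of `df_merged`. One way to check for that, sketched here on invented frames (the names and numbers are made up), is a left merge with pandas' `indicator=True`:

```python
import pandas as pd

left = pd.DataFrame({"name": ["a", "b", "c"], "image_url": ["u1", "u2", "u3"]})
right = pd.DataFrame({"name": ["a", "c"], "total_followers": [10, 30]})

# indicator=True adds a "_merge" column saying where each row came from
check = left.merge(right, how="left", on="name", indicator=True)

# rows found only on the left are the ones an inner merge would drop
dropped = check.loc[check["_merge"] == "left_only", "name"].tolist()
print(dropped)
```

Running the same check with `df_urls` on the left would list any streamers that have an image URL but no follower count.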


In [14]:
# custom formatting of followers 
def format_num(num):
    assert num < 10**9, "Number too large" # pretty sure we don't need to format billions
    if num >= 10**6:
        m = str(num//10**6)    # whole millions
        t = str(num)[len(m)]   # next digit, truncated rather than rounded
        return m + "." + t + "M"
    elif num >= 10**3:
        t = str(num//10**3)    # whole thousands
        tt = str(num)[len(t)]  # next digit, truncated rather than rounded
        return t + "." + tt + "K"
    else:
        return str(num)

In [15]:
df_merged["total_followers"] = df_merged["total_followers"].map(format_num)

In [16]:
df_merged.head()

Unnamed: 0,name,image_url,total_followers,game1,game2
0,Riot Games,https://static-cdn.jtvnw.net/jtv_user_pictures...,5.0M,League of Legends,Teamfight Tactics
1,AustinShow,https://static-cdn.jtvnw.net/jtv_user_pictures...,1.0M,Just Chatting,Arma 3
2,dreamwastaken,https://static-cdn.jtvnw.net/jtv_user_pictures...,1.7M,Minecraft,Just Chatting
3,tommyinnit,https://static-cdn.jtvnw.net/jtv_user_pictures...,1.2M,Minecraft,Just Chatting
4,RocketLeague,https://static-cdn.jtvnw.net/jtv_user_pictures...,1.9M,Rocket League,Unknown
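
Note that the formatting truncates instead of rounding: 1783388 becomes 1.7M, not 1.8M. The same behaviour can be sketched with integer arithmetic. `format_num_compact` is a hypothetical alternative written for illustration, not the function used above:

```python
def format_num_compact(num):
    """Format a follower count with one truncated decimal digit (K or M suffix)."""
    assert 0 <= num < 10**9, "Number out of supported range"
    for threshold, suffix in ((10**6, "M"), (10**3, "K")):
        if num >= threshold:
            whole, rest = divmod(num, threshold)
            # rest * 10 // threshold is the first digit after the decimal point
            return f"{whole}.{rest * 10 // threshold}{suffix}"
    return str(num)

for n in (5003571, 1783388, 53531, 999):
    print(n, format_num_compact(n))
```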


In [17]:
df_merged.to_csv("data/appdata.csv", index=False)

In [18]:
# an example of what the data we'll pass to the app would look like if the top recommendations were AustinShow and Riot Games
data_for_app = enumerate( df_merged.set_index("name").loc[["AustinShow", "Riot Games"]].reset_index().values, start=1 )
for data in data_for_app:
    print(data)

(1, array(['AustinShow',
       'https://static-cdn.jtvnw.net/jtv_user_pictures/9e894c05-6131-4414-bf01-a65e9f88b13a-profile_image-300x300.png',
       '1.0M', 'Just Chatting', 'Arma 3'], dtype=object))
(2, array(['Riot Games',
       'https://static-cdn.jtvnw.net/jtv_user_pictures/889e7697-b636-48d9-be15-a9a39e286a64-profile_image-300x300.png',
       '5.0M', 'League of Legends', 'Teamfight Tactics'], dtype=object))


There is one tuple per recommendation. The first entry is the ranking and the second is an array with the rest of the data for that recommendation (username, image URL, number of followers, game 1, game 2).
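
On the app side, each tuple could be unpacked into a dict with named fields. A minimal sketch, using plain lists in place of the NumPy arrays, placeholder image URLs, and hypothetical key names (`rank`, `name`, and so on) that the app may or may not end up using:

```python
FIELDS = ["name", "image_url", "total_followers", "game1", "game2"]

# stand-in rows; the example.com URLs are placeholders, not real image URLs
rows = [
    ["AustinShow", "https://example.com/austinshow.png", "1.0M", "Just Chatting", "Arma 3"],
    ["Riot Games", "https://example.com/riotgames.png", "5.0M", "League of Legends", "Teamfight Tactics"],
]

def to_records(ranked_rows):
    """Turn (rank, row) tuples into dicts keyed by field name."""
    records = []
    for rank, row in ranked_rows:
        record = {"rank": rank}
        record.update(zip(FIELDS, row))
        records.append(record)
    return records

records = to_records(enumerate(rows, start=1))
for record in records:
    print(record["rank"], record["name"], record["total_followers"])
```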