### Introduction

Video games have become a major part of modern entertainment. They range from simple pixelated titles to detailed, realistic ones, and millions of people play them regularly. Quality varies widely: some games become very popular, while others get little attention. Prices vary just as much. Some games are categorized as "triple A" games, meaning they have large budgets, are expected to be high quality, and usually come with a high price. Many others are categorized as "indie" games, meaning they have a lower budget and are usually simpler in scope.

In this data story we explore the difference between triple A games and indie games, focusing on price and success. With video game prices rising, our main question is: "Does the price of a video game influence its success?" We measure success through several factors: sales, player numbers and review scores. We compare these factors to the price for both triple A games and indie games. Our first perspective is that triple A games should be more successful than indie games, since their higher budgets should lead to better quality. Our second perspective is that indie games are cheaper and require less computational power, so more people can afford and play them.
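
One way to put a number on the main question is a rank correlation between price and each success factor. The sketch below is only an illustration: the helper `price_success_correlation` is hypothetical, the column names are assumed, and the toy values are made up rather than taken from our datasets. It computes Spearman's rho as a Pearson correlation on ranks, which avoids assuming a linear relationship.

```python
import pandas as pd


def price_success_correlation(df, price_col, success_cols):
    """Spearman rank correlation between price and each success column.

    Hypothetical helper: ranks every column, then takes the Pearson
    correlation of the ranks, which equals Spearman's rho.
    """
    ranked = df[[price_col, *success_cols]].rank()
    return ranked.corr()[price_col].drop(price_col)


# Made-up toy values, for illustration only (not from the real datasets)
toy = pd.DataFrame({
    'price': [5, 10, 20, 40, 60],
    'owners': [500000, 200000, 100000, 50000, 20000],
    'metascore': [60, 65, 70, 80, 90],
})

result = price_success_correlation(toy, 'price', ['owners', 'metascore'])
print(result)
```

A value close to 1 means the factor rises with price, a value close to -1 means it falls, and a value near 0 means no monotonic relationship. A correlation on its own does not show that price causes success.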

In [1]:
# Import packages
import pandas as pd
import plotly.express as px
import plotly.graph_objs as go
import seaborn as sns
import numpy as np

### Other Variables

Of course, there are many reasons why one game does better than another. Next, we look at some other variables that might be interesting.

In [2]:
def load_games(path):
    """Read a dataset and normalise game names so titles match across sources."""
    df = pd.read_csv(path)
    # Lowercase the names and strip whitespace, colons and hyphens
    df['name'] = df['name'].str.lower().str.replace(r'[\s:-]+', '', regex=True)
    return df.set_index('name')


df1 = load_games('DATASETS/result.csv')
df2 = load_games('DATASETS/vgsales.csv')
df3 = load_games('DATASETS/steam.csv')

com_df = df1.join(df3, on='name', how='inner').reset_index()

# Share of positive ratings among all ratings
com_df['ratio_rating'] = com_df['positive_ratings'] / (com_df['negative_ratings'] + com_df['positive_ratings'])

# pd.qcut makes five equal-sized groups (quintiles), so each label is a rank band, not a range of raw ratio values
ratio_se = pd.qcut(com_df['ratio_rating'], q=5, retbins=True, labels=['0-20%','21-40%','41-60%','61-80%','81-100%'])
com_df['ratio_bins'] = ratio_se[0]

# metascore 
meta_se = pd.qcut(com_df['metascore'], q=5, retbins=True, labels=['very low','low','medium','high','very high'])
com_df['meta_bins'] = meta_se[0]

# owners
owners_bins = {
    '0-20000':'<50k',
    '20000-50000':'<50k',
    '50000-100000':'50-200k',
    '100000-200000':'50-200k',
    '200000-500000':'200k-1m',
    '500000-1000000':'200k-1m',
    '1000000-2000000':'1-5m',
    '2000000-5000000':'1-5m',
    '5000000-10000000':'>5m',
    '10000000-20000000':'>5m',
    '20000000-50000000':'>5m',
    '50000000-100000000':'>5m',
    '100000000-200000000':'>5m',
}
com_df['owners_bins'] = com_df['owners'].map(owners_bins)

# consoles 
console_bins = {
    'PC':'PC',
    'PS4':'PSN',
    'XONE':'XBOX',
    'X360':'XBOX',
    'Switch':'Nintendo',
    'PS3':'PSN',
    'WIIU':'Nintendo',
    'XBOX':'XBOX',
    'VITA':'PSN',
    'PS2':'PSN',
    'WII':'Nintendo',
    'PSP':'PSN',
    '3DS':'Nintendo',
    'DS':'Nintendo',
    'GC':'Nintendo',
    'GBA':'Nintendo',
    'PS':'PSN',
    'DC':'Sega',  # the Dreamcast is a Sega console, not a Nintendo one
}
com_df['console_bins'] = com_df['console'].map(console_bins)

# price
price_se = pd.cut(com_df['price'],[-1,5,10,20,50,500],labels=['<5','5-10','10-20','20-50','>50'],retbins=True)
com_df['price_bins'] = price_se[0]

# playtime
playt_se = pd.cut(com_df['average_playtime'],[-1,5,20,200,500,100000],labels=['<5','5-20','20-200','200-500','>500'],retbins=True)
com_df['playt_bins'] = playt_se[0]

vlow = '#ff6f9f'
low = '#ffcf6f'
medium = 'white'
high = '#6f9fff'
vhigh = '#6fffcf'

meta_colors = {
    'very low':vlow,
    'low':low,
    'medium':medium,
    'high':high,
    'very high':vhigh
}

# com_df.sort_values(by='metascore',inplace=True,ascending=False)

data = go.Parcats(
    dimensions=[
        {'label':'Playtime',
         'values':com_df['playt_bins'],},
        
        {'label':'Price',
         'values':com_df['price_bins'],
         'categoryorder': 'array','categoryarray':['<5','5-10','10-20','20-50','>50']},
        
        {'label':'Console',
         'values':com_df['console_bins']},

        {'label':'Owners',
         'values':com_df['owners_bins'],
         'categoryorder': 'array', 'categoryarray':['<50k','50-200k','200k-1m','1-5m','>5m']},
        
        {'label':'Ratio',
         'values':com_df['ratio_bins'],
         'categoryorder': 'category descending'},
        
        {'label':'Metascore',
         'values':com_df['meta_bins'],
         'categoryorder': 'array', 'categoryarray':['very high','high','medium','low','very low']}
    ],
    sortpaths='backward',
    line={'color' : [meta_colors[meta] for meta in com_df['meta_bins']]}, # simple option: one colour per game, taken from its metascore bin
    # line={'color':com_df['metascore'],
    #       'colorscale':[[0, vlow],[.55, vlow],[0.55, low],[0.64, low],[0.64, medium],[0.70, medium],[0.70,high],[0.78,high],[.78,vhigh],[1,vhigh]]}
)

layout = go.Layout(
    title = 'Analysis of metascores for different variables',
    paper_bgcolor = '#E0E0E0',
    font={'color':'#003B4A'}
)

go.Figure(data=data, layout=layout).show()

average_playtime
0       1315
1         32
20        19
3         16
28        15
        ... 
756        1
2147       1
693        1
443        1
586        1
Name: count, Length: 795, dtype: int64
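
The rating ratio and metascore bins in the cell above come from `pd.qcut`, while the price and playtime bins come from `pd.cut`. The sketch below shows how differently the two behave when given the same labels. The values are made up for illustration and are not from our datasets.

```python
import pandas as pd

# Made-up positive-rating ratios, skewed towards high values
ratios = pd.Series([0.55, 0.70, 0.75, 0.85, 0.90, 0.92, 0.95, 0.96, 0.98, 0.99])
labels = ['0-20%', '21-40%', '41-60%', '61-80%', '81-100%']

# qcut: equal-frequency bins, every label gets the same number of games
quantile_bins = pd.qcut(ratios, q=5, labels=labels)

# cut: fixed-width bins, labels match the raw ratio values
fixed_bins = pd.cut(ratios, bins=[0, 0.2, 0.4, 0.6, 0.8, 1.0],
                    labels=labels, include_lowest=True)

# sort=False keeps the categories in label order instead of sorting by count
print(quantile_bins.value_counts(sort=False))
print(fixed_bins.value_counts(sort=False))
```

With `pd.qcut`, a game labelled '0-20%' is in the lowest fifth of the sample, even if most of its ratings are positive. With `pd.cut`, the same label would mean that at most 20% of its ratings are positive. Which one fits better depends on whether the chart should compare games with each other or show absolute scores.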
