# Orvin Tritama

## Research question/interests

Following our team's discussion, my team and I will be analyzing a dataset from the video game industry. As technology keeps changing, video game companies have changed the services and subscriptions they develop and offer to users, drawing on the latest tools and technologies in the field. Given the rapid growth of the industry, I am interested in analyzing how the platforms used in video gaming have shifted over time, from retro consoles to modern ones, between 1995 and 2020.

In [1]:
import pandas as pd
import seaborn as sns
import numpy as np
import datetime

## Loading Data

Here my analysis data is loaded from the path passed to `read_csv`. The output below shows the first 5 and the last 5 rows of the (partially) cleaned data.
*Note: The data is (partially) cleaned: all columns unused by every team member have been removed. As I progress in my analysis, I will remove the remaining data that is not needed for my purposes.*

In [2]:
df = pd.read_csv(r'../data/processed/games-data-processed.csv')
print(df.head(n=5))
print(df.tail(n=5))

                                   name      platform     r-date  \
0  The Legend of Zelda: Ocarina of Time    Nintendo64  23-Nov-98   
1              Tony Hawk's Pro Skater 2   PlayStation  20-Sep-00   
2                   Grand Theft Auto IV  PlayStation3  29-Apr-08   
3                           SoulCalibur     Dreamcast  08-Sep-99   
4                   Grand Theft Auto IV       Xbox360  29-Apr-08   

                developer                                      genre   players  
0                Nintendo                   Action Adventure,Fantasy  1 Player  
1  NeversoftEntertainment           Sports,Alternative,Skateboarding    02-Jan  
2           RockstarNorth  Action Adventure,Modern,Modern,Open-World  1 Player  
3                   Namco                         Action,Fighting,3D    02-Jan  
4           RockstarNorth  Action Adventure,Modern,Modern,Open-World  1 Player  
                                               name      platform     r-date  \
17939                    

## Cleaning Data
In this part, I will remove most of the columns that are not used in my data analysis.  
To answer my research question, which focuses on the trend of gaming platforms over time, I will focus on only two columns from the dataset: **platform** and **r-date**. However, I will keep the 'name' column for readability and to make the dataset easier for the reader to understand.

In [3]:
orvin_df = df.drop(columns=['genre','players', 'developer'])

print('Here is a sample of the first five rows in my cleaned data\n')
print(orvin_df.head(n=5))

Here is a sample of the first five rows in my cleaned data

                                   name      platform     r-date
0  The Legend of Zelda: Ocarina of Time    Nintendo64  23-Nov-98
1              Tony Hawk's Pro Skater 2   PlayStation  20-Sep-00
2                   Grand Theft Auto IV  PlayStation3  29-Apr-08
3                           SoulCalibur     Dreamcast  08-Sep-99
4                   Grand Theft Auto IV       Xbox360  29-Apr-08


## Process Data
In this part, I will process some of the data, mainly the **platform** and **r-date** columns.  
1. For the platform, to improve readability, I will add a space in any platform name that contains letters followed by a number. For example: Nintendo64 to Nintendo 64, Xbox360 to Xbox 360, and so on.

2. For the r-date, I will convert the month names into numbers, i.e. January=1, February=2, and so on. I will also convert the date from the given {DD-Mon-YY} format to the Canadian date format, {YYYY-MM-DD}. Finally, the date will be converted from a datetime to a plain date using `.dt.date`, which drops the HH:MM:SS time component.

3. Since I will be looking at how platforms change over time, I'm interested in how many days after the start of the data (1998) each release falls. I will create a new column holding the number of days between each release date and the earliest release date in the dataset. <br>*Note: 1998 is not when the first platform was built. It acts as a starting point only because it is the earliest data available. This means the min() value of the New_Release_Date column is the benchmark for day zero.*
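Step 1 is not implemented in the cells below, so here is a minimal sketch of one way it could be done, assuming a regular expression that inserts a space wherever a letter is directly followed by a digit. The `sample` frame and the `platform_spaced` column name are hypothetical and only mirror the platform values printed earlier in this notebook.

```python
import pandas as pd

# Hypothetical sample mirroring the platform values shown in this notebook.
sample = pd.DataFrame(
    {"platform": ["Nintendo64", "PlayStation", "PlayStation3", "Dreamcast", "Xbox360"]}
)

# Insert a space between a letter and the digit that immediately follows it.
# Names without a letter-digit boundary (e.g. "PlayStation") are left unchanged.
sample["platform_spaced"] = sample["platform"].str.replace(
    r"([A-Za-z])(\d)", r"\1 \2", regex=True
)

print(sample)
```

This pattern only touches letter-then-digit boundaries, so a name that starts with a digit would be left as-is; whether that is the desired behaviour for every platform in the full dataset is an open assumption.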

In [4]:
# NOT DONE

newDate = []

for index, row in df.iterrows():
    raw = row['r-date']  # e.g. '23-Nov-98'
    # Two-digit years 98 and 99 are treated as 19xx; everything else as 20xx.
    # Note: with this cutoff, any 95-97 values would be read as 2095-2097.
    yy = raw[7:9]
    if int(yy) > 97:
        year = '19' + yy
    else:
        year = '20' + yy
    day = raw[0:2]
    month_name = raw[3:6]
    # convert the month abbreviation (e.g. 'Nov') to its number (11)
    mnum = datetime.datetime.strptime(month_name, '%b').month
    finalDate = datetime.datetime(int(year), mnum, int(day))
    newDate.append(finalDate)

# insert newDate as the last column of the dataframe
orvin_df.insert(loc=len(orvin_df.columns), column="New_Release_Date", value=newDate)
# keep only the date part, dropping HH:MM:SS
orvin_df["New_Release_Date"] = orvin_df["New_Release_Date"].dt.date
print(orvin_df.head(n=5))

# For repeating and testing purposes
# orvin_df = orvin_df.drop(columns=["New_Release_Date"])

                                   name      platform     r-date  \
0  The Legend of Zelda: Ocarina of Time    Nintendo64  23-Nov-98   
1              Tony Hawk's Pro Skater 2   PlayStation  20-Sep-00   
2                   Grand Theft Auto IV  PlayStation3  29-Apr-08   
3                           SoulCalibur     Dreamcast  08-Sep-99   
4                   Grand Theft Auto IV       Xbox360  29-Apr-08   

  New_Release_Date  
0       1998-11-23  
1       2000-09-20  
2       2008-04-29  
3       1999-09-08  
4       2008-04-29  
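As a possible alternative to the row-by-row loop, the same conversion can be sketched with `pd.to_datetime` and an explicit format string. This is only a sketch on hypothetical sample values, not the notebook's method. It relies on the `%y` convention that two-digit years 69 to 99 map to 1969 to 1999 and 00 to 68 map to 2000 to 2068, so a value such as `96` (made up here for illustration) would be read as 1996.

```python
import pandas as pd

# Hypothetical sample in the dataset's DD-Mon-YY format.
# The last value is invented to illustrate a pre-1998 two-digit year.
sample = pd.DataFrame(
    {"r-date": ["23-Nov-98", "20-Sep-00", "29-Apr-08", "08-Sep-99", "15-Mar-96"]}
)

# Parse day, abbreviated month name, and two-digit year in one vectorized call.
parsed = pd.to_datetime(sample["r-date"], format="%d-%b-%y")

# Keep only the calendar date, dropping the HH:MM:SS component.
sample["New_Release_Date"] = parsed.dt.date

print(sample)
```

Note that `%b` matches month abbreviations according to the active locale, so this assumes English month names as in the data shown above.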


In [6]:
first_platform_release_date = orvin_df["New_Release_Date"].min()
print(f'The first platform created based on the dataset is: {first_platform_release_date}')

The first platform created based on the dataset is: 1998-01-21


In [7]:
days_since_first = (orvin_df["New_Release_Date"] - first_platform_release_date).dt.days
orvin_df.insert(loc=len(orvin_df.columns), column="days_since", value=days_since_first)
print(orvin_df.head(n=5))

# To re-run this cell, first uncomment the drop below (and comment out the lines above)
# orvin_df = orvin_df.drop(columns=["days_since"])

                                   name      platform     r-date  \
0  The Legend of Zelda: Ocarina of Time    Nintendo64  23-Nov-98   
1              Tony Hawk's Pro Skater 2   PlayStation  20-Sep-00   
2                   Grand Theft Auto IV  PlayStation3  29-Apr-08   
3                           SoulCalibur     Dreamcast  08-Sep-99   
4                   Grand Theft Auto IV       Xbox360  29-Apr-08   

  New_Release_Date  
0       1998-11-23  
1       2000-09-20  
2       2008-04-29  
3       1999-09-08  
4       2008-04-29  
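As a cross-check on the day count, here is a minimal sketch on hypothetical sample dates. It assumes the column is kept as `datetime64` rather than converted to plain dates, so that subtracting the minimum yields timedeltas directly. The names `dates`, `first`, and `days_since` are illustrative and not part of the notebook.

```python
import pandas as pd

# Hypothetical release dates taken from the sample rows shown above.
dates = pd.to_datetime(pd.Series(["1998-11-23", "2000-09-20", "2008-04-29"]))

# The earliest date in the sample acts as day zero.
first = dates.min()

# Subtracting a Timestamp from a datetime64 Series gives a timedelta64 Series,
# from which .dt.days extracts whole days as integers.
days_since = (dates - first).dt.days

print(pd.DataFrame({"date": dates.dt.date, "days_since": days_since}))
```

Printing only the two relevant columns keeps the result on one line width, which can make the new column easier to inspect than printing the full frame.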
