# **Business Understanding**

**Contents**

* Business Understanding
* Data Understanding
* Data Preparation
* Modeling
* Evaluation

# **1. Business Understanding**
The dataset describes video games released over the last 40 years (from 1984 onward) across various platforms and genres, and from various publishers.
The analysis answers the following questions:
1. Which Genre has become most popular - Completed
2. Which Platform has received most popularity - Completed
3. A Time-Series Analysis of the Global Sales Trend of last 40 years - Completed
4. Publishers with most Global Sales along with which country offers the maximum of these sales - Completed

# **2. Data Understanding**
In this chapter we load the data, check how much of each column is missing, and drop the columns that are too sparse to analyse.

In [None]:
#importing the libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import matplotlib
%matplotlib inline

In [None]:
#loading the data
game = pd.read_csv("../input/vid-gem/Video_Games_Sales_as_at_22_Dec_2016.csv", encoding= 'unicode_escape')

game.head()


In [None]:
#computing the share of missing values per column and listing the columns with the most missing data
missing_percent = game.isnull().mean()

missing_values = pd.DataFrame({'column_name': game.columns,
                                 'percent_missing_values': missing_percent})

missing_values.sort_values(by=['percent_missing_values'], ascending=False).head()

In [None]:
#dropping the 6 columns
game = game.drop(['Critic_Score', 'Critic_Count', 'User_Score', 'User_Count', 'Developer', 'Rating'], axis=1)
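
The six columns above were chosen by reading the missing-value table. As a minimal sketch of the same idea, the selection can also be done with a cut-off. The toy frame and the `threshold` of 0.5 below are assumptions made for illustration and are not taken from the real dataset.

```python
import numpy as np
import pandas as pd

# Made-up frame standing in for the real data (illustration only).
toy = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Critic_Score": [np.nan, 80.0, np.nan, np.nan],
    "Year_of_Release": [2001.0, np.nan, 2003.0, 2004.0],
})

# Share of missing values per column, worst first.
missing_share = toy.isnull().mean().sort_values(ascending=False)

# Hypothetical cut-off: drop columns that are more than half empty.
threshold = 0.5
cols_to_drop = missing_share[missing_share > threshold].index.tolist()

trimmed = toy.drop(columns=cols_to_drop)
print(cols_to_drop)
```

A cut-off keeps the choice reproducible if the dataset is refreshed, but the right value depends on which columns the analysis questions actually need.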

In [None]:
game.head()

# **3. Data Preparation**
In this chapter we find that some columns and rows still have missing values, so we clean and trim the dataset further until it is fully usable.

In [None]:
#checking which columns have NaN values across the dataset
game.isna().any()

The "Publisher" column shows no NaN values here, although the original dataset has 54 missing values in it. Because the column is essential to one of my analysis questions, I researched the missing publishers and added them to my local copy of the dataset instead of dropping the rows, which eliminated the missing values.

In [None]:
#checking count of NaN values in the column
count = game["Publisher"].isna().sum()

print(count)

The "Year_of_Release" column has 269 missing values. Unfortunately, that is too many to research in a day, so these rows were dropped.

In [None]:
#checking count of NaN values in the column
count = game["Year_of_Release"].isna().sum()

print(count)

In [None]:
#dropping the rows that have NaN values in the column
game = game[pd.notnull(game['Year_of_Release'])]

Similarly, the 2 rows with a missing "Genre" were dropped.

In [None]:
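#checking count of NaN values in the column
count = game["Genre"].isna().sum()

print(count)

In [None]:
#dropping the rows where Name or Genre is missing
game = game.dropna(subset=['Name', 'Genre'])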
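
The row-dropping steps in this chapter can also be written as a single call. The sketch below uses a made-up frame, not the real dataset, to show how `dropna` with `subset` removes only the rows that are missing one of the named columns and how to check how many rows were lost.

```python
import numpy as np
import pandas as pd

# Made-up rows standing in for the real data (illustration only).
toy = pd.DataFrame({
    "Name": ["A", None, "C", "D"],
    "Genre": ["Action", None, "Sports", "Misc"],
    "Year_of_Release": [2001.0, 1993.0, np.nan, 2010.0],
})

rows_before = len(toy)

# Drop a row only if one of these specific columns is missing.
clean = toy.dropna(subset=["Year_of_Release", "Name", "Genre"])

rows_dropped = rows_before - len(clean)
print(rows_dropped)
```

Comparing the row count before and after is a cheap way to confirm that the number of dropped rows matches the NaN counts printed earlier.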

In [None]:
game.describe()

# **4. Modeling**
We then build different visualisations to analyse the data and answer the questions.

In [None]:
#Defining a graph plotting function
def plot_func(x, y, x_title=None, y_title=None):
    plt.bar(x,y)
    plt.xlabel(x_title)
    plt.ylabel(y_title)

# **A) Which Platform has received most popularity - Completed**

In [None]:
#counting the titles released per platform (count() counts rows, not units sold)
sales_platform = game.groupby(by='Platform').count()['Global_Sales'].sort_values(ascending=False)

sales_platform = sales_platform.to_frame()

sales_height =  sales_platform.index.tolist()

#setting the figure size before plotting
plt.figure(figsize=(10,10))

plot_func(sales_height[:5], sales_platform.iloc[:5].Global_Sales.values.tolist(), x_title="Gaming Platforms", y_title="Number of Titles")

sales_platform.head(5)
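
Note that `count()` returns the number of rows (titles) per platform, while `sum()` would add up the sales figures. The two can rank platforms differently. The sketch below uses made-up numbers and assumes `Global_Sales` holds units sold in millions.

```python
import pandas as pd

# Made-up numbers (illustration only): many small PS2 titles, few big Wii titles.
toy = pd.DataFrame({
    "Platform": ["PS2", "PS2", "PS2", "Wii", "Wii"],
    "Global_Sales": [0.5, 0.3, 0.2, 4.0, 3.0],
})

# Named aggregation: one column for the title count, one for total sales.
by_platform = toy.groupby("Platform")["Global_Sales"].agg(
    titles="count",
    units_millions="sum",
)

most_titles = by_platform["titles"].idxmax()
most_units = by_platform["units_millions"].idxmax()
print(by_platform)
```

Which measure counts as "popularity" is a judgment call. The title count reflects how many games were released for a platform, and the sum reflects how much those games sold.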

# **B) Which Genre has become most popular - Completed**

In [None]:
#counting the titles released per genre (count() counts rows, not units sold)
sales_genre = game.groupby(by='Genre').count()['Global_Sales'].sort_values(ascending=False)

sales_genre = sales_genre.to_frame()

sales_height =  sales_genre.index.tolist()

plot_func(sales_height[:5], sales_genre.iloc[:5].Global_Sales.values.tolist(), x_title="Gaming Genres", y_title="Number of Titles")

sales_genre.head(5)

# **C) A Time-Series Analysis of the Global Sales Trend of last 40 years - Completed**

In [None]:
#game_year = game.sort_values('Year_of_Release')

sales_year = game.groupby(by='Year_of_Release')['Global_Sales'].sum()

sales_year = sales_year.to_frame()

sales_year.plot(figsize=(9,6), color='Red')
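
To back up a reading of the line chart with numbers, the yearly totals can be inspected directly. The following is a minimal sketch on made-up values. The names `peak_year` and `yoy_change` are introduced here for illustration only.

```python
import pandas as pd

# Made-up yearly data (illustration only); years stored as floats, as NaN-bearing columns usually are.
toy = pd.DataFrame({
    "Year_of_Release": [2006.0, 2006.0, 2007.0, 2008.0, 2008.0, 2009.0],
    "Global_Sales": [1.0, 2.0, 4.0, 3.0, 3.5, 2.0],
})

# Cast years to int (safe once the NaN rows are dropped) and total the sales per year.
yearly = (
    toy.assign(Year_of_Release=toy["Year_of_Release"].astype(int))
       .groupby("Year_of_Release")["Global_Sales"]
       .sum()
)

peak_year = yearly.idxmax()   # year with the highest total
yoy_change = yearly.diff()    # change versus the previous year
print(peak_year)
```

Looking at `yoy_change` makes rises and falls explicit instead of leaving them to be estimated from the plot.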

# **D) Publishers with most Global Sales along with which country offers the maximum of these sales - Completed**

In [None]:
#counting the titles released per publisher (count() counts rows, not units sold)
sales_publisher = game.groupby(by='Publisher').count()['Global_Sales'].sort_values(ascending=False)

sales_publisher = sales_publisher.to_frame()

sales_publisher.iloc[:5].plot.barh(figsize=(5,5))

sales_publisher.head(5)

In [None]:
subset = game[["Publisher", "NA_Sales", "EU_Sales", "JP_Sales", "Other_Sales"]]

subset = subset[subset["Publisher"].isin(['Electronic Arts', 'Activision', 'Namco Bandai Games', 'Ubisoft', 'Konami Digital Entertainment'])]

subset = subset.groupby(by='Publisher').sum()

subset.plot.bar(figsize=(10,10),subplots=True,legend=False)

plt.ylabel('')

subset
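
The strongest region for each publisher can also be read off programmatically rather than from the bar charts. This sketch uses two hypothetical publishers with made-up figures. Only the column names match the dataset.

```python
import pandas as pd

# Two hypothetical publishers with made-up regional sales (illustration only).
toy = pd.DataFrame({
    "Publisher": ["Pub A", "Pub A", "Pub B", "Pub B"],
    "NA_Sales": [2.0, 1.0, 0.1, 0.2],
    "EU_Sales": [1.0, 0.5, 0.1, 0.1],
    "JP_Sales": [0.1, 0.0, 1.0, 0.9],
    "Other_Sales": [0.2, 0.1, 0.0, 0.1],
})

# Total sales per publisher and region.
regional = toy.groupby("Publisher").sum()

# Column (region) with the largest total in each row (publisher).
top_region = regional.idxmax(axis=1)

# Each region's share of the publisher's total, so rows sum to 1.
share = regional.div(regional.sum(axis=1), axis=0)
print(top_region)
```

The `share` table is useful when publishers differ a lot in size, because it compares the regional mix rather than the absolute totals.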

# **5. Evaluation**
A) The platform with the most titles in the dataset is Sony's PS2 (2,127 titles), followed by the Nintendo DS, Sony's PS3, the Nintendo Wii and finally Microsoft's Xbox 360. Because the code uses `count()`, this figure is a number of titles, not millions of games sold.

B) The genre with the most titles is Action (3,308 titles), followed by Sports, Miscellaneous, Role-Playing and finally Shooter. This is again a count of titles rather than units sold.

C) The time-series analysis shows that video game sales fluctuated until 1995, then rose steadily with less variation. From 2000 sales picked up further, and the decade of the 2010s saw a sharp and large rise in sales volume, followed by a sharp fall until 2015. Since then sales have been recovering slowly.

D) The publisher with the most titles is Electronic Arts (1,344 titles), followed by Activision, Namco Bandai Games, Ubisoft and lastly Konami Digital Entertainment. As above, the figure is a count of titles.

Electronic Arts, Activision and Ubisoft all receive most of their sales from North America, whereas Namco Bandai Games and Konami Digital Entertainment make most of their sales in Japan.


# **Conclusion**
There are many ways to take the analysis of the gaming industry further, such as comparing the different sales regions or experimenting to find which attributes affect a game's sales. Additional work could also be done on the dataset itself: filling in the missing user and critic scores for each game would allow a more in-depth analysis.