# Heroes of Pymoli
![Fantasy](Images/Fantasy.png)

## Import Dependencies

In [1]:
%matplotlib notebook
# Dependencies and Setup
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [2]:
# File to Load (Remember to Change These)
file_to_load = "Resources/purchase_data.csv"

# Read Purchasing File and store into Pandas data frame
purchase_df = pd.read_csv(file_to_load)
purchase_df.head()

,Purchase ID,SN,Age,Gender,Item ID,Item Name,Price
0,0,Lisim78,20,Male,108,"Extraction, Quickblade Of Trembling Hands",3.53
1,1,Lisovynya38,40,Male,143,Frenzied Scimitar,1.56
2,2,Ithergue48,24,Male,92,Final Critic,4.88
3,3,Chamassasya86,24,Male,100,Blindscythe,3.27
4,4,Iskosia90,23,Male,131,Fury,1.44


## Player Count
* Total number of players

In [3]:
# slice the data to get a table of unique players only (remove transaction data)
players_df = purchase_df.loc[:, ["SN", "Age", "Gender"]]

# remove duplicate player IDs 
players_df = players_df.drop_duplicates(subset="SN", keep="first")

num_players = players_df["SN"].count()
print(f'There are {num_players} players.')

There are 576 players.
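
As a cross-check on the count above, the same number can be obtained without building an intermediate table. The sketch below uses a small hypothetical frame (`toy_purchases`, not the real `purchase_data.csv`) and assumes that each distinct `SN` value identifies exactly one player.

```python
import pandas as pd

# Hypothetical toy data standing in for purchase_data.csv
toy_purchases = pd.DataFrame({
    "SN": ["Lisim78", "Lisim78", "Ithergue48", "Iskosia90"],
    "Price": [3.53, 1.56, 4.88, 1.44],
})

# nunique() counts distinct screen names directly
toy_num_players = toy_purchases["SN"].nunique()
print(f"There are {toy_num_players} players.")
```

On the real data, `purchase_df["SN"].nunique()` should agree with the count of the de-duplicated `players_df`.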


## Purchasing Analysis
* Number of Unique Items
* Average Purchase Price
* Total Number of Purchases
* Total Revenue

In [4]:
unique_items = np.unique(purchase_df["Item ID"])
items_num = len(unique_items)
price_avg = np.average(purchase_df["Price"])
purchases_count = purchase_df["Purchase ID"].count()
revenue = np.sum(purchase_df["Price"])

In [5]:
results = {'Unique Items': [items_num], 'Average Price': [f'$ {round(price_avg, 2)}'], 
          'Number of Purchases': [purchases_count], 'Total Revenue': [f'$ {revenue}']}
results_df = pd.DataFrame(data=results)

print(f'----------------- Purchasing Analysis ------------------')
print(f'There are {items_num} unique items.')
print(f'They were purchased at an average price of $ {round(price_avg, 2)}.')
print(f'{num_players} players made {purchases_count} purchases.')
print(f'--------------------------------------------------------')
print(f'Total Revenue: $ {revenue}')
results_df

----------------- Purchasing Analysis ------------------
There are 183 unique items.
They were purchased at an average price of $ 3.05.
576 players made 780 purchases.
--------------------------------------------------------
Total Revenue: $ 2379.77


,Unique Items,Average Price,Number of Purchases,Total Revenue
0,183,$ 3.05,780,$ 2379.77
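
The summary above stores prices as pre-formatted strings, which makes them awkward to reuse in later calculations. One possible alternative, sketched here on hypothetical toy data, keeps the values numeric and formats a copy only for display. The frame `toy` and the names `summary` and `formatted` are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd

# Hypothetical toy data with the same column names as the purchase file
toy = pd.DataFrame({
    "Purchase ID": [0, 1, 2, 3],
    "Item ID": [108, 143, 108, 92],
    "Price": [3.53, 1.56, 3.53, 4.88],
})

# Numeric summary: one row, one column per metric
summary = pd.DataFrame({
    "Unique Items": [toy["Item ID"].nunique()],
    "Average Price": [toy["Price"].mean()],
    "Number of Purchases": [len(toy)],
    "Total Revenue": [toy["Price"].sum()],
})

# Format a copy for display; the values in `summary` stay numeric
formatted = summary.copy()
for column in ["Average Price", "Total Revenue"]:
    formatted[column] = formatted[column].map("${:,.2f}".format)
print(formatted)
```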


### Gender Demographics
* Percentage and Count of Male Players
* Percentage and Count of Female Players
* Percentage and Count of Other / Non-Disclosed

In [6]:
# use data frame of unique players to create gender data

male_df = players_df.loc[players_df["Gender"] == "Male"]
female_df = players_df.loc[players_df["Gender"] == "Female"]

male_count = male_df["Gender"].count()
female_count = female_df["Gender"].count()
other_count = num_players-female_count-male_count

# use groupby to make a cool dataframe
grouped_by_gender_df = players_df.groupby("Gender")
gender_count_df = grouped_by_gender_df.count()

# make the percentage column
gender_percentage = [f'{round(val/num_players*100, 2)} %' for val in gender_count_df["SN"]]

# assemble the final dataframe
gender_demo_df = pd.DataFrame(data={"Total Count": gender_count_df["SN"], "Percentage of Players": gender_percentage})
gender_demo_df

Gender,Total Count,Percentage of Players
Female,81,14.06 %
Male,484,84.03 %
Other / Non-Disclosed,11,1.91 %


In [7]:
# Print output 
print(f'----------------- Gender Demographics ------------------')
print(f'Percentage and Count of Male Players: % {male_count/num_players*100} ({male_count})')
print(f'Percentage and Count of Female Players: % {female_count/num_players*100} ({female_count})')
print(f'Percentage and Count of Other Gendered Players: % {other_count/num_players*100} ({other_count})')
print(f'--------------------------------------------------------')

----------------- Gender Demographics ------------------
Percentage and Count of Male Players: % 84.02777777777779 (484)
Percentage and Count of Female Players: % 14.0625 (81)
Percentage and Count of Other Gendered Players: % 1.9097222222222223 (11)
--------------------------------------------------------
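
The counts and percentages in this section can also be derived in one step with `value_counts`, which avoids filtering each gender separately. This is a minimal sketch on hypothetical toy players (`toy_players`); it assumes the frame already holds one row per unique player, as `players_df` does.

```python
import pandas as pd

# Hypothetical toy data: one row per unique player
toy_players = pd.DataFrame({
    "SN": ["a1", "b2", "c3", "d4"],
    "Gender": ["Male", "Male", "Female", "Other / Non-Disclosed"],
})

# Absolute counts and percentage shares per gender
counts = toy_players["Gender"].value_counts()
shares = toy_players["Gender"].value_counts(normalize=True) * 100

# Both Series share the gender labels as index, so they align automatically
gender_demo = pd.DataFrame({
    "Total Count": counts,
    "Percentage of Players": shares.round(2),
})
print(gender_demo)
```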


### Purchasing Analysis (Gender)

* The following, each broken down by gender:
  * Purchase Count
  * Average Purchase Price
  * Total Purchase Value
  * Average Purchase Total per Person by Gender

In [8]:
# get the average purchase total per person
groupby_SN_purchase_df = purchase_df.groupby("SN")

purchase_count_perSN = groupby_SN_purchase_df["Price"].count()
purchase_tot_perSN = groupby_SN_purchase_df["Price"].sum()
avg_purchase_tot_perSN = groupby_SN_purchase_df["Price"].sum()/groupby_SN_purchase_df["Price"].count()

# make a new data frame with these results
purchase_count_perSN_df = pd.DataFrame(data=purchase_count_perSN)                                          
purchase_count_perSN_df.columns = ['Number Of Purchases']

purchase_tot_perSN_df = pd.DataFrame(data=purchase_tot_perSN)                                          
purchase_tot_perSN_df.columns = ['Total Purchase Value']

avg_purchase_tot_perSN_df = pd.DataFrame(data=avg_purchase_tot_perSN)                                          
avg_purchase_tot_perSN_df.columns = ['Avg Purchase Value per Person']

# merge the new data frames with the player data
# ("SN" is a column in players_df and the index name of the per-player frames)
players_df = pd.merge(players_df, pd.merge(purchase_count_perSN_df, pd.merge(purchase_tot_perSN_df, avg_purchase_tot_perSN_df, on="SN"), on="SN"), on="SN")

# groupby purchases on gender and get values
grouped_purchase_df = purchase_df.groupby("Gender")
purch_count = grouped_purchase_df["Purchase ID"].count()
avg_purchase = grouped_purchase_df["Price"].sum()/purch_count
revenue = grouped_purchase_df["Price"].sum()

# groupby players with average purchase data on gender and get values
# note: this averages each player's mean price per purchase,
# not each player's total spend
groupby_gender_players_df = players_df.groupby("Gender")
avg_total_purchase = groupby_gender_players_df["Avg Purchase Value per Person"].sum()/groupby_gender_players_df["Avg Purchase Value per Person"].count()


In [9]:
results_df = pd.DataFrame(data={'Purchase Count': purch_count, 'Average Purchase Price':avg_purchase,
                               'Total Purchase Value': revenue, 'Avg Total Purchase per Person':avg_total_purchase})
results_df.head()


Gender,Purchase Count,Average Purchase Price,Total Purchase Value,Avg Total Purchase per Person
Female,113,3.203009,361.94,3.194835
Male,652,3.017853,1967.64,3.014269
Other / Non-Disclosed,15,3.346,50.19,3.348636
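
The last column above is built from each player's average price per purchase, so it stays close to the average purchase price. If the intended metric is the average total spent per player, one option is to divide each gender's total purchase value by its number of unique players; with the figures already shown, that would be roughly 361.94 / 81 ≈ 4.47 for female players. The sketch below illustrates that calculation on hypothetical toy data; the column names in `by_gender` are illustrative assumptions.

```python
import pandas as pd

# Hypothetical toy purchases: two male players and one female player
toy = pd.DataFrame({
    "SN": ["a1", "a1", "b2", "c3", "c3", "c3"],
    "Gender": ["Male", "Male", "Male", "Female", "Female", "Female"],
    "Price": [1.00, 2.00, 3.00, 2.00, 2.00, 2.00],
})

# One pass over the purchases with named aggregations
by_gender = toy.groupby("Gender").agg(
    purchase_count=("Price", "count"),
    average_price=("Price", "mean"),
    total_value=("Price", "sum"),
    unique_players=("SN", "nunique"),
)

# Average total spend per player = total value / number of distinct players
by_gender["avg_total_per_person"] = (
    by_gender["total_value"] / by_gender["unique_players"]
)
print(by_gender)
```

In the toy data both genders have the same average price (2.00), yet the per-player totals differ (3.00 versus 6.00), which is the distinction the two metrics are meant to capture.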


### Top Spenders

* Identify the top 5 spenders in the game by total purchase value, then list (in a table):
  * SN
  * Purchase Count
  * Average Purchase Price
  * Total Purchase Value

In [10]:
players_df.nlargest(5,'Total Purchase Value')

,SN,Age,Gender,Number Of Purchases,Total Purchase Value,Avg Purchase Value per Person
72,Lisosia93,25,Male,5,18.96,3.792
253,Idastidru52,24,Male,4,15.45,3.8625
201,Chamjask73,22,Female,3,13.83,4.61
120,Iral74,21,Male,4,13.62,3.405
134,Iskadarya95,20,Male,3,13.1,4.366667
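
A table containing only the requested columns can be built directly from the purchase records, without the earlier merges. The following is a sketch on hypothetical toy data; `toy` and `spenders` are illustrative names, and the top 2 are taken instead of the top 5 only because the toy frame is small.

```python
import pandas as pd

# Hypothetical toy purchases
toy = pd.DataFrame({
    "SN": ["a1", "a1", "b2", "c3", "c3"],
    "Price": [4.00, 3.00, 5.00, 1.00, 1.50],
})

# Count, mean and sum of prices per player, renamed to the requested headers
spenders = (
    toy.groupby("SN")["Price"]
    .agg(["count", "mean", "sum"])
    .rename(columns={
        "count": "Purchase Count",
        "mean": "Average Purchase Price",
        "sum": "Total Purchase Value",
    })
)

# Keep the largest totals (use 5 on the real data)
top_spenders = spenders.nlargest(2, "Total Purchase Value")
print(top_spenders)
```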


### Most Popular Items

* Identify the 5 most popular items by purchase count, then list (in a table):
  * Item ID
  * Item Name
  * Purchase Count
  * Item Price
  * Total Purchase Value

In [11]:
# make a data frame of only item data
item_data_df = purchase_df[["Item ID", "Price", "Item Name"]] 

In [12]:
# group by item ID, count purchases, make a data frame and add to the item data df
purchases_groupbyitemID = item_data_df.groupby('Item ID')
purchases_itemID_count = purchases_groupbyitemID['Item ID'].count()
purchases_itemID_count_df = pd.DataFrame(data=purchases_itemID_count)

# rename column
purchases_itemID_count_df.columns = ['Number of Purchases']

# merge with item data
item_data_df = pd.merge(item_data_df, purchases_itemID_count_df, on='Item ID')

# add sum of purchases by item ID
itemID_revenue = purchases_groupbyitemID['Price'].sum()
itemID_revenue_df = pd.DataFrame(data=itemID_revenue)
itemID_revenue_df.columns = ['Revenue']

# merge with item data
item_data_df = pd.merge(item_data_df, itemID_revenue_df, on='Item ID')

# make a unique item list for summary data
item_data_df = item_data_df.drop_duplicates(subset="Item ID", keep="first")

# list most popular by purchase count
item_data_df.nlargest(5,'Number of Purchases')

,Item ID,Price,Item Name,Number of Purchases,Revenue
120,178,4.23,"Oathbreaker, Last Hope of the Breaking Storm",12,50.76
0,108,3.53,"Extraction, Quickblade Of Trembling Hands",9,31.77
87,82,4.9,Nirvana,9,44.1
469,145,4.58,Fiery Glass Crusader,9,41.22
15,92,4.88,Final Critic,8,39.04
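
The merge-and-deduplicate steps above could be condensed into a single grouped aggregation. The sketch below shows one way to do that on hypothetical toy items; the item names, prices, and the column names in `items` are made up for illustration, and taking the `first` price assumes each item is always sold at one price.

```python
import pandas as pd

# Hypothetical toy purchases (item names and prices are invented)
toy = pd.DataFrame({
    "Item ID": [1, 1, 1, 2, 2, 3],
    "Item Name": ["Sword", "Sword", "Sword", "Shield", "Shield", "Bow"],
    "Price": [4.0, 4.0, 4.0, 5.0, 5.0, 11.0],
})

# One row per item: purchase count, listed price, and total revenue
items = toy.groupby(["Item ID", "Item Name"]).agg(
    purchase_count=("Price", "count"),
    item_price=("Price", "first"),  # assumes a single price per item
    revenue=("Price", "sum"),
)

# Rank by purchase count (use 5 on the real data)
most_popular = items.nlargest(2, "purchase_count")
print(most_popular)
```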


### Most Profitable Items

* Identify the 5 most profitable items by total purchase value, then list (in a table):
  * Item ID
  * Item Name
  * Purchase Count
  * Item Price
  * Total Purchase Value

In [14]:
# list most profitable by total revenue
item_data_df.nlargest(5,'Revenue')

,Item ID,Price,Item Name,Number of Purchases,Revenue
120,178,4.23,"Oathbreaker, Last Hope of the Breaking Storm",12,50.76
87,82,4.9,Nirvana,9,44.1
469,145,4.58,Fiery Glass Crusader,9,41.22
15,92,4.88,Final Critic,8,39.04
547,103,4.35,Singed Scalpel,8,34.8
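
The item table keeps the first price seen for each `Item ID`, which is only safe if an item never changes price. A quick check along the following lines may be worth running before trusting the `Price` column. This is a sketch on hypothetical toy data in which item 2 deliberately has two prices.

```python
import pandas as pd

# Hypothetical toy purchases; item 2 was sold at two different prices
toy = pd.DataFrame({
    "Item ID": [1, 1, 2, 2, 3],
    "Price": [4.00, 4.00, 5.00, 5.50, 11.00],
})

# Number of distinct prices recorded for each item
prices_per_item = toy.groupby("Item ID")["Price"].nunique()
inconsistent_items = prices_per_item[prices_per_item > 1].index.tolist()
print("Items with more than one price:", inconsistent_items)

# Summing the actual prices stays correct even when a price varies
revenue_by_item = (
    toy.groupby("Item ID")["Price"].sum().sort_values(ascending=False)
)
print(revenue_by_item.head(2))
```

Because revenue is summed from the recorded prices rather than computed as count times listed price, the ranking by revenue is unaffected by any such inconsistency.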


### Three Observable Trends

1) Men are the primary audience for Heroes of Pymoli, making up 84% of the players.

2) Most players who purchase an item purchase only one, with an average purchase value of around $3.

3) Items with a long name appear to be the most popular, though it would take more investigation to determine this with certainty.
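
Observations 2 and 3 could be tested more directly. The sketch below shows one possible approach on hypothetical toy data: it measures the share of players with exactly one purchase and the correlation between item name length and purchase count. The purchase counts in `toy` are invented for illustration and are not results from the real data set.

```python
import pandas as pd

# Hypothetical toy purchases (counts are invented, not real results)
toy = pd.DataFrame({
    "SN": ["a1", "a1", "b2", "c3", "d4", "e5"],
    "Item Name": ["Fury", "Final Critic", "Fury", "Fury",
                  "Final Critic", "Blindscythe"],
})

# Observation 2: share of players who made exactly one purchase
purchases_per_player = toy["SN"].value_counts()
share_single = (purchases_per_player == 1).mean() * 100
print(f"{share_single:.1f}% of players made a single purchase")

# Observation 3: Pearson correlation of name length with purchase count
item_counts = toy["Item Name"].value_counts()
name_lengths = pd.Series(item_counts.index.str.len(), index=item_counts.index)
correlation = name_lengths.corr(item_counts)
print(f"Correlation between name length and purchases: {correlation:.2f}")
```

A correlation near zero on the real data would suggest that the long names among the top items are a coincidence rather than a trend.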

