# Heroes of Pymoli Data Analysis
---------------------------------------------------------------------------------------------------------------
<u>Notable trends</u>:
- Most of our players are male (84%), and the largest age group is 20-24, which accounts for 44.8% of players.
- The top five most popular items average approximately 4.42 dollars per unit.
- The top five most profitable items average approximately 4.59 dollars per unit.
- The five accounts with the most purchases spend about 3.45 dollars per purchase.

---------------------------------------------------------------------------------------------------------------
<u>While working with this data, I came across several limitations (there may be more)</u>:
- Each row records a single purchase made under one screen name. The data cannot tell us whether one individual buys items under several accounts, so we assume that each individual purchases through only one account.
- The records list purchases of in-game items along with the related user information. They therefore cover only users who have made at least one purchase, not all active users. Some users appear more than once because they made several purchases; the calculations below take these repeats into account.
- The given CSV file does not state what dates these purchases were made.
---------------------------------------------------------------------------------------------------------------

In [1]:
# Import our dependencies and create the file path & dataframe
import pandas as pd
import numpy as np
purchase_file = 'purchase_data.csv'
og_df = pd.read_csv(purchase_file)


# Player Count

In [2]:
# Count the unique screen names to get the total number of players
number_of_players = og_df["SN"].nunique()
# Put that value into a data frame
total_players_df = pd.DataFrame({"Total Players": [number_of_players]})

total_players_df


,Total Players
0,576
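
Because the file lists purchases rather than players, the row count and the player count differ whenever an account buys more than once. The following is a minimal sketch of that distinction on a small made-up frame; the screen names and prices are hypothetical and are not taken from `purchase_data.csv`.

```python
import pandas as pd

# Hypothetical purchase rows: "Ana1" buys twice, "Bo22" and "Cy03" once each
toy_df = pd.DataFrame({
    "SN": ["Ana1", "Bo22", "Ana1", "Cy03"],
    "Price": [1.50, 2.00, 3.25, 4.00],
})

purchase_rows = len(toy_df)              # every purchase counts
unique_players = toy_df["SN"].nunique()  # each screen name counts once

# Screen names that appear on more than one purchase row
purchases_per_sn = toy_df["SN"].value_counts()
repeat_buyers = purchases_per_sn[purchases_per_sn > 1].index.tolist()

print(purchase_rows, unique_players, repeat_buyers)
```

On the real data, the same comparison corresponds to 780 purchase rows against 576 unique screen names.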


# Purchasing Analysis (Total)

In [3]:
# Count the number of unique items
number_of_unique_items = og_df["Item ID"].nunique()
# Find the average
avg_purchase_price = og_df["Price"].mean()
# Count the purchase IDs to derive total number of purchases
total_number_of_purchases = og_df["Purchase ID"].count()
# Take the sum of all of the prices per purchase to get the total revenue
total_rev = og_df["Price"].sum()

# Take all of the found data and put it into a list
purchase_analysis = [{
    "Number of Unique Items": number_of_unique_items,
    "Average Price": avg_purchase_price,
    "Total Number of Purchases": total_number_of_purchases,
    "Total Revenue": total_rev
}]

# Put the list into a data frame
df_purchase_analysis = pd.DataFrame(purchase_analysis)

# Rearrange the data frame (as a copy, so the formatting below does not raise a SettingWithCopyWarning) and format the necessary values
df_p_a_rearranged = df_purchase_analysis[["Number of Unique Items", "Average Price", "Total Number of Purchases", "Total Revenue"]].copy()
df_p_a_rearranged["Average Price"] = df_p_a_rearranged["Average Price"].map("${:,.2f}".format)
df_p_a_rearranged["Total Revenue"] = df_p_a_rearranged["Total Revenue"].map("${:,.2f}".format)

df_p_a_rearranged


,Number of Unique Items,Average Price,Total Number of Purchases,Total Revenue
0,183,$3.05,780,"$2,379.77"


# Gender Demographics

In [4]:
# Drop duplicates before calculating total number of users
dropped_duplicates_SN_df = og_df[['Purchase ID', 'SN', 'Age', 'Gender', 'Item ID', 'Item Name', 'Price']].drop_duplicates(subset="SN")


In [5]:
# Count the total number of men, women, and other/ND (one row per unique player)
gender_counts = dropped_duplicates_SN_df["Gender"].value_counts()
# Look up the number of men and determine the percentage
count_male = gender_counts["Male"]
percentage_male = count_male/number_of_players*100
# Look up the number of women and determine the percentage
count_female = gender_counts["Female"]
percentage_female = count_female/number_of_players*100
# Look up the number of other/ND and determine the percentage
count_other_ND = gender_counts["Other / Non-Disclosed"]
percentage_other_ND = count_other_ND/number_of_players*100
# Create lists to hold the new found data and their key values
key_values_gender = ["Male", "Female", "Other / Non-Disclosed"]
total_count_gender = [count_male, count_female, count_other_ND]
total_percentage_gender = [percentage_male, percentage_female, percentage_other_ND]
# Create a data frame with the lists and format the necessary values
gender_demographics_df = pd.DataFrame(
   {'Gender': key_values_gender,
    'Total Count': total_count_gender,
    'Total Percentage': total_percentage_gender
   })
gender_demographics_df["Total Percentage"] = gender_demographics_df["Total Percentage"].map("{:.1f}%".format)

gender_demographics_df


,Gender,Total Count,Total Percentage
0,Male,484,84.0%
1,Female,81,14.1%
2,Other / Non-Disclosed,11,1.9%


*<u>Note</u>: <i>Above is the total number of players (counted by unique screen names) who have made purchases, as stated in the CSV.</i>
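
The counts and percentages can also be produced in one step with `value_counts(normalize=True)`. This is a sketch on a hypothetical, already de-duplicated player list (the names and genders are made up), not the code used to produce the table.

```python
import pandas as pd

# Hypothetical de-duplicated players (one row per screen name)
players = pd.DataFrame({
    "SN": ["Ana1", "Bo22", "Cy03", "Di44"],
    "Gender": ["Male", "Female", "Male", "Male"],
})

counts = players["Gender"].value_counts()
shares = players["Gender"].value_counts(normalize=True) * 100

# Both Series are indexed by the gender labels, so they align by label
demo = pd.DataFrame({"Total Count": counts, "Total Percentage": shares})
demo["Total Percentage"] = demo["Total Percentage"].map("{:.1f}%".format)
print(demo)
```

Working with the Series directly avoids looking values up by a row label such as `"Gender"`, which depends on how a given pandas version names the result of `value_counts()`.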

# Purchasing Analysis (Gender)

In [6]:
# Form groups by gender
gender_groupby = og_df.groupby(["Gender"])
# Restrict the count to any single column to return a series that can be used
# to perform the necessary arithmetic to return an avg purchase ttl per person by gender
gender_groupby_count = gender_groupby["Age"].count()
# Find total numbers of purchases per gender group
purchase_count = gender_groupby["Price"].count()
# Find average price of purchases per gender group
avg_purchase_price = gender_groupby["Price"].mean()
# Total revenue accrued through purchases per gender group
total_purchase_value_gender = gender_groupby["Price"].sum()
# Divide each gender group's total purchase value by the overall number of unique players (all genders combined)
avg_purchase_total_per_person_by_gender = total_purchase_value_gender/dropped_duplicates_SN_df["Gender"].count()
# Create a data frame containing all the found data
purchase_analysis_df_gender = pd.DataFrame({
    "Purchase Count": purchase_count,
    "Average Purchase Price": avg_purchase_price, 
    "Total Purchase Value": total_purchase_value_gender,
    "Average Purchase Total per Person": avg_purchase_total_per_person_by_gender
})
# Format the necessary values
purchase_analysis_df_gender["Average Purchase Price"] = purchase_analysis_df_gender["Average Purchase Price"].map("${:,.2f}".format)
purchase_analysis_df_gender["Total Purchase Value"] = purchase_analysis_df_gender["Total Purchase Value"].map("${:,.2f}".format)
purchase_analysis_df_gender["Average Purchase Total per Person"] = purchase_analysis_df_gender["Average Purchase Total per Person"].map("${:,.2f}".format)
purchase_analysis_df_gender = purchase_analysis_df_gender[['Purchase Count', 'Average Purchase Price', 'Average Purchase Total per Person', 'Total Purchase Value']]

purchase_analysis_df_gender


Gender,Purchase Count,Average Purchase Price,Average Purchase Total per Person,Total Purchase Value
Female,113,$3.20,$0.63,$361.94
Male,652,$3.02,$3.42,"$1,967.64"
Other / Non-Disclosed,15,$3.35,$0.09,$50.19


*<u>Note</u>: <i>
- 'Purchase Count' and 'Average Purchase Price' are based on every individual purchase, so multiple purchases made by the same account are all included.
- 'Average Purchase Total per Person' divides each gender group's total purchase value by the total number of unique screen names across all groups (576), not by the number of players in that group.
- 'Total Purchase Value' is the sum of all purchases (including multiple purchases from the same account) grouped by gender.</i>
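
To make the choice of denominator concrete, the sketch below recomputes the per-person figure two ways from numbers already reported in this analysis: the total purchase value per gender and the unique player count per gender. It is an arithmetic illustration only; the second variant (dividing by each group's own player count) is an alternative reading of "per person", not the calculation used for the table.

```python
# Figures copied from the gender tables in this analysis
total_purchase_value = {"Female": 361.94, "Male": 1967.64, "Other / Non-Disclosed": 50.19}
unique_players = {"Female": 81, "Male": 484, "Other / Non-Disclosed": 11}
all_players = sum(unique_players.values())  # 576

# Denominator used in the table: every unique player, regardless of gender
per_overall = {g: round(t / all_players, 2) for g, t in total_purchase_value.items()}
# Alternative denominator: only the unique players within each gender group
per_group = {g: round(t / unique_players[g], 2) for g, t in total_purchase_value.items()}

print(per_overall)  # {'Female': 0.63, 'Male': 3.42, 'Other / Non-Disclosed': 0.09}
print(per_group)    # {'Female': 4.47, 'Male': 4.07, 'Other / Non-Disclosed': 4.56}
```

With the group-level denominator, the three groups spend similar amounts per player, which the overall denominator hides because the groups differ so much in size.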

# Age Demographics

In [7]:
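# Bin edges and labels for the age groups; pd.cut bins are right-inclusive by default,
# so an edge of 9.90 caps the "<10" group at age 9, 14.90 caps "10-14" at age 14, and so on
age_bins = [0, 9.90, 14.90, 19.90, 24.90, 29.90, 34.90, 39.90, 99999]
group_names = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]

# Bin each unique player's age, count the players per bin, and name the resulting column
age_counts = pd.cut(dropped_duplicates_SN_df["Age"], age_bins, labels=group_names).value_counts().sort_index(ascending=True)
df_age = age_counts.rename("Number of Users per Age Group").to_frame()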
df_age["Percentage of Players"] = df_age["Number of Users per Age Group"]/number_of_players*100
df_age["Percentage of Players"] = df_age["Percentage of Players"].map("{:.1f}%".format)

df_age


,Number of Users per Age Group,Percentage of Players
<10,17,3.0%
10-14,22,3.8%
15-19,107,18.6%
20-24,258,44.8%
25-29,77,13.4%
30-34,52,9.0%
35-39,31,5.4%
40+,12,2.1%
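
The fractional bin edges (9.90, 14.90, ...) work because `pd.cut` treats bins as right-inclusive by default. An alternative, sketched here on made-up ages and assuming ages are whole numbers, is to use whole-number edges with `right=False`, so that each bin includes its lower edge and excludes its upper edge.

```python
import pandas as pd

# Hypothetical ages, chosen to sit on the bin boundaries
ages = pd.Series([7, 9, 10, 14, 15, 24, 25, 39, 40, 45])

# Left-closed bins: [0, 10), [10, 15), ..., [40, inf)
edges = [0, 10, 15, 20, 25, 30, 35, 40, float("inf")]
labels = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]

groups = pd.cut(ages, bins=edges, labels=labels, right=False)
print(groups.value_counts().sort_index())
```

Here age 10 lands in "10-14" and age 40 in "40+", matching the labels without any fractional edges.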


# Purchasing Analysis (Age)

In [8]:
# Reuse the same age bins and labels, this time on every purchase row rather than on unique players
age_bins = [0, 9.90, 14.90, 19.90, 24.90, 29.90, 34.90, 39.90, 99999]
group_names = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]
age_range_per_user = pd.cut(og_df["Age"], age_bins, labels=group_names)
# Purchase Count
purchase_count_series_age = age_range_per_user.value_counts().sort_index(ascending=True)
# Average Purchase Price
og_df["Age Bins"] = pd.cut(og_df["Age"], age_bins, labels=group_names)
age_bin_groupby = og_df.groupby(["Age Bins"])
average_purchase_price_age = age_bin_groupby["Price"].mean()
# Total Purchase Value
total_purchase_value_age = age_bin_groupby["Price"].sum()
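# Average Purchase Total per Person: each age group's total purchase value divided by
# the overall number of unique players (all age groups combined)
avg_purchase_total_per_person_by_age = total_purchase_value_age/dropped_duplicates_SN_df["Age"].count()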
# Put all of the found data into a data frame
purchase_analysis_df_age = pd.DataFrame({
    "Purchase Count": purchase_count_series_age,
    "Average Purchase Price": average_purchase_price_age,
    "Total Purchase Value": total_purchase_value_age,
    "Average Purchase Total per Person": avg_purchase_total_per_person_by_age
})
# Format the appropriate columns accordingly
purchase_analysis_df_age["Average Purchase Price"] = purchase_analysis_df_age["Average Purchase Price"].map("${:,.2f}".format)
purchase_analysis_df_age["Total Purchase Value"] = purchase_analysis_df_age["Total Purchase Value"].map("${:,.2f}".format)
purchase_analysis_df_age["Average Purchase Total per Person"] = purchase_analysis_df_age["Average Purchase Total per Person"].map("${:,.2f}".format)
purchase_analysis_df_age = purchase_analysis_df_age[['Purchase Count', 'Average Purchase Price', 'Average Purchase Total per Person', 'Total Purchase Value']]

purchase_analysis_df_age


Age Bins,Purchase Count,Average Purchase Price,Average Purchase Total per Person,Total Purchase Value
<10,23,$3.35,$0.13,$77.13
10-14,28,$2.96,$0.14,$82.78
15-19,136,$3.04,$0.72,$412.89
20-24,365,$3.05,$1.93,"$1,114.06"
25-29,101,$2.90,$0.51,$293.00
30-34,73,$2.93,$0.37,$214.00
35-39,41,$3.60,$0.26,$147.67
40+,13,$2.94,$0.07,$38.24


*<u>Note</u>: <i>
- 'Purchase Count' and 'Average Purchase Price' are based on every individual purchase, so multiple purchases made by the same account are all included.
- 'Average Purchase Total per Person' divides each age group's total purchase value by the total number of unique screen names across all groups (576), not by the number of players in that group.
- 'Total Purchase Value' is the sum of all purchases (including multiple purchases from the same account) grouped by age.</i>
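
If a per-person figure within each age group is wanted instead, the number of unique screen names per group can be computed in the same `groupby` as the totals. This is a sketch on hypothetical rows with a shortened set of bins; the snake_case column names are illustrative and do not appear in the analysis.

```python
import pandas as pd

# Hypothetical purchases; "Ana1" appears twice
toy_df = pd.DataFrame({
    "SN": ["Ana1", "Ana1", "Bo22", "Cy03"],
    "Age": [22, 22, 23, 31],
    "Price": [2.00, 4.00, 3.00, 5.00],
})
toy_df["Age Bins"] = pd.cut(toy_df["Age"], [0, 19.9, 24.9, 29.9, 99999],
                            labels=["<20", "20-24", "25-29", "30+"])

# observed=True keeps only the age groups that actually contain purchases
summary = toy_df.groupby("Age Bins", observed=True).agg(
    purchase_count=("Price", "count"),
    total_value=("Price", "sum"),
    players=("SN", "nunique"),
)
# Divide each group's total by the unique players in that same group
summary["per_person"] = summary["total_value"] / summary["players"]
print(summary)
```

In this toy frame the "20-24" group has three purchases from two players, so its per-person total is 9.00 / 2 = 4.50.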

# Top Spenders

In [9]:
# Count the number of purchases made by each screen name
top_spenders = og_df["SN"].value_counts()
# Group by screen name and divide each account's total spend by its number of purchases
sn_groupby = og_df.groupby(["SN"])
top_spenders_avg_purchase_price = sn_groupby["Price"].sum()/og_df["SN"].value_counts()
# Find the sums of all purchases per screen name
top_spenders_total_purchase_value = sn_groupby["Price"].sum()
# Put the found data into a data frame
top_spenders_df = pd.DataFrame({
    "Purchase Count": top_spenders,
    "Average Purchase Price": top_spenders_avg_purchase_price,
    "Total Purchase Value": top_spenders_total_purchase_value
})
# Keep the five accounts with the most purchases (sorted by purchase count, not by total spend), then format the data
top_spenders_df = top_spenders_df.sort_values(by=["Purchase Count"], ascending=False).head()
top_spenders_df["Average Purchase Price"] = top_spenders_df["Average Purchase Price"].map("${:,.2f}".format)
top_spenders_df["Total Purchase Value"] = top_spenders_df["Total Purchase Value"].map("${:,.2f}".format)

top_spenders_df


,Purchase Count,Average Purchase Price,Total Purchase Value
Lisosia93,5,$3.79,$18.96
Iral74,4,$3.40,$13.62
Idastidru52,4,$3.86,$15.45
Asur53,3,$2.48,$7.44
Inguron55,3,$3.70,$11.11
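
The table above is sorted by purchase count. Sorting by total purchase value can give a different order, as this sketch on hypothetical accounts shows (the screen names and prices are made up).

```python
import pandas as pd

# Hypothetical purchases: "Ana1" buys often but cheaply, "Bo22" buys once at a high price
toy_df = pd.DataFrame({
    "SN": ["Ana1", "Ana1", "Ana1", "Bo22", "Cy03", "Cy03"],
    "Price": [1.00, 1.00, 1.00, 4.50, 2.00, 2.25],
})

# One pass over the groups yields all three columns
per_player = toy_df.groupby("SN")["Price"].agg(["count", "mean", "sum"])
per_player.columns = ["Purchase Count", "Average Purchase Price", "Total Purchase Value"]

by_count = per_player.sort_values("Purchase Count", ascending=False).index.tolist()
by_spend = per_player.sort_values("Total Purchase Value", ascending=False).index.tolist()

print(by_count)  # ['Ana1', 'Cy03', 'Bo22']
print(by_spend)  # ['Bo22', 'Cy03', 'Ana1']
```

Which column to sort on depends on whether "top spender" means the most frequent buyer or the account with the largest total spend.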


# Most Popular Items

In [10]:
# Limit data frame to necessary columns
item_df = og_df[["Item ID", "Item Name", "Price"]]
# Group by necessary columns
item_groupby = item_df.groupby(["Item ID", "Item Name"])
# .count() needs duplicates in order to count number of purchases
item_purchase_count = item_groupby.count()
# Rename column to 'Purchase Count'
item_purchase_count = item_purchase_count.rename(columns={"Price": "Purchase Count"})
# Keep one row per item name to look up each item's price
item_price = og_df[["Item ID", "Item Name", "Price"]].drop_duplicates(subset="Item Name").sort_values(by="Item Name")
item_price = item_price.reset_index().drop(columns="index")
# Merge data frames on "Item ID" and "Item Name"
popular_items_df_prep = pd.merge(item_purchase_count, item_price, on=["Item ID", "Item Name"], how="outer")
# List data frame by highest number of purchases and prices
popular_items_df_prep = popular_items_df_prep.sort_values(by=["Purchase Count", "Price"], ascending=False)
# Find the total purchase value of each item by finding the sum of all purchases made on each individual item
total_purchase_value = item_groupby["Price"].sum()
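# groupby().sum() already returns one row per item, so no de-duplication is needed
# (Series.drop_duplicates() would instead drop items whose totals happen to be equal)
total_purchase_value_df = pd.DataFrame(total_purchase_value)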
# Rename the column to "Total Purchase Value"
total_purchase_value_df = total_purchase_value_df.rename(columns={"Price": "Total Purchase Value"})
# Merge the data frames on "Item Name"
popular_items_df_b4_format = pd.merge(popular_items_df_prep, total_purchase_value_df, on="Item Name", how="outer")
# Rename column appropriately
popular_items_df_b4_format = popular_items_df_b4_format.rename(columns={"Price": "Item Price"})
# Make two copies of the unformatted data frame: one to format here and one to reuse for the most profitable items
popular_items_df_b4_format_2 = popular_items_df_b4_format.copy()
popular_items_df_after_format_1 = popular_items_df_b4_format.copy()
# Format accordingly
popular_items_df_after_format_1["Item Price"] = popular_items_df_after_format_1["Item Price"].map("${:,.2f}".format)
popular_items_df_after_format_1["Total Purchase Value"] = popular_items_df_after_format_1["Total Purchase Value"].map("${:,.2f}".format)

popular_items_df_after_format_1.head()


,Item ID,Item Name,Purchase Count,Item Price,Total Purchase Value
0,178,"Oathbreaker, Last Hope of the Breaking Storm",12,$4.23,$50.76
1,82,Nirvana,9,$4.90,$44.10
2,145,Fiery Glass Crusader,9,$4.58,$41.22
3,108,"Extraction, Quickblade Of Trembling Hands",9,$3.53,$31.77
4,92,Final Critic,8,$4.88,$39.04
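
The purchase count, item price, and total purchase value can also be derived in a single `groupby` without merging. This is a sketch on hypothetical items; it assumes each `Item ID` is sold at one fixed price, and the snake_case column names are illustrative.

```python
import pandas as pd

# Hypothetical purchases of three made-up items
toy_df = pd.DataFrame({
    "Item ID": [1, 1, 1, 2, 2, 3],
    "Item Name": ["Sword", "Sword", "Sword", "Shield", "Shield", "Bow"],
    "Price": [2.00, 2.00, 2.00, 3.50, 3.50, 8.00],
})

items = toy_df.groupby(["Item ID", "Item Name"]).agg(
    purchase_count=("Price", "count"),  # number of purchase rows per item
    item_price=("Price", "first"),      # assumes one fixed price per Item ID
    total_value=("Price", "sum"),       # revenue per item
)

# Sort the numeric columns first; apply "$" formatting only afterwards
most_popular = items.sort_values("purchase_count", ascending=False)
most_profitable = items.sort_values("total_value", ascending=False)

print(most_popular.head())
print(most_profitable.head())
```

Keeping the values numeric until after sorting matters because formatted strings such as "$50.76" sort alphabetically rather than by amount.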


# Most Profitable Items

In [11]:
# Grab the copy of the df before formatting and reformat it accordingly to show the appropriate information
popular_items_df_b4_format_2 = popular_items_df_b4_format_2.sort_values(by="Total Purchase Value", ascending=False).drop_duplicates(subset="Item Name")
popular_items_df_b4_format_2 = popular_items_df_b4_format_2[['Item ID', 'Item Name', 'Total Purchase Value', 'Item Price', 'Purchase Count']]
popular_items_df_b4_format_2["Item Price"] = popular_items_df_b4_format_2["Item Price"].map("${:,.2f}".format)
popular_items_df_b4_format_2["Total Purchase Value"] = popular_items_df_b4_format_2["Total Purchase Value"].map("${:,.2f}".format)

popular_items_df_b4_format_2.head()


,Item ID,Item Name,Total Purchase Value,Item Price,Purchase Count
0,178,"Oathbreaker, Last Hope of the Breaking Storm",$50.76,$4.23,12
1,82,Nirvana,$44.10,$4.90,9
2,145,Fiery Glass Crusader,$41.22,$4.58,9
4,92,Final Critic,$39.04,$4.88,8
8,103,Singed Scalpel,$34.80,$4.35,8
