# Heroes of Pymoli

### This notebook analyzes in-game purchase data for a fictional online fantasy game, Heroes of Pymoli. The game has 1,163 total active players, of whom 576 (49.5%) are buyers who have made a combined 780 purchases. Each row of the data set is one purchase, identified by a unique purchase ID, with details of the player who made it and the item bought.

In [4]:
# Import dependencies and setup
import pandas as pd
import numpy as np

data_path = "purchase_data.csv"
data = pd.read_csv(data_path)

In [189]:
data.head()

Unnamed: 0,Purchase ID,SN,Age,Gender,Item ID,Item Name,Price,Age Bin
0,0,Lisim78,20,Male,108,"Extraction, Quickblade Of Trembling Hands",3.53,20-24
1,1,Lisovynya38,40,Male,143,Frenzied Scimitar,1.56,Over 30
2,2,Ithergue48,24,Male,92,Final Critic,4.88,20-24
3,3,Chamassasya86,24,Male,100,Blindscythe,3.27,20-24
4,4,Iskosia90,23,Male,131,Fury,1.44,20-24


In [197]:
# Calculate total number of buyers and purchases
total_players = 1163  # total active players; this figure is not derived from the purchase data
total_purchases = len(data)  # one row per purchase
purchasing_players = data['SN'].nunique()  # unique SN values = players with at least one purchase
purchasers_pct = purchasing_players / total_players
freetoplay_players = total_players - purchasing_players
freetoplay_pct = freetoplay_players / total_players
print(f'Total Free-To-Play Players: {freetoplay_players} ({freetoplay_pct:.1%})')
print(f'Total Purchasing Players: {purchasing_players} ({purchasers_pct:.1%})')
print(f'Total Purchases: {total_purchases}')

Total Free-To-Play Players: 587 (50.5%)
Total Purchasing Players: 576 (49.5%)
Total Purchases: 780


In [13]:
data.columns

Index(['Purchase ID', 'SN', 'Age', 'Gender', 'Item ID', 'Item Name', 'Price'], dtype='object')

In [11]:
data.describe()

Unnamed: 0,Purchase ID,Age,Item ID,Price
count,780.0,780.0,780.0,780.0
mean,389.5,22.714103,92.114103,3.050987
std,225.310896,6.659444,52.775943,1.169549
min,0.0,7.0,0.0,1.0
25%,194.75,20.0,48.0,1.98
50%,389.5,22.0,93.0,3.15
75%,584.25,25.0,139.0,4.08
max,779.0,45.0,183.0,4.99


Demographics
-----------------
A total of 780 purchases were made by 576 players, 49.5% of the player base. Purchase rates were nearly identical across genders: 49.5% of male players and 49.7% of female players made a purchase. Of the players who made a purchase, 84.0% (484) were male and 14.1% (81) were female. Male buyers made 83.6% (652) of all purchases and female buyers made 14.5% (113). The remaining 1.9% of buyers (11) and 1.9% of purchases (15) fall under Other / Non-Disclosed. Buyers aged 20-24 were the largest age group: 44.8% (258) of buyers, accounting for 46.8% (365) of all purchases.
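The paragraph above reports two different shares per gender: the share of purchases (every row counts once) and the share of buyers (every player counts once). The following is a minimal sketch of that distinction on a hypothetical toy frame; the `toy` data is invented for illustration and only the `SN` and `Gender` column names come from the notebook.

```python
import pandas as pd

# Hypothetical toy data: 5 purchases made by 3 players
toy = pd.DataFrame({
    "SN": ["a", "a", "b", "c", "c"],
    "Gender": ["Male", "Male", "Female", "Male", "Male"],
})

# Share of purchases: every row counts once
purchase_share = toy["Gender"].value_counts(normalize=True)

# Share of buyers: keep one row per player, then count
buyer_share = toy.drop_duplicates("SN")["Gender"].value_counts(normalize=True)

print(f"Male share of purchases: {purchase_share['Male']:.1%}")  # 4 of 5 rows
print(f"Male share of buyers: {buyer_share['Male']:.1%}")        # 2 of 3 players
```

The two shares differ whenever some players buy more often than others, which is why the notebook reports both.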

In [119]:
# Breakdown of total purchases by gender
genders_purchase = data['Gender'].value_counts()
male_purchases_pct = genders_purchase['Male'] / total_purchases
female_purchases_pct = genders_purchase['Female'] / total_purchases
print(f'Purchases made by men: {genders_purchase["Male"]} ({male_purchases_pct:.1%})')
print(f'Purchases made by women: {genders_purchase["Female"]} ({female_purchases_pct:.1%})')
data.groupby('Gender').count()

Purchases made by men: 652 (83.6%)
Purchases made by women: 113 (14.5%)


Unnamed: 0_level_0,Purchase ID,SN,Age,Item ID,Item Name,Price,Age Bin
Gender,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
Female,113,113,113,113,113,113,113
Male,652,652,652,652,652,652,652
Other / Non-Disclosed,15,15,15,15,15,15,15


In [110]:
# Breakdown of total purchases by age
mean_age_purchases = data['Age'].mean()

# pd.cut bins are right-inclusive by default: (0, 14], (14, 19], (19, 24], (24, 29], (29, 99999],
# so the 'Over 30' label also covers age 30
age_labels = ['Under 15', '15-19', '20-24', '25-29', 'Over 30']
age_bins = [0, 14, 19, 24, 29, 99999]
data['Age Bin'] = pd.cut(data['Age'], bins=age_bins, labels=age_labels)
purchases_by_age_count = data.groupby('Age Bin').count()
purchases_by_age = pd.DataFrame(purchases_by_age_count['Purchase ID']).rename(columns={'Purchase ID': 'Purchases'})
purchases_by_age['Percent'] = purchases_by_age['Purchases'] / total_purchases
purchases_by_age.style.format({'Percent': '{:,.1%}'})

Unnamed: 0_level_0,Purchases,Percent
Age Bin,Unnamed: 1_level_1,Unnamed: 2_level_1
Under 15,51,6.5%
15-19,136,17.4%
20-24,365,46.8%
25-29,101,12.9%
Over 30,127,16.3%
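Because `pd.cut` closes each bin on the right by default, it can help to check where ages on the bin edges land. The sketch below reuses the notebook's `age_bins` and `age_labels`; the `edge_ages` values are hypothetical and chosen only to sit on or next to the edges.

```python
import pandas as pd

age_labels = ['Under 15', '15-19', '20-24', '25-29', 'Over 30']
age_bins = [0, 14, 19, 24, 29, 99999]

# Hypothetical ages on or next to the bin edges
edge_ages = pd.Series([14, 15, 29, 30])
edge_bins = pd.cut(edge_ages, bins=age_bins, labels=age_labels)

for age, label in zip(edge_ages, edge_bins):
    print(age, '->', label)
```

Age 30 is assigned to 'Over 30', so that label effectively means 30 and older.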


In [176]:
# Dataframe of purchasing players (one row per SN)
purchasing_players_df = data[['SN', 'Age', 'Gender']].drop_duplicates('SN')

# Gender breakdown of purchasing players
# total_men and total_women (total male and female player counts) are not defined
# in the cells shown here and must be set before this cell runs
genders_purchasing_players = purchasing_players_df['Gender'].value_counts()
pct_men_purchasing = genders_purchasing_players['Male'] / total_men
pct_women_purchasing = genders_purchasing_players['Female'] / total_women
male_purchasers_pct = genders_purchasing_players['Male'] / purchasing_players
female_purchasers_pct = genders_purchasing_players['Female'] / purchasing_players
print(f'Percent of men making purchases: {pct_men_purchasing:.1%}')
print(f'Percent of purchasers that are men: {genders_purchasing_players["Male"]} ({male_purchasers_pct:.1%})')
print(f'Percent of women making purchases: {pct_women_purchasing:.1%}')
print(f'Percent of purchasers that are female: {genders_purchasing_players["Female"]} ({female_purchasers_pct:.1%})')
genders_purchasing_players


Percent of men making purchases: 49.5%
Percent of purchasers that are men: 484 (84.0%)
Percent of women making purchases: 49.7%
Percent of purchasers that are female: 81 (14.1%)


Male                     484
Female                    81
Other / Non-Disclosed     11
Name: Gender, dtype: int64
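The buyer counts above rely on `drop_duplicates('SN')`, which keeps the first row for each player. That is only safe if a player's `Gender` is the same on every purchase. A minimal sketch of a consistency check follows, run on hypothetical toy data in which player "b" is deliberately inconsistent; the variable names here are illustrative and not part of the notebook.

```python
import pandas as pd

# Hypothetical toy data; player "b" appears with two different genders
toy = pd.DataFrame({
    "SN": ["a", "a", "b", "b"],
    "Gender": ["Male", "Male", "Female", "Male"],
})

# Number of distinct Gender values recorded for each player
genders_per_player = toy.groupby("SN")["Gender"].nunique()

# Players with more than one value would be counted under whichever row comes first
inconsistent = genders_per_player[genders_per_player > 1].index.tolist()
print(inconsistent)
```

An empty list on the real data would confirm that the first-row choice does not affect the gender breakdown.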

In [112]:
# Age breakdown of purchasing players
mean_age_purchasers = purchasing_players_df['Age'].mean()

# Bin the buyers' own ages (one row per player), reusing the purchase-level bins and labels
purchasing_players_df['Age Bin'] = pd.cut(purchasing_players_df['Age'], bins=age_bins, labels=age_labels)
purchasers_by_age_count = purchasing_players_df.groupby('Age Bin').count()
purchasers_by_age = pd.DataFrame(purchasers_by_age_count['SN']).rename(columns={'SN': 'Buyers'})
purchasers_by_age['Percent'] = purchasers_by_age['Buyers'] / purchasing_players
purchasers_by_age.style.format({'Percent': '{:,.1%}'})

Unnamed: 0_level_0,Buyers,Percent
Age Bin,Unnamed: 1_level_1,Unnamed: 2_level_1
Under 15,39,6.8%
15-19,107,18.6%
20-24,258,44.8%
25-29,77,13.4%
Over 30,95,16.5%


Purchasing Analysis
-------------------------
179 different items were purchased for a total of $2,379.77 in revenue. There are 145 different prices. Items range in price from $1.00 to $4.99, with an average item price of $3.04 (each distinct item counted once).

Revenue from men totaled $1,967.64, with an average purchase price of $3.02 and an average spend of $4.07 per buyer. Revenue from women totaled $361.94, with an average purchase price of $3.20 and an average spend of $4.47 per buyer.
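The per-gender averages follow directly from the totals: average purchase price is revenue divided by purchases, and average spend per person is revenue divided by buyers. As a quick arithmetic check, the sketch below plugs in the revenue, purchase, and buyer counts reported in this notebook; the `reported` and `results` names are illustrative only.

```python
# Figures as reported in this notebook: (revenue, purchases, buyers)
reported = {
    "Male": (1967.64, 652, 484),
    "Female": (361.94, 113, 81),
}

results = {}
for gender, (revenue, purchases, buyers) in reported.items():
    avg_price = revenue / purchases  # average price per purchase
    avg_spend = revenue / buyers     # average spend per buyer
    results[gender] = (f"{avg_price:.2f}", f"{avg_spend:.2f}")
    print(f"{gender}: ${avg_price:.2f} per purchase, ${avg_spend:.2f} per buyer")
```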



In [186]:
# Item count, average item price, and total revenue
total_items = data['Item Name'].nunique()
print(f'Total items: {total_items:,.0f}')
data_groupby_itemname = data.groupby('Item Name')
# One row per distinct item, so each item's price counts once in the average
item_df = data[['Item ID', 'Item Name', 'Price']].drop_duplicates('Item Name')
average_price = round(item_df['Price'].mean(), 2)
print(f'Average Item Price: ${average_price:,.2f}')
total_revenue = data['Price'].sum()
print(f'Total Revenue: ${total_revenue:,.2f}')

# Purchase count, average purchase price, and revenue by gender
purchases_by_gender = data.groupby('Gender')
purchases_by_gender_summary = pd.DataFrame(purchases_by_gender['Purchase ID'].count())
purchases_by_gender_summary['Average Price'] = purchases_by_gender['Price'].mean()
purchases_by_gender_summary['Total Revenue'] = purchases_by_gender['Price'].sum()
purchases_by_user = data.groupby(['Gender', 'SN'])
# Revenue per buyer: divide by unique buyers of each gender (the two Series align on the Gender index)
purchases_by_gender_summary['Avg Revenue per Person'] = purchases_by_gender_summary['Total Revenue'] / purchasing_players_df['Gender'].value_counts()
purchases_by_gender_summary.style.format({'Average Price': '${:.2f}', 'Total Revenue': '${:.2f}', 'Avg Revenue per Person': '${:.2f}'})

Total items: 179
Average Item Price: $3.04
Total Revenue: $2,379.77


Unnamed: 0_level_0,Purchase ID,Average Price,Total Revenue,Avg Revenue per Person
Gender,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
Female,113,$3.20,$361.94,$4.47
Male,652,$3.02,$1967.64,$4.07
Other / Non-Disclosed,15,$3.35,$50.19,$4.56
