### Heroes Of Pymoli Data Analysis
* Of the 576 active players, the vast majority are male (84%). There is also a smaller but notable proportion of female players (14%).

* Our peak age demographic is 20-24 (44.8%), with secondary groups at 15-19 (18.6%) and 25-29 (13.4%).
-----

### Note
* Instructions have been included for each segment. You do not have to follow them exactly, but they are included to help you think through the steps.

In [1]:
# Dependencies and Setup
import pandas as pd

# File to Load (Remember to Change These)
file_to_load = r"purchase_data.csv"
# Read Purchasing File and store into Pandas data frame
purchase_data = pd.read_csv(file_to_load)

purchase_data.head(10)

| | Purchase ID | SN | Age | Gender | Item ID | Item Name | Price |
|---|---|---|---|---|---|---|---|
| 0 | 0 | Lisim78 | 20 | Male | 108 | Extraction, Quickblade Of Trembling Hands | 3.53 |
| 1 | 1 | Lisovynya38 | 40 | Male | 143 | Frenzied Scimitar | 1.56 |
| 2 | 2 | Ithergue48 | 24 | Male | 92 | Final Critic | 4.88 |
| 3 | 3 | Chamassasya86 | 24 | Male | 100 | Blindscythe | 3.27 |
| 4 | 4 | Iskosia90 | 23 | Male | 131 | Fury | 1.44 |
| 5 | 5 | Yalae81 | 22 | Male | 81 | Dreamkiss | 3.61 |
| 6 | 6 | Itheria73 | 36 | Male | 169 | Interrogator, Blood Blade of the Queen | 2.18 |
| 7 | 7 | Iskjaskst81 | 20 | Male | 162 | Abyssal Shard | 2.67 |
| 8 | 8 | Undjask33 | 22 | Male | 21 | Souleater | 1.1 |
| 9 | 9 | Chanosian48 | 35 | Other / Non-Disclosed | 136 | Ghastly Adamantite Protector | 3.58 |


## Player Count

* Display the total number of players


In [2]:
player_count = len(purchase_data["SN"].unique())
player_count

576

## Purchasing Analysis (Total)

* Run basic calculations to obtain number of unique items, average price, etc.


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame


In [3]:
#total number of unique items
item_count = len(purchase_data["Item Name"].unique())
item_count

179

In [4]:
#average price of items
avg_price = round(purchase_data['Price'].mean(),2)
avg_price

3.05

In [5]:
#total number of purchases
purchase_count = len(purchase_data['Purchase ID'])
purchase_count

780

In [6]:
#total revenue
total_rev = round(purchase_data['Price'].sum(),2)
total_rev

2379.77

In [7]:
purchasing_df = pd.DataFrame({"Total Players": [player_count],
             "Item Count": [item_count],
             "Average Price": [avg_price],
             "Total Purchases": [purchase_count],
             "Total Revenues": [total_rev],})
purchasing_df

| | Total Players | Item Count | Average Price | Total Purchases | Total Revenues |
|---|---|---|---|---|---|
| 0 | 576 | 179 | 3.05 | 780 | 2379.77 |
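The optional formatting step can be sketched as follows. This is a minimal example, assuming the summary values shown above are copied into a stand-in frame; the names `summary` and `formatted` are hypothetical and not part of the notebook.

```python
import pandas as pd

# Stand-in for the summary frame, using the values displayed above.
summary = pd.DataFrame({
    "Total Players": [576],
    "Item Count": [179],
    "Average Price": [3.05],
    "Total Purchases": [780],
    "Total Revenues": [2379.77],
})

# Format the currency columns as strings for display only.
# The numeric frame is left untouched for later calculations.
formatted = summary.copy()
for col in ["Average Price", "Total Revenues"]:
    formatted[col] = formatted[col].map("${:,.2f}".format)

print(formatted.to_string(index=False))
```

Formatting a copy keeps the original numbers usable; once a column holds strings such as `$2,379.77`, it can no longer be summed or averaged.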


## Gender Demographics

* Percentage and Count of Male Players


* Percentage and Count of Female Players


* Percentage and Count of Other / Non-Disclosed




In [8]:
gender_sort = purchase_data.groupby("Gender")

gender_sort.count()

| Gender | Purchase ID | SN | Age | Item ID | Item Name | Price |
|---|---|---|---|---|---|---|
| Female | 113 | 113 | 113 | 113 | 113 | 113 |
| Male | 652 | 652 | 652 | 652 | 652 | 652 |
| Other / Non-Disclosed | 15 | 15 | 15 | 15 | 15 | 15 |


In [9]:
# Count unique players (not purchases) per gender
gender_count = gender_sort["SN"].nunique()
gender_count

Gender
Female                    81
Male                     484
Other / Non-Disclosed     11
Name: SN, dtype: int64

In [10]:
gender_percent = gender_count / player_count * 100
gender_percent

Gender
Female                   14.062500
Male                     84.027778
Other / Non-Disclosed     1.909722
Name: SN, dtype: float64

In [11]:
pd.concat([gender_count,gender_percent],axis = 1)

| Gender | SN | SN |
|---|---|---|
| Female | 81 | 14.0625 |
| Male | 484 | 84.027778 |
| Other / Non-Disclosed | 11 | 1.909722 |
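The concatenated table above ends up with two columns both labelled `SN`. One way to get descriptive headers is to build the summary from named columns. The sketch below uses a small hypothetical purchase log rather than the real data, and the column labels are illustrative choices.

```python
import pandas as pd

# Toy purchase log (hypothetical rows, not the real data set).
purchases = pd.DataFrame({
    "SN": ["a", "a", "b", "c", "d"],
    "Gender": ["Male", "Male", "Female", "Male", "Other / Non-Disclosed"],
})

# Drop repeat purchases so each player is counted once.
players = purchases.drop_duplicates(subset="SN")

counts = players["Gender"].value_counts()
percents = (counts / len(players) * 100).round(2)

# Building the frame from a dict gives each column an explicit name.
gender_summary = pd.DataFrame({
    "Total Count": counts,
    "Percentage of Players": percents,
})
print(gender_summary)
```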



## Purchasing Analysis (Gender)

* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. by gender




* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame

In [12]:
purch_ct_gen = pd.DataFrame(gender_sort['Purchase ID'].count())
purch_ct_gen

| Gender | Purchase ID |
|---|---|
| Female | 113 |
| Male | 652 |
| Other / Non-Disclosed | 15 |


In [13]:
avg_price_gen = round(pd.DataFrame(gender_sort['Price'].mean()),2)
avg_price_gen

| Gender | Price |
|---|---|
| Female | 3.2 |
| Male | 3.02 |
| Other / Non-Disclosed | 3.35 |


In [14]:
#total purchase value by gender
sum_price = pd.DataFrame(gender_sort['Price'].sum())
sum_price

| Gender | Price |
|---|---|
| Female | 361.94 |
| Male | 1967.64 |
| Other / Non-Disclosed | 50.19 |


sum_price['Price']/gender_count

In [15]:
#create summary table + format
gender_df = pd.merge(pd.merge(purch_ct_gen,avg_price_gen,on = "Gender"),sum_price,on="Gender",
                     suffixes = ("_Average","_Total"))
gender_df

| Gender | Purchase ID | Price_Average | Price_Total |
|---|---|---|---|
| Female | 113 | 3.2 | 361.94 |
| Male | 652 | 3.02 | 1967.64 |
| Other / Non-Disclosed | 15 | 3.35 | 50.19 |
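The average purchase total per person asked for in this section is the total purchase value divided by the number of unique players in each group. A minimal sketch, assuming a toy purchase log with hypothetical rows and illustrative column names:

```python
import pandas as pd

# Toy purchase log (hypothetical rows, not the real data set).
purchases = pd.DataFrame({
    "SN": ["a", "a", "b", "c"],
    "Gender": ["Male", "Male", "Female", "Male"],
    "Price": [1.00, 3.00, 2.50, 4.00],
})

# Named aggregation produces all per-gender figures in one pass.
by_gender = purchases.groupby("Gender").agg(
    purchase_count=("Price", "count"),
    avg_price=("Price", "mean"),
    total_value=("Price", "sum"),
    players=("SN", "nunique"),
)

# Both operands are Series indexed by Gender, so the division lines up row by row.
by_gender["avg_total_per_person"] = by_gender["total_value"] / by_gender["players"]
print(by_gender.round(2))
```

Dividing one column by another keeps both sides indexed by gender. Dividing a whole DataFrame by a Series instead matches the Series index against the DataFrame's column labels, which yields only missing values in this situation.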


## Age Demographics

* Establish bins for ages


* Categorize the existing players using the age bins. Hint: use pd.cut()


* Calculate the numbers and percentages by age group


* Create a summary data frame to hold the results


* Optional: round the percentage column to two decimal points


* Display Age Demographics Table


In [16]:
# purchase_data.Age.max()
# purchase_data.Age.min()
# #bins should be from 5 to 50 

In [17]:
bins = [5,14,23,32,41,50]
labels = ['5-14','15-23','24-32','33-41','42-50']
age_int = pd.cut(purchase_data.Age,bins, labels = labels)
age_int

0      15-23
1      33-41
2      24-32
3      24-32
4      15-23
       ...  
775    15-23
776    15-23
777    15-23
778     5-14
779    24-32
Name: Age, Length: 780, dtype: category
Categories (5, object): [5-14 < 15-23 < 24-32 < 33-41 < 42-50]

In [18]:
bins_ct = age_int.value_counts()
bins_ct

15-23    434
24-32    218
33-41     72
5-14      51
42-50      5
Name: Age, dtype: int64

In [19]:
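# Note: bins_ct counts purchases (780 rows) while player_count counts unique players (576),
# so these percentages sum to more than 100%.
bins_pct = round(bins_ct.apply(lambda x: (x)/player_count)*100,2)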
bins_pct

15-23    75.35
24-32    37.85
33-41    12.50
5-14      8.85
42-50     0.87
Name: Age, dtype: float64

In [20]:
#create summary table
# age_summary = pd.concat([bins_ct,bins_pct],axis=1,sort=True)
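To get percentages of players rather than of purchases, one option is to reduce the log to one row per player before binning. The sketch below assumes a toy purchase log; the rows and the summary column labels are hypothetical, while the bin edges and labels are the ones used above.

```python
import pandas as pd

# Toy purchase log (hypothetical rows); player "a" bought twice.
purchases = pd.DataFrame({
    "SN": ["a", "a", "b", "c", "d"],
    "Age": [20, 20, 12, 35, 22],
})

# One row per player, so repeat buyers are not counted twice.
players = purchases.drop_duplicates(subset="SN")

bins = [5, 14, 23, 32, 41, 50]
labels = ["5-14", "15-23", "24-32", "33-41", "42-50"]
# include_lowest=True keeps an age equal to the first edge (5) in the first bin.
age_group = pd.cut(players["Age"], bins=bins, labels=labels, include_lowest=True)

# sort_index orders the rows by bin rather than by frequency.
counts = age_group.value_counts().sort_index()
age_summary = pd.DataFrame({
    "Total Count": counts,
    "Percentage of Players": (counts / len(players) * 100).round(2),
})
print(age_summary)
```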

## Purchasing Analysis (Age)

* Bin the purchase_data data frame by age


* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. in the table below


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame

In [21]:
age_sort = purchase_data.assign(Ages = age_int)
age_sort.head()

| | Purchase ID | SN | Age | Gender | Item ID | Item Name | Price | Ages |
|---|---|---|---|---|---|---|---|---|
| 0 | 0 | Lisim78 | 20 | Male | 108 | Extraction, Quickblade Of Trembling Hands | 3.53 | 15-23 |
| 1 | 1 | Lisovynya38 | 40 | Male | 143 | Frenzied Scimitar | 1.56 | 33-41 |
| 2 | 2 | Ithergue48 | 24 | Male | 92 | Final Critic | 4.88 | 24-32 |
| 3 | 3 | Chamassasya86 | 24 | Male | 100 | Blindscythe | 3.27 | 24-32 |
| 4 | 4 | Iskosia90 | 23 | Male | 131 | Fury | 1.44 | 15-23 |


In [22]:
purch_ct_age = age_sort.groupby('Ages')['Purchase ID'].count()
purch_ct_age

Ages
5-14      51
15-23    434
24-32    218
33-41     72
42-50      5
Name: Purchase ID, dtype: int64

In [23]:
avg_price_age = round(age_sort.groupby('Ages')['Price'].mean(),2)
avg_price_age

Ages
5-14     3.14
15-23    3.03
24-32    3.03
33-41    3.16
42-50    3.00
Name: Price, dtype: float64

In [24]:
sum_price_age = age_sort.groupby('Ages')['Price'].sum()
sum_price_age

Ages
5-14      159.91
15-23    1316.73
24-32     660.28
33-41     227.86
42-50      14.99
Name: Price, dtype: float64

In [25]:
pd.concat([purch_ct_age,avg_price_age,sum_price_age], axis = 1)

| Ages | Purchase ID | Price | Price |
|---|---|---|---|
| 5-14 | 51 | 3.14 | 159.91 |
| 15-23 | 434 | 3.03 | 1316.73 |
| 24-32 | 218 | 3.03 | 660.28 |
| 33-41 | 72 | 3.16 | 227.86 |
| 42-50 | 5 | 3.0 | 14.99 |
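The table above has two columns labelled `Price`, which makes the average and the total hard to tell apart. Passing `keys` to `pd.concat` is one way to name each column when combining Series. A minimal sketch with hypothetical values and illustrative labels:

```python
import pandas as pd

# Stand-in Series shaped like the grouped results (hypothetical values).
idx = pd.Index(["5-14", "15-23"], name="Ages")
count = pd.Series([2, 3], index=idx, name="Purchase ID")
mean = pd.Series([3.10, 3.00], index=idx, name="Price")
total = pd.Series([6.20, 9.00], index=idx, name="Price")

# With axis=1, each key becomes the label of the corresponding column.
age_purchases = pd.concat(
    [count, mean, total],
    axis=1,
    keys=["Purchase Count", "Average Purchase Price", "Total Purchase Value"],
)
print(age_purchases)
```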


## Top Spenders

* Run basic calculations to obtain the results in the table below


* Create a summary data frame to hold the results


* Sort the total purchase value column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame



In [26]:
temp = purchase_data[['SN','Price']].groupby('SN').sum().sort_values('Price', ascending = False).head(5)
mask = list(temp.index)

In [27]:
purchase_data_new = purchase_data.set_index('SN')
purchase_data_new

spender_data = purchase_data_new.loc[mask]
spender_data

| SN | Purchase ID | Age | Gender | Item ID | Item Name | Price |
|---|---|---|---|---|---|---|
| Lisosia93 | 74 | 25 | Male | 89 | Blazefury, Protector of Delusions | 4.64 |
| Lisosia93 | 120 | 25 | Male | 24 | Warped Fetish | 3.81 |
| Lisosia93 | 224 | 25 | Male | 157 | Spada, Etcher of Hatred | 4.8 |
| Lisosia93 | 603 | 25 | Male | 132 | Persuasion | 3.19 |
| Lisosia93 | 609 | 25 | Male | 40 | Second Chance | 2.52 |
| Idastidru52 | 290 | 24 | Male | 147 | Hellreaver, Heirloom of Inception | 4.93 |
| Idastidru52 | 490 | 24 | Male | 148 | Warmonger, Gift of Suffering's End | 4.03 |
| Idastidru52 | 543 | 24 | Male | 121 | Massacre | 1.6 |
| Idastidru52 | 676 | 24 | Male | 111 | Misery's End | 4.89 |
| Chamjask73 | 222 | 22 | Female | 178 | Oathbreaker, Last Hope of the Breaking Storm | 4.23 |


In [28]:
purch_ct_spend = spender_data.groupby('SN')['Purchase ID'].count()
purch_ct_spend

SN
Chamjask73     3
Idastidru52    4
Iral74         4
Iskadarya95    3
Lisosia93      5
Name: Purchase ID, dtype: int64

In [29]:
avg_price_spend = round(spender_data.groupby('SN')['Price'].mean(),2)
avg_price_spend

SN
Chamjask73     4.61
Idastidru52    3.86
Iral74         3.40
Iskadarya95    4.37
Lisosia93      3.79
Name: Price, dtype: float64

In [30]:
purch_sum_spend = spender_data.groupby('SN')['Price'].sum()
purch_sum_spend

SN
Chamjask73     13.83
Idastidru52    15.45
Iral74         13.62
Iskadarya95    13.10
Lisosia93      18.96
Name: Price, dtype: float64

In [31]:
#summary table 
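The summary table for this section could be built in a single chain: group by player, aggregate, sort by total spend, and preview the top rows. The sketch below runs on a toy purchase log; the rows and the column labels are hypothetical.

```python
import pandas as pd

# Toy purchase log (hypothetical rows, not the real data set).
purchases = pd.DataFrame({
    "SN": ["a", "a", "b", "c", "c", "c"],
    "Price": [4.00, 3.00, 5.00, 1.00, 1.50, 2.00],
})

top_spenders = (
    purchases.groupby("SN")["Price"]
    .agg(["count", "mean", "sum"])
    .rename(columns={
        "count": "Purchase Count",
        "mean": "Average Purchase Price",
        "sum": "Total Purchase Value",
    })
    # Highest total spend first, then keep a short preview.
    .sort_values("Total Purchase Value", ascending=False)
    .head(2)
)
print(top_spenders)
```

Sorting after the aggregation means the preview is ordered by total purchase value, whereas grouping a pre-filtered frame returns the players in alphabetical order.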

## Most Popular Items

* Retrieve the Item ID, Item Name, and Item Price columns


* Group by Item ID and Item Name. Perform calculations to obtain purchase count, item price, and total purchase value


* Create a summary data frame to hold the results


* Sort the purchase count column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame



In [32]:
# Rank items by total revenue (sum of Price) and keep the top 5.
# Note: this orders by purchase value, not by purchase count.
temp2 = (purchase_data.groupby('Item Name')['Price'].sum()
         .sort_values(ascending = False).head(5))
mask2 = list(temp2.index)
item_data = purchase_data.set_index('Item Name').loc[mask2]
item_data

| Item Name | Purchase ID | SN | Age | Gender | Item ID | Price |
|---|---|---|---|---|---|---|
| Final Critic | 2 | Ithergue48 | 24 | Male | 92 | 4.88 |
| Final Critic | 99 | Haisrisuir60 | 23 | Male | 92 | 4.19 |
| Final Critic | 252 | Tyaelo67 | 39 | Male | 92 | 4.88 |
| Final Critic | 273 | Phyali88 | 15 | Female | 92 | 4.88 |
| Final Critic | 277 | Ennalmol65 | 24 | Male | 92 | 4.88 |
| Final Critic | 411 | Lisico81 | 10 | Male | 92 | 4.19 |
| Final Critic | 536 | Siallylis44 | 20 | Male | 92 | 4.19 |
| Final Critic | 712 | Lisilsa62 | 25 | Male | 92 | 4.88 |
| Final Critic | 722 | Ilarin91 | 22 | Male | 92 | 4.88 |
| Final Critic | 767 | Ilmol66 | 8 | Female | 92 | 4.88 |


In [33]:
purch_ct_item = item_data.groupby('Item Name')['Purchase ID'].count()
purch_ct_item

Item Name
Fiery Glass Crusader                             9
Final Critic                                    13
Nirvana                                          9
Oathbreaker, Last Hope of the Breaking Storm    12
Singed Scalpel                                   8
Name: Purchase ID, dtype: int64

In [34]:
avg_price_item = round(item_data.groupby('Item Name')['Price'].mean(),2)
avg_price_item
#Took the average price for items instead of pulling price because there are two different prices for Final Critic

Item Name
Fiery Glass Crusader                            4.58
Final Critic                                    4.61
Nirvana                                         4.90
Oathbreaker, Last Hope of the Breaking Storm    4.23
Singed Scalpel                                  4.35
Name: Price, dtype: float64

In [35]:
purch_sum_item = item_data.groupby('Item Name')['Price'].sum()
purch_sum_item

Item Name
Fiery Glass Crusader                            41.22
Final Critic                                    59.99
Nirvana                                         44.10
Oathbreaker, Last Hope of the Breaking Storm    50.76
Singed Scalpel                                  34.80
Name: Price, dtype: float64

In [36]:
#summary table 

## Most Profitable Items

* Sort the above table by total purchase value in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the data frame
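
Both item rankings can come from one grouped table: sorting by purchase count gives the most popular items, and sorting the same table by total purchase value gives the most profitable ones. A minimal sketch on a toy purchase log, where the rows, item names, and column labels are all hypothetical:

```python
import pandas as pd

# Toy purchase log (hypothetical rows, not the real data set).
purchases = pd.DataFrame({
    "Item ID": [1, 1, 1, 2, 2, 3],
    "Item Name": ["Sword", "Sword", "Sword", "Shield", "Shield", "Gem"],
    "Price": [1.00, 1.00, 1.00, 4.00, 4.00, 2.00],
})

# Group by both ID and name so each item keeps its identifying columns.
items = (
    purchases.groupby(["Item ID", "Item Name"])["Price"]
    .agg(["count", "mean", "sum"])
    .rename(columns={
        "count": "Purchase Count",
        "mean": "Item Price",
        "sum": "Total Purchase Value",
    })
)

most_popular = items.sort_values("Purchase Count", ascending=False)
most_profitable = items.sort_values("Total Purchase Value", ascending=False)

print(most_popular.head())
print(most_profitable.head())
```

In this toy data the two orderings differ: the cheap item sells most often, while the expensive one brings in the most revenue.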

