### Heroes Of Pymoli Data Analysis
* Of the 576 active players, the vast majority are male (84%). A smaller but notable proportion are female (14%).

* Our peak age demographic falls between 20-24 (44.8%), with secondary groups between 15-19 (18.6%) and 25-29 (13.4%).
-----

### Note
* Instructions have been included for each segment. You do not have to follow them exactly, but they are included to help you think through the steps.

In [1]:
# Dependencies and Setup
import pandas as pd

# File to Load (Remember to Change These)
file_to_load = r"purchase_data.csv"
# Read Purchasing File and store into Pandas data frame
purchase_data = pd.read_csv(file_to_load)

purchase_data.head(10)

Unnamed: 0,Purchase ID,SN,Age,Gender,Item ID,Item Name,Price
0,0,Lisim78,20,Male,108,"Extraction, Quickblade Of Trembling Hands",3.53
1,1,Lisovynya38,40,Male,143,Frenzied Scimitar,1.56
2,2,Ithergue48,24,Male,92,Final Critic,4.88
3,3,Chamassasya86,24,Male,100,Blindscythe,3.27
4,4,Iskosia90,23,Male,131,Fury,1.44
5,5,Yalae81,22,Male,81,Dreamkiss,3.61
6,6,Itheria73,36,Male,169,"Interrogator, Blood Blade of the Queen",2.18
7,7,Iskjaskst81,20,Male,162,Abyssal Shard,2.67
8,8,Undjask33,22,Male,21,Souleater,1.1
9,9,Chanosian48,35,Other / Non-Disclosed,136,Ghastly Adamantite Protector,3.58


## Player Count

* Display the total number of players


In [2]:
player_count = len(purchase_data["SN"].unique())
player_count

576

## Purchasing Analysis (Total)

* Run basic calculations to obtain number of unique items, average price, etc.


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame


In [3]:
#total number of unique items
item_count = len(purchase_data["Item Name"].unique())
item_count

179

In [4]:
#average price of items
avg_price = round(purchase_data['Price'].mean(),2)
avg_price

3.05

In [5]:
#total number of purchases
purchase_count = len(purchase_data['Purchase ID'])
purchase_count

780

In [6]:
#total revenue
total_rev = round(purchase_data['Price'].sum(),2)
total_rev

2379.77

In [7]:
summary_df = pd.DataFrame({"Total Players": [player_count],
             "Item Count": [item_count],
             "Average Price": [avg_price],
             "Total Purchases": [purchase_count],
             "Total Revenues": [total_rev],})
summary_df

Unnamed: 0,Total Players,Item Count,Average Price,Total Purchases,Total Revenues
0,576,179,3.05,780,2379.77
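
The optional formatting step could look something like the sketch below. It rebuilds the summary frame from the values shown above and formats a copy, so the numeric columns remain available for later arithmetic. The name `summary_display` is an assumption introduced for illustration.

```python
import pandas as pd

# Values copied from the summary table above
summary_df = pd.DataFrame({
    "Total Players": [576],
    "Item Count": [179],
    "Average Price": [3.05],
    "Total Purchases": [780],
    "Total Revenues": [2379.77],
})

# Format a copy so summary_df keeps its numeric dtypes
summary_display = summary_df.copy()
for col in ["Average Price", "Total Revenues"]:
    # "${:,.2f}" adds a dollar sign, thousands separators and two decimals
    summary_display[col] = summary_display[col].map("${:,.2f}".format)

print(summary_display)
```

Formatting turns the columns into strings, which is why it is applied last and on a copy.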


## Gender Demographics

* Percentage and Count of Male Players


* Percentage and Count of Female Players


* Percentage and Count of Other / Non-Disclosed




In [8]:
gender_sort = purchase_data.groupby("Gender")

gender_sort.count()

Unnamed: 0_level_0,Purchase ID,SN,Age,Item ID,Item Name,Price
Gender,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Female,113,113,113,113,113,113
Male,652,652,652,652,652,652
Other / Non-Disclosed,15,15,15,15,15,15


In [9]:
gender_count = gender_sort.nunique()
gender_count

Unnamed: 0_level_0,Purchase ID,SN,Age,Gender,Item ID,Item Name,Price
Gender,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
Female,113,81,22,1,90,90,79
Male,652,484,39,1,178,178,144
Other / Non-Disclosed,15,11,8,1,13,13,12


In [10]:
#percentage of players by gender: unique screen names (SN) per gender over total players
gender_percent = pd.DataFrame(gender_count["SN"] / player_count * 100)
gender_percent 

Unnamed: 0_level_0,SN
Gender,Unnamed: 1_level_1
Female,14.0625
Male,84.027778
Other / Non-Disclosed,1.909722
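
A compact way to get the count and percentage of players per gender in a single table is sketched below. It runs on a few made-up rows (the screen names are hypothetical, not taken from `purchase_data.csv`), and the names `players_by_gender` and `gender_demographics` are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample rows standing in for purchase_data.csv
purchase_data = pd.DataFrame({
    "SN": ["Ann1", "Ann1", "Bob2", "Cat3", "Dee4"],
    "Gender": ["Female", "Female", "Male", "Male", "Other / Non-Disclosed"],
})

player_count = purchase_data["SN"].nunique()

# Count each player once per gender, even if they bought several items
players_by_gender = purchase_data.groupby("Gender")["SN"].nunique()

gender_demographics = pd.DataFrame({
    "Total Count": players_by_gender,
    "Percentage of Players": (players_by_gender / player_count * 100).round(2),
})

print(gender_demographics)
```

Counting unique `SN` values rather than rows keeps repeat buyers from inflating a gender's share.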



## Purchasing Analysis (Gender)

* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. by gender




* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame

In [11]:
purch_ct_gen = pd.DataFrame(gender_sort['Purchase ID'].count())
purch_ct_gen

Unnamed: 0_level_0,Purchase ID
Gender,Unnamed: 1_level_1
Female,113
Male,652
Other / Non-Disclosed,15


In [12]:
avg_price_gen = round(pd.DataFrame(gender_sort['Price'].mean()),2)
avg_price_gen

Unnamed: 0_level_0,Price
Gender,Unnamed: 1_level_1
Female,3.2
Male,3.02
Other / Non-Disclosed,3.35


In [13]:
#total purchase value by gender
sum_price = pd.DataFrame(gender_sort['Price'].sum())
sum_price

Unnamed: 0_level_0,Price
Gender,Unnamed: 1_level_1
Female,361.94
Male,1967.64
Other / Non-Disclosed,50.19


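In [None]:
#average purchase total per person: total spent divided by unique players (SN) per gender
sum_price['Price'] / gender_count['SN']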

In [17]:
#create summary table + format
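
One possible shape for this summary table is sketched below, assuming the same column names as `purchase_data` but a handful of made-up rows so the block runs on its own. The names `gender_summary` and `gender_display` are hypothetical.

```python
import pandas as pd

# Hypothetical sample rows standing in for purchase_data.csv
purchase_data = pd.DataFrame({
    "Purchase ID": [0, 1, 2, 3, 4],
    "SN": ["Ann1", "Ann1", "Bob2", "Cat3", "Dee4"],
    "Gender": ["Female", "Female", "Male", "Male", "Other / Non-Disclosed"],
    "Price": [2.00, 4.00, 3.00, 1.00, 5.00],
})

gender_sort = purchase_data.groupby("Gender")

gender_summary = pd.DataFrame({
    "Purchase Count": gender_sort["Purchase ID"].count(),
    "Average Purchase Price": gender_sort["Price"].mean(),
    "Total Purchase Value": gender_sort["Price"].sum(),
})

# Per-person average divides total value by unique players, not by purchases
gender_summary["Avg Total Purchase per Person"] = (
    gender_summary["Total Purchase Value"] / gender_sort["SN"].nunique()
)

# Format a copy for display; gender_summary itself stays numeric
gender_display = gender_summary.copy()
for col in ["Average Purchase Price", "Total Purchase Value",
            "Avg Total Purchase per Person"]:
    gender_display[col] = gender_display[col].map("${:,.2f}".format)

print(gender_display)
```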


## Age Demographics

* Establish bins for ages


* Categorize the existing players using the age bins. Hint: use pd.cut()


* Calculate the numbers and percentages by age group


* Create a summary data frame to hold the results


* Optional: round the percentage column to two decimal points


* Display Age Demographics Table


In [24]:
#only the last expression in a cell is displayed, so the output below is the minimum age
purchase_data.Age.max()
purchase_data.Age.min()
#bins should be from 5 to 50 

7

In [33]:
bins = [5,14,23,32,41,50]
labels = ['5-14','15-23','24-32','33-41','42-50']
age_int = pd.cut(purchase_data.Age,bins, labels = labels)
age_int

0      15-23
1      33-41
2      24-32
3      24-32
4      15-23
       ...  
775    15-23
776    15-23
777    15-23
778     5-14
779    24-32
Name: Age, Length: 780, dtype: category
Categories (5, object): [5-14 < 15-23 < 24-32 < 33-41 < 42-50]

In [41]:
bins_ct = age_int.value_counts()
bins_ct

15-23    434
24-32    218
33-41     72
5-14      51
42-50      5
Name: Age, dtype: int64

In [43]:
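#caution: bins_ct counts purchases (780 rows), not unique players (576), so these percentages sum to more than 100%
bins_pct = bins_ct.apply(lambda x: (x)/player_count)*100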
bins_pct

15-23    75.347222
24-32    37.847222
33-41    12.500000
5-14      8.854167
42-50     0.868056
Name: Age, dtype: float64

In [None]:
#create summary table
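
A sketch of the age demographics table follows, reusing the bins and labels defined above. It assumes each player should be counted once, so duplicate `SN` rows are dropped before binning. The sample rows and the names `players` and `age_demographics` are hypothetical.

```python
import pandas as pd

# Hypothetical sample rows standing in for purchase_data.csv
purchase_data = pd.DataFrame({
    "SN": ["Ann1", "Ann1", "Bob2", "Cat3", "Dee4"],
    "Age": [12, 12, 20, 22, 35],
})

bins = [5, 14, 23, 32, 41, 50]
labels = ["5-14", "15-23", "24-32", "33-41", "42-50"]

# Keep one row per player so repeat buyers are counted once
players = purchase_data.drop_duplicates(subset="SN").copy()
players["Age Group"] = pd.cut(players["Age"], bins, labels=labels)

# sort=False keeps the groups in bin order instead of ordering by count
age_counts = players["Age Group"].value_counts(sort=False)

age_demographics = pd.DataFrame({
    "Total Count": age_counts,
    "Percentage of Players": (age_counts / len(players) * 100).round(2),
})

print(age_demographics)
```

Because the denominator is the number of de-duplicated players, the percentage column sums to 100 up to rounding.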

## Purchasing Analysis (Age)

* Bin the purchase_data data frame by age


* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. in the table below


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame

In [None]:
#put the previous values together into a table with the bins as an additional column
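
The steps listed for this section can be sketched as below: add the age bin as a column, group on it, and combine the aggregates into one frame. The sample rows and the names `age_sort` and `age_purchases` are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample rows standing in for purchase_data.csv
purchase_data = pd.DataFrame({
    "Purchase ID": [0, 1, 2, 3, 4],
    "SN": ["Ann1", "Ann1", "Bob2", "Cat3", "Dee4"],
    "Age": [12, 12, 20, 22, 35],
    "Price": [2.00, 4.00, 3.00, 1.00, 5.00],
})

bins = [5, 14, 23, 32, 41, 50]
labels = ["5-14", "15-23", "24-32", "33-41", "42-50"]

# Add the age bin as an extra column, then group on it
purchase_data["Age Group"] = pd.cut(purchase_data["Age"], bins, labels=labels)
# observed=True drops age groups that have no purchases
age_sort = purchase_data.groupby("Age Group", observed=True)

age_purchases = pd.DataFrame({
    "Purchase Count": age_sort["Purchase ID"].count(),
    "Average Purchase Price": age_sort["Price"].mean(),
    "Total Purchase Value": age_sort["Price"].sum(),
})

# Per-person average uses unique players in each age group
age_purchases["Avg Total Purchase per Person"] = (
    age_purchases["Total Purchase Value"] / age_sort["SN"].nunique()
)

print(age_purchases.round(2))
```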

## Top Spenders

* Run basic calculations to obtain the results in the table below


* Create a summary data frame to hold the results


* Sort the total purchase value column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame
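
One way to carry out these steps is to group by screen name and aggregate the price column three ways at once, as in this sketch. The sample rows and the name `top_spenders` are hypothetical.

```python
import pandas as pd

# Hypothetical sample rows standing in for purchase_data.csv
purchase_data = pd.DataFrame({
    "SN": ["Ann1", "Ann1", "Bob2", "Cat3", "Dee4"],
    "Price": [2.00, 4.00, 3.00, 1.00, 5.00],
})

top_spenders = (
    purchase_data.groupby("SN")["Price"]
    .agg(["count", "mean", "sum"])
    .rename(columns={
        "count": "Purchase Count",
        "mean": "Average Purchase Price",
        "sum": "Total Purchase Value",
    })
    # Largest total spend first
    .sort_values("Total Purchase Value", ascending=False)
)

# Preview the top rows
print(top_spenders.head())
```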



## Most Popular Items

* Retrieve the Item ID, Item Name, and Item Price columns


* Group by Item ID and Item Name. Perform calculations to obtain purchase count, item price, and total purchase value


* Create a summary data frame to hold the results


* Sort the purchase count column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame
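
A minimal sketch of these steps is shown below. It assumes each item has a single price, so the mean price within a group is used as the item price. The item names and the variables `item_summary` and `most_popular` are hypothetical.

```python
import pandas as pd

# Hypothetical sample rows standing in for purchase_data.csv
purchase_data = pd.DataFrame({
    "Item ID": [1, 1, 1, 2, 2, 3],
    "Item Name": ["Sword", "Sword", "Sword", "Shield", "Shield", "Bow"],
    "Price": [2.00, 2.00, 2.00, 4.00, 4.00, 5.00],
})

items = purchase_data[["Item ID", "Item Name", "Price"]]

item_summary = (
    items.groupby(["Item ID", "Item Name"])["Price"]
    .agg(["count", "mean", "sum"])
    .rename(columns={
        "count": "Purchase Count",
        "mean": "Item Price",
        "sum": "Total Purchase Value",
    })
)

# Most frequently purchased items first
most_popular = item_summary.sort_values("Purchase Count", ascending=False)

print(most_popular.head())
```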



## Most Profitable Items

* Sort the above table by total purchase value in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the data frame
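
The re-sort by total purchase value might look like the sketch below. It rebuilds a small per-item summary from made-up rows so it runs on its own, and formats a copy for display. The names `most_profitable` and `most_profitable_display` are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample rows standing in for purchase_data.csv
purchase_data = pd.DataFrame({
    "Item ID": [1, 1, 1, 2, 2, 3],
    "Item Name": ["Sword", "Sword", "Sword", "Shield", "Shield", "Bow"],
    "Price": [2.00, 2.00, 2.00, 4.00, 4.00, 5.00],
})

# Same per-item summary as in the popularity analysis
item_summary = (
    purchase_data.groupby(["Item ID", "Item Name"])["Price"]
    .agg(["count", "mean", "sum"])
    .rename(columns={
        "count": "Purchase Count",
        "mean": "Item Price",
        "sum": "Total Purchase Value",
    })
)

# Highest revenue first; this ranking can differ from the popularity ranking
most_profitable = item_summary.sort_values("Total Purchase Value", ascending=False)

# Format a copy for display so the sorted frame stays numeric
most_profitable_display = most_profitable.copy()
for col in ["Item Price", "Total Purchase Value"]:
    most_profitable_display[col] = most_profitable_display[col].map("${:,.2f}".format)

print(most_profitable_display.head())
```

Sorting before formatting matters: once the values are strings, `sort_values` would order them lexically rather than numerically.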

