### Heroes Of Pymoli Data Analysis
* Of the 576 active players, the vast majority are male (84%). There is also a smaller but notable proportion of female players (14%).

* The peak age demographic is 20-24 (44.8%), followed by 15-19 (18.6%) and 25-29 (13.4%).
-----

### Note
* Instructions have been included for each segment. You do not have to follow them exactly, but they are included to help you think through the steps.

In [1]:
# Dependencies and Setup
import pandas as pd
import numpy as np

# File to Load (Remember to Change These)
file = "Resources/purchase_data.csv"

# Read Purchasing File and store into Pandas data frame
df = pd.read_csv(file)
df.head()

| | Purchase ID | SN | Age | Gender | Item ID | Item Name | Price |
|---|---|---|---|---|---|---|---|
| 0 | 0 | Lisim78 | 20 | Male | 108 | Extraction, Quickblade Of Trembling Hands | 3.53 |
| 1 | 1 | Lisovynya38 | 40 | Male | 143 | Frenzied Scimitar | 1.56 |
| 2 | 2 | Ithergue48 | 24 | Male | 92 | Final Critic | 4.88 |
| 3 | 3 | Chamassasya86 | 24 | Male | 100 | Blindscythe | 3.27 |
| 4 | 4 | Iskosia90 | 23 | Male | 131 | Fury | 1.44 |


In [2]:
# Count each screen name (SN) once, however many purchases it made
player_count = df["SN"].nunique()
pd.DataFrame([player_count], columns = ["Total Players"])

| | Total Players |
|---|---|
| 0 | 576 |


In [3]:
unique_items = df["Item Name"].nunique()
average_price = df["Price"].mean()
total_purchases = df["Purchase ID"].count()
total_revenue = df["Price"].sum()
# Wrap the dictionary in a list so pandas builds a one-row summary table
data_summary = pd.DataFrame([{
    "Number of Unique Items": unique_items,
    "Average Price": average_price,
    "Total Purchases": total_purchases,
    "Total Revenue": total_revenue,
}])
data_summary

| | Number of Unique Items | Average Price | Total Purchases | Total Revenue |
|---|---|---|---|---|
| 0 | 179 | 3.050987 | 780 | 2379.77 |

## Gender Demographics

* Percentage and Count of Male Players


* Percentage and Count of Female Players


* Percentage and Count of Other / Non-Disclosed




In [4]:
# One row per player, so repeat purchasers are counted once
players = df[["SN", "Gender"]].drop_duplicates()
counts = players["Gender"].value_counts()

# Look up each gender by label rather than by position
gender_order = ["Male", "Female", "Other / Non-Disclosed"]
total_counts = counts.loc[gender_order]

gender_demo_df = pd.DataFrame({
    "Percentage of Players": (total_counts / player_count * 100).round(2),
    "Total Count": total_counts,
})
gender_demo_df.index.name = None
gender_demo_df

| | Percentage of Players | Total Count |
|---|---|---|
| Male | 84.03 | 484 |
| Female | 14.06 | 81 |
| Other / Non-Disclosed | 1.91 | 11 |
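
As a cross-check on the table above, the same counts and percentages can be obtained with `value_counts(normalize=True)`. This is only a sketch: the `sample` frame and its screen names are made up for illustration and stand in for the real purchase data.

```python
import pandas as pd

# Hypothetical miniature of the purchase data: one row per purchase.
sample = pd.DataFrame({
    "SN": ["Ann1", "Ann1", "Bob2", "Cy3", "Dee4"],
    "Gender": ["Female", "Female", "Male", "Male", "Other / Non-Disclosed"],
})

# Keep one row per player so repeat purchasers are not double counted.
players = sample.drop_duplicates(subset="SN")

# Raw counts, and the same counts expressed as a percentage of all players.
gender_counts = players["Gender"].value_counts()
gender_share = players["Gender"].value_counts(normalize=True).mul(100).round(2)

gender_summary = pd.DataFrame({
    "Percentage of Players": gender_share,
    "Total Count": gender_counts,
})
print(gender_summary)
```

Because both columns are indexed by gender label, pandas aligns them by label and no positional lookups are needed.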



## Purchasing Analysis (Gender)

* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. by gender




* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame

In [5]:
# Purchase-level statistics by gender: every purchase row counts, so repeat buyers are included
gender_order = ["Male", "Female", "Other / Non-Disclosed"]

# Aggregate by label so each statistic stays attached to the right gender
purchase_analysis_df = (
    df.groupby("Gender")["Price"]
    .agg(["count", "mean", "sum"])
    .rename(columns={
        "count": "Purchase Count",
        "mean": "Average Purchase Price",
        "sum": "Total Purchase Value",
    })
    .loc[gender_order]
)
purchase_analysis_df

| Gender | Purchase Count | Average Purchase Price | Total Purchase Value |
|---|---|---|---|
| Male | 652 | 3.017853 | 1967.64 |
| Female | 113 | 3.203009 | 361.94 |
| Other / Non-Disclosed | 15 | 3.346 | 50.19 |


## Age Demographics

* Establish bins for ages


* Categorize the existing players using the age bins. Hint: use pd.cut()


* Calculate the numbers and percentages by age group


* Create a summary data frame to hold the results


* Optional: round the percentage column to two decimal points


* Display Age Demographics Table


In [10]:

# One row per player, so repeat purchasers are counted once
players = df[["SN", "Age"]].drop_duplicates()

# pd.cut intervals are right-inclusive: (9, 14] is "10-14", (39, 200] is "40+"
age_bins = [0, 9, 14, 19, 24, 29, 34, 39, 200]
age_labels = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]
age_group = pd.cut(players["Age"], bins=age_bins, labels=age_labels, include_lowest=True)

# Count players per age group, keeping the bins in age order
age_counts = age_group.value_counts().sort_index()

# Creating DataFrame
age_demo = pd.DataFrame({
    "Percent of Players": (age_counts / player_count * 100).round(2),
    "Total Count": age_counts,
})
age_demo.index.name = None
age_demo

| | Percent of Players | Total Count |
|---|---|---|
| <10 | 2.95 | 17 |
| 10-14 | 3.82 | 22 |
| 15-19 | 18.58 | 107 |
| 20-24 | 44.79 | 258 |
| 25-29 | 13.37 | 77 |
| 30-34 | 9.03 | 52 |
| 35-39 | 5.38 | 31 |
| 40+ | 2.08 | 12 |
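
The instructions for this section suggest `pd.cut()` for the age bins. A minimal sketch of how it assigns ages to the groups in the table above follows. The `ages` values are made up, and the upper edge of 200 is an arbitrary assumption chosen only to catch every age of 40 or more.

```python
import pandas as pd

# Hypothetical ages, chosen to sit on both sides of several bin edges.
ages = pd.Series([7, 9, 10, 14, 15, 24, 25, 39, 40, 45])

# Each interval is right-inclusive: (0, 9], (9, 14], ..., (39, 200].
bins = [0, 9, 14, 19, 24, 29, 34, 39, 200]
labels = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]

age_group = pd.cut(ages, bins=bins, labels=labels)

# Count per group, ordered by age group rather than by frequency.
counts = age_group.value_counts().sort_index()
print(counts)
```

Because the right edge is inclusive, an age of 9 lands in `<10` and an age of 10 lands in `10-14`.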


## Purchasing Analysis (Age)

* Bin the purchase_data data frame by age


* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. in the table below


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame

In [8]:
# Bin every purchase by the buyer's age; pd.cut intervals are right-inclusive, e.g. (9, 13] is "10-13"
# Note: these bins are narrower than the ones used in the Age Demographics table
group_names = ['<10', '10-13', '14-17', '18-21', '22-25', '26-29', '30-33', '34-37', '38-40', '>40']
bins = [0, 9, 13, 17, 21, 25, 29, 33, 37, 40, 80]

age_demo_grp = df.groupby(pd.cut(df["Age"], bins, labels=group_names), observed=False)

# Summarise purchases per age group, then format the currency columns for display
age_demo_df = pd.DataFrame({"Purchase Count":age_demo_grp["Price"].count(), "Average Purchase Price":age_demo_grp["Price"].mean(),"Total Purchase Value":age_demo_grp["Price"].sum()})
age_demo_df["Average Purchase Price"] = age_demo_df["Average Purchase Price"].map("${:.2f}".format)
age_demo_df["Total Purchase Value"] = age_demo_df["Total Purchase Value"].map("${:,.2f}".format)
age_demo_df

| Age | Purchase Count | Average Purchase Price | Total Purchase Value |
|---|---|---|---|
| <10 | 23 | $3.35 | $77.13 |
| 10-13 | 26 | $2.92 | $75.87 |
| 14-17 | 89 | $3.01 | $267.60 |
| 18-21 | 210 | $3.08 | $647.26 |
| 22-25 | 263 | $3.05 | $800.90 |
| 26-29 | 42 | $2.65 | $111.10 |
| 30-33 | 64 | $3.00 | $191.87 |
| 34-37 | 35 | $3.21 | $112.33 |
| 38-40 | 21 | $3.53 | $74.18 |
| >40 | 7 | $3.08 | $21.53 |


## Top Spenders

* Run basic calculations to obtain the results in the table below


* Create a summary data frame to hold the results


* Sort the total purchase value column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame



In [22]:
# Purchase count, average price, and total spend per player, highest spender first
spnd_grp = df.groupby("SN")
spnd_df = pd.DataFrame({"Purchase Count":spnd_grp["Price"].count(), "Average Purchase Price":spnd_grp["Price"].mean(),"Total Purchase Value":spnd_grp["Price"].sum()})
spnd_df = spnd_df.sort_values("Total Purchase Value", ascending=False)
spnd_df.head()

| SN | Purchase Count | Average Purchase Price | Total Purchase Value |
|---|---|---|---|
| Lisosia93 | 5 | 3.792 | 18.96 |
| Idastidru52 | 4 | 3.8625 | 15.45 |
| Chamjask73 | 3 | 4.61 | 13.83 |
| Iral74 | 4 | 3.405 | 13.62 |
| Iskadarya95 | 3 | 4.366667 | 13.1 |
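
The optional formatting step is not applied to the table above. One way to do it, sketched here on the first two rows of that table, is to map a currency format string over the money columns after sorting. The `summary` and `formatted` names are illustrative.

```python
import pandas as pd

# First two rows of the Top Spenders table, rebuilt by hand for illustration.
summary = pd.DataFrame(
    {
        "Purchase Count": [5, 4],
        "Average Purchase Price": [3.792, 3.8625],
        "Total Purchase Value": [18.96, 15.45],
    },
    index=pd.Index(["Lisosia93", "Idastidru52"], name="SN"),
)

# Format a copy so the numeric originals stay available for further maths.
formatted = summary.copy()
for column in ["Average Purchase Price", "Total Purchase Value"]:
    formatted[column] = formatted[column].map("${:,.2f}".format)

print(formatted)
```

Formatting turns the columns into strings, so any sorting or arithmetic should happen before this step.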


## Most Popular Items

* Retrieve the Item ID, Item Name, and Item Price columns


* Group by Item ID and Item Name. Perform calculations to obtain purchase count, item price, and total purchase value


* Create a summary data frame to hold the results


* Sort the purchase count column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame



In [23]:
# Purchase count, price, and total revenue per item, most purchased first
items_grp = df.groupby(['Item ID','Item Name'])
items_df = pd.DataFrame({"Purchase Count":items_grp["Price"].count(), "Item Price":items_grp["Price"].mean(),"Total Purchase Value":items_grp["Price"].sum()})
items_df = items_df.sort_values("Purchase Count", ascending=False)
# Format currency after sorting, because formatting turns the columns into strings
items_df["Item Price"] = items_df["Item Price"].map("${:.2f}".format)
items_df["Total Purchase Value"] = items_df["Total Purchase Value"].map("${:.2f}".format)
items_df.head()

| Item ID | Item Name | Purchase Count | Item Price | Total Purchase Value |
|---|---|---|---|---|
| 178 | Oathbreaker, Last Hope of the Breaking Storm | 12 | $4.23 | $50.76 |
| 108 | Extraction, Quickblade Of Trembling Hands | 9 | $3.53 | $31.77 |
| 82 | Nirvana | 9 | $4.90 | $44.10 |
| 145 | Fiery Glass Crusader | 9 | $4.58 | $41.22 |
| 60 | Wolf | 8 | $3.54 | $28.32 |
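
The group, aggregate, and sort steps for this section can also be written as a single chain using pandas named aggregation. The sketch below is an alternative formulation, not the notebook's own method. The item names, IDs, and prices come from the tables above, but the purchase rows themselves are made up.

```python
import pandas as pd

# Hypothetical purchase rows; the counts per item are invented for illustration.
sample = pd.DataFrame({
    "Item ID": [60, 60, 131, 60, 131, 82],
    "Item Name": ["Wolf", "Wolf", "Fury", "Wolf", "Fury", "Nirvana"],
    "Price": [3.54, 3.54, 1.44, 3.54, 1.44, 4.90],
})

# Named aggregation: each output column is defined as (source column, function).
# The ** unpacking is needed because the output names contain spaces.
item_summary = (
    sample.groupby(["Item ID", "Item Name"])
    .agg(**{
        "Purchase Count": ("Price", "count"),
        "Item Price": ("Price", "mean"),
        "Total Purchase Value": ("Price", "sum"),
    })
    .sort_values("Purchase Count", ascending=False)
)
print(item_summary.head())
```

Sorting by `"Total Purchase Value"` instead of `"Purchase Count"` would rank the same summary by revenue.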
