### Heroes Of Pymoli Data Analysis
* Of the 576 active players, the vast majority are male (84%). A smaller but notable proportion are female (14%).

* Most players are adults aged 18-50 (81.25%), followed by teenagers aged 13-17 (12.85%) and kids aged 12 and under (5.90%).
-----

### Note
* Instructions have been included for each segment. You do not have to follow them exactly, but they are included to help you think through the steps.

In [1]:
# Dependencies and Setup
import pandas as pd

# File to Load (Remember to Change These)
file_to_load = "Resources/purchase_data.csv"

# Read Purchasing File and store into Pandas data frame
purchase_data = pd.read_csv(file_to_load)
purchase_data.head()

Unnamed: 0,Purchase ID,SN,Age,Gender,Item ID,Item Name,Price
0,0,Lisim78,20,Male,108,"Extraction, Quickblade Of Trembling Hands",3.53
1,1,Lisovynya38,40,Male,143,Frenzied Scimitar,1.56
2,2,Ithergue48,24,Male,92,Final Critic,4.88
3,3,Chamassasya86,24,Male,100,Blindscythe,3.27
4,4,Iskosia90,23,Male,131,Fury,1.44


In [2]:
import warnings
warnings.filterwarnings("ignore")

## Player Count

* Display the total number of players


In [3]:
players = purchase_data["SN"].unique() #array of unique screen names
numplayers = len(players) #number of unique players
p = {"Total Players": [numplayers]}
total_players = pd.DataFrame(data=p)
total_players

Unnamed: 0,Total Players
0,576


## Purchasing Analysis (Total)

* Run basic calculations to obtain number of unique items, average price, etc.


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame


In [4]:
unique = len(purchase_data["Item ID"].unique()) #number of unique items
avgp = purchase_data["Price"].mean() #average purchase price
nump = purchase_data["Item Name"].count() #number of purchases
totalrev = purchase_data["Price"].sum() #total revenue
analysis = {"Number of Unique Items": [unique],
            "Average Price": [avgp],
            "Number of Purchases":[nump],
            "Total Revenue": [totalrev]}
purchase_analysis = pd.DataFrame(data = analysis)
#format average price and total revenue
purchase_analysis["Average Price"] = purchase_analysis["Average Price"].map("${:.2f}".format)
purchase_analysis["Total Revenue"] = purchase_analysis["Total Revenue"].map("${:,.2f}".format)
purchase_analysis

Unnamed: 0,Number of Unique Items,Average Price,Number of Purchases,Total Revenue
0,183,$3.05,780,"$2,379.77"


## Gender Demographics

* Percentage and Count of Male Players


* Percentage and Count of Female Players


* Percentage and Count of Other / Non-Disclosed




In [5]:
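#set gender demographic conditions (row masks over every purchase)
male = purchase_data["Gender"] == "Male"
female = purchase_data["Gender"] == "Female"
other = purchase_data["Gender"] == "Other / Non-Disclosed"

#apply conditions
#drop duplicate SNs to get accurate count of players
#.copy() makes df an independent frame, so adding columns to it later is safe
df = purchase_data.drop_duplicates(subset = ["SN"]).copy()
#filter on df's own Gender column so each mask matches the de-duplicated rows
males = df["Gender"][df["Gender"] == "Male"]
females = df["Gender"][df["Gender"] == "Female"]
others = df["Gender"][df["Gender"] == "Other / Non-Disclosed"]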

#get count/percentages
malecount = males.count()
malepercent = (malecount/numplayers)*100
femalecount = females.count()
femalepercent = (femalecount/numplayers)*100
othercount = others.count()
otherpercent = (othercount/numplayers)*100

#display data frame
genders = { "Total Count": [malecount,femalecount,othercount],
          "Percentage of Players":[malepercent,femalepercent,otherpercent]
          }
Index = ["Male","Female","Other / Non-Disclosed"]
gender_data = pd.DataFrame(data=genders,index=Index)
gender_data["Percentage of Players"] = gender_data["Percentage of Players"].map("{:.2f} %".format)
gender_data

Unnamed: 0,Total Count,Percentage of Players
Male,484,84.03 %
Female,81,14.06 %
Other / Non-Disclosed,11,1.91 %
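The count and percentage steps above can also be written more compactly. The following is a minimal sketch on a small made-up table (the `toy` frame and its values are hypothetical, not taken from `purchase_data.csv`); it relies on `value_counts(normalize=True)` to produce the shares directly.

```python
import pandas as pd

# Hypothetical toy data: one row per purchase, so players can repeat.
toy = pd.DataFrame({
    "SN": ["a1", "a1", "b2", "c3", "d4", "d4"],
    "Gender": ["Male", "Male", "Female", "Male",
               "Other / Non-Disclosed", "Other / Non-Disclosed"],
})

# Keep one row per player so repeat buyers are not counted twice.
players = toy.drop_duplicates(subset="SN")

# Raw counts and percentage shares per gender.
counts = players["Gender"].value_counts()
percents = players["Gender"].value_counts(normalize=True) * 100

# Both Series share the same index (the gender labels), so they align.
gender_summary = pd.DataFrame({
    "Total Count": counts,
    "Percentage of Players": percents.round(2),
})
print(gender_summary)
```

Because the percentages come from the same de-duplicated frame as the counts, there is no separate player total to keep in sync.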


* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. by gender




* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame

In [6]:
#sort dataframe info by gender
male_df = purchase_data[male]
female_df = purchase_data[female]
other_df = purchase_data[other]

#create calculation values
#males
m_count = male_df["Gender"].count()
m_avgp = male_df["Price"].mean()
m_totalp = male_df["Price"].sum()
m_ppp = m_totalp/malecount
#female
f_count = female_df["Gender"].count()
f_avgp = female_df["Price"].mean()
f_totalp = female_df["Price"].sum()
f_ppp = f_totalp/femalecount
#other
o_count = other_df["Gender"].count()
o_avgp = other_df["Price"].mean()
o_totalp = other_df["Price"].sum()
o_ppp = o_totalp/othercount

pganalysis ={
    "Gender":Index,
    "Purchase Count":[m_count,f_count,o_count],
    "Average Purchase Price":[m_avgp,f_avgp,o_avgp],
    "Total Purchase Value":[m_totalp,f_totalp,o_totalp],
    "Avg Total Purchase per Person": [m_ppp,f_ppp,o_ppp]
}
pg_analysis = pd.DataFrame(data = pganalysis)
#format
pg_analysis["Average Purchase Price"] = pg_analysis["Average Purchase Price"].map("${:.2f}".format)
pg_analysis["Total Purchase Value"] = pg_analysis["Total Purchase Value"].map("${:,.2f}".format)
pg_analysis["Avg Total Purchase per Person"] = pg_analysis["Avg Total Purchase per Person"].map("${:.2f}".format)
pg_analysis.set_index("Gender")

Gender,Purchase Count,Average Purchase Price,Total Purchase Value,Avg Total Purchase per Person
Male,652,$3.02,"$1,967.64",$4.07
Female,113,$3.20,$361.94,$4.47
Other / Non-Disclosed,15,$3.35,$50.19,$4.56
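The per-gender calculations above repeat the same four steps for each group. As a possible alternative, a single `groupby` can compute them for every gender at once. This is only a sketch on hypothetical toy data (the `toy` frame is made up for illustration), not the notebook's actual method.

```python
import pandas as pd

# Hypothetical toy purchases; "a1" buys twice.
toy = pd.DataFrame({
    "SN": ["a1", "a1", "b2", "c3"],
    "Gender": ["Male", "Male", "Female", "Male"],
    "Price": [1.00, 3.00, 2.50, 4.00],
})

by_gender = toy.groupby("Gender")

summary = pd.DataFrame({
    "Purchase Count": by_gender["Price"].count(),
    "Average Purchase Price": by_gender["Price"].mean(),
    "Total Purchase Value": by_gender["Price"].sum(),
})

# Divide total spend by the number of unique players in each gender,
# not by the number of purchases.
summary["Avg Total Purchase per Person"] = (
    summary["Total Purchase Value"] / by_gender["SN"].nunique()
)
print(summary)
```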


## Age Demographics

* Establish bins for ages


* Categorize the existing players using the age bins. Hint: use pd.cut()


* Calculate the numbers and percentages by age group


* Create a summary data frame to hold the results


* Optional: round the percentage column to two decimal points


* Display Age Demographics Table


In [7]:
#create bin edges: (0,12] = Kids, (12,17] = Teenagers, (17,50] = Adults
bins = [0,12,17,50]
#labels
labels = ["Kids","Teenagers","Adults"]
#cut and separate using dataframe with unique SNs (ages above 50 would be left unbinned)
df["Age Group"] = pd.cut(df["Age"],bins,labels=labels)
kidcond = df["Age Group"] == "Kids"
kiddf = df[kidcond]
teencond = df["Age Group"] == "Teenagers"
teendf = df[teencond]
adultcond = df["Age Group"] == "Adults"
adultdf = df[adultcond]

#get numbers and percent
kidcount = kiddf["Age Group"].count()
kidpercent = (kidcount/numplayers)*100
teencount = teendf["Age Group"].count()
teenpercent = (teencount/numplayers)*100
adultcount = adultdf["Age Group"].count()
adultpercent = (adultcount/numplayers)*100

#create table
ages = {
    "Total Count": [kidcount,teencount,adultcount],
    "Percentage of Players": [kidpercent,teenpercent,adultpercent]
}
ages_df = pd.DataFrame(data=ages,index = labels)
ages_df["Percentage of Players"] = ages_df["Percentage of Players"].map("{:.2f} %".format)
ages_df

Unnamed: 0,Total Count,Percentage of Players
Kids,34,5.90 %
Teenagers,74,12.85 %
Adults,468,81.25 %
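To make the bin boundaries concrete, here is a small sketch of how `pd.cut` assigns ages with the same `bins` and `labels` as above. The sample ages are made up for illustration. By default each bin includes its right edge, so 12 is a Kid and 17 is a Teenager, and any age above the last edge is left as `NaN`.

```python
import pandas as pd

bins = [0, 12, 17, 50]
labels = ["Kids", "Teenagers", "Adults"]

# Hypothetical ages chosen to sit on and around the bin edges.
ages = pd.Series([7, 12, 13, 17, 18, 50, 51])

# Intervals are (0, 12], (12, 17], (17, 50]; 51 falls outside all of them.
groups = pd.cut(ages, bins, labels=labels)
print(groups.tolist())

# Count players per group; unbinned ages are excluded from the counts.
group_counts = groups.value_counts()
print(group_counts)
```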


## Purchasing Analysis (Age)

* Bin the purchase_data data frame by age


* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. in the table below


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame

In [8]:
#bin and group data
purchase_data["Age Group"] = pd.cut(purchase_data["Age"],bins,labels=labels)
kidcond = purchase_data["Age Group"] == "Kids"
kiddf = purchase_data[kidcond]
teencond = purchase_data["Age Group"] == "Teenagers"
teendf = purchase_data[teencond]
adultcond = purchase_data["Age Group"] == "Adults"
adultdf = purchase_data[adultcond]

#basic calculations
#kids
kcount = kiddf["Price"].count()
kavgp = kiddf["Price"].mean()
ktotalp = kiddf["Price"].sum()
kppp = ktotalp/kidcount
#teen
tcount = teendf["Price"].count()
tavgp = teendf["Price"].mean()
ttotalp = teendf["Price"].sum()
tppp = ttotalp/teencount
#adult
acount = adultdf["Price"].count()
aavgp = adultdf["Price"].mean()
atotalp = adultdf["Price"].sum()
appp = atotalp/adultcount

#display table
moneyage = {
    "Purchase Count":[kcount,tcount,acount],
    "Average Purchase Price":[kavgp,tavgp,aavgp],
    "Total Purchase Value":[ktotalp,ttotalp,atotalp],
    "Avg Total Purchase per Person": [kppp,tppp,appp]
}
moneyage_df = pd.DataFrame(data = moneyage,index=labels)
moneyage_df["Average Purchase Price"] = moneyage_df["Average Purchase Price"].map("${:.2f}".format)
moneyage_df["Total Purchase Value"] = moneyage_df["Total Purchase Value"].map("${:,.2f}".format)
moneyage_df["Avg Total Purchase per Person"] = moneyage_df["Avg Total Purchase per Person"].map("${:.2f}".format)
moneyage_df

Unnamed: 0,Purchase Count,Average Purchase Price,Total Purchase Value,Avg Total Purchase per Person
Kids,45,$3.19,$143.55,$4.22
Teenagers,93,$2.98,$277.05,$3.74
Adults,642,$3.05,"$1,959.17",$4.19
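The binned column can also serve as a `groupby` key, which avoids filtering each age group by hand. The sketch below uses hypothetical toy data and reuses the same `bins` and `labels`; passing `observed=False` explicitly keeps every label in the result, even one with no purchases.

```python
import pandas as pd

bins = [0, 12, 17, 50]
labels = ["Kids", "Teenagers", "Adults"]

# Hypothetical toy purchases; "a1" buys twice.
toy = pd.DataFrame({
    "SN": ["a1", "a1", "b2", "c3", "d4"],
    "Age": [10, 10, 16, 25, 40],
    "Price": [2.0, 4.0, 3.0, 1.0, 5.0],
})
toy["Age Group"] = pd.cut(toy["Age"], bins, labels=labels)

by_age = toy.groupby("Age Group", observed=False)

# count / mean / sum of Price for every age group in one call.
age_summary = by_age["Price"].agg(["count", "mean", "sum"])
age_summary.columns = [
    "Purchase Count", "Average Purchase Price", "Total Purchase Value",
]

# Spend per unique player within each age group.
age_summary["Avg Total Purchase per Person"] = (
    age_summary["Total Purchase Value"] / by_age["SN"].nunique()
)
print(age_summary)
```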


## Top Spenders

* Run basic calculations to obtain the results in the table below


* Create a summary data frame to hold the results


* Sort the total purchase value column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame



In [17]:
#preview: the five players with the most purchases (purchase count only, not total spend)
purchase_data["SN"].value_counts()[:5]

Lisosia93      5
Iral74         4
Idastidru52    4
Tyisur83       3
Aina42         3
Name: SN, dtype: int64
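The output above ranks players by purchase count only. One way to build the full top-spenders table described in the instructions is to group by `SN` and sort by total spend. This is a sketch on hypothetical toy data, not results from the real file.

```python
import pandas as pd

# Hypothetical toy purchases.
toy = pd.DataFrame({
    "SN": ["a1", "a1", "b2", "c3", "c3", "c3"],
    "Price": [1.0, 2.0, 5.0, 1.0, 1.0, 1.5],
})

# Purchase count, average price, and total spend per player.
spenders = toy.groupby("SN")["Price"].agg(["count", "mean", "sum"])
spenders.columns = [
    "Purchase Count", "Average Purchase Price", "Total Purchase Value",
]

# Highest total spend first.
top_spenders = spenders.sort_values("Total Purchase Value", ascending=False)
print(top_spenders.head())
```

Note that the player with the most purchases ("c3" here) is not necessarily the top spender, which is why the sort uses the total rather than the count.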

## Most Popular Items

* Retrieve the Item ID, Item Name, and Item Price columns


* Group by Item ID and Item Name. Perform calculations to obtain purchase count, item price, and total purchase value


* Create a summary data frame to hold the results


* Sort the purchase count column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame



In [None]:
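#select the item columns, then count purchases per item (most purchased first)
popular_df = purchase_data[["Item ID","Item Name","Price"]]
popular_df.groupby(["Item ID","Item Name"])["Price"].count().sort_values(ascending=False).head()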
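The grouping steps listed for this section can be sketched on a toy table as follows. The item names and prices are hypothetical, and using `"first"` for the item price assumes each item is always sold at the same price, which the source does not confirm.

```python
import pandas as pd

# Hypothetical toy purchases of three items.
toy = pd.DataFrame({
    "Item ID": [1, 1, 1, 2, 2, 3],
    "Item Name": ["Sword", "Sword", "Sword", "Axe", "Axe", "Bow"],
    "Price": [2.0, 2.0, 2.0, 4.5, 4.5, 1.0],
})

# Group by both keys so the ID and the name both appear in the index.
items = toy.groupby(["Item ID", "Item Name"])["Price"].agg(
    ["count", "first", "sum"]
)
items.columns = ["Purchase Count", "Item Price", "Total Purchase Value"]

# Most frequently purchased items first.
popular = items.sort_values("Purchase Count", ascending=False)
print(popular.head())
```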

## Most Profitable Items

* Sort the above table by total purchase value in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the data frame
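A minimal sketch of these steps, again on hypothetical toy data: sort by total purchase value while the column is still numeric, then apply currency formatting to a copy. Formatting first would turn the values into strings, which sort alphabetically rather than by amount.

```python
import pandas as pd

# Hypothetical toy purchases of three items.
toy = pd.DataFrame({
    "Item ID": [1, 1, 1, 2, 2, 3],
    "Item Name": ["Sword", "Sword", "Sword", "Axe", "Axe", "Bow"],
    "Price": [2.0, 2.0, 2.0, 4.5, 4.5, 1.0],
})

items = toy.groupby(["Item ID", "Item Name"])["Price"].agg(
    ["count", "first", "sum"]
)
items.columns = ["Purchase Count", "Item Price", "Total Purchase Value"]

# Sort on the numeric totals first.
profitable = items.sort_values("Total Purchase Value", ascending=False)

# Format a copy for display so the numeric frame stays usable.
display_df = profitable.copy()
display_df["Item Price"] = display_df["Item Price"].map("${:.2f}".format)
display_df["Total Purchase Value"] = (
    display_df["Total Purchase Value"].map("${:,.2f}".format)
)
print(display_df.head())
```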

