# Measures of Center in pandas and NumPy
The most common measures of center are the mean (average), median, and mode. Each one summarizes where the middle of your data lies. In pandas, `.mean()` and `.median()` work on a DataFrame, a Series, or a GroupBy object.
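
As a quick illustration of how the three measures can disagree, here is a minimal sketch on a small Series. The values are made up for this example and do not come from `census_income_data.csv`.

```python
import pandas as pd

# Made-up values for illustration; not taken from the census dataset
values = pd.Series([1, 2, 2, 3, 100])

mean_value = values.mean()      # sum divided by count, pulled upward by the 100
median_value = values.median()  # middle value once the data is sorted
mode_values = values.mode()     # most frequent value(s), returned as a Series

print(mean_value, median_value, mode_values.tolist())
```

A single large value moves the mean a long way but leaves the median and mode untouched, which is why it helps to look at more than one measure.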

Work through this notebook with the `census_income_data.csv` dataset to answer questions using these methods. We'll use `groupby` again to compare groups.

In [1]:
import pandas as pd

In [2]:
# Load the dataset
df_census = pd.read_csv('census_income_data.csv')

## Using mean on a DataFrame

#### What were the average capital gain and capital loss in our dataset?
Let's call `.mean()` on these two columns of our DataFrame to get a high-level average for each.
Both averages are low, which hints that many people report no capital gain or loss at all.

In [3]:
df_census[["capital-gain", "capital-loss"]].mean()

capital-gain    1077.648844
capital-loss      87.303830
dtype: float64
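
A mean alone does not say how many rows are nonzero. One way to check, sketched here on a hypothetical five-row DataFrame (the numbers are invented, only the column names match the dataset), is to take the mean of a boolean mask, which gives the fraction of rows where the condition holds.

```python
import pandas as pd

# Hypothetical rows standing in for the census data
toy = pd.DataFrame({
    "capital-gain": [0, 0, 0, 0, 5000],
    "capital-loss": [0, 0, 0, 400, 0],
})

column_means = toy.mean()          # one mean per column
share_nonzero = (toy != 0).mean()  # fraction of rows with a nonzero entry

print(column_means)
print(share_nonzero)
```

The same pattern should carry over to the real data as `(df_census[["capital-gain", "capital-loss"]] != 0).mean()`, assuming both columns are numeric.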

## Using Mean and Median on a Group

#### What are the different workclass types?

In [6]:
df_census["workclass"].value_counts()

 Private             22696
 Self-emp-not-inc     2541
 Local-gov            2093
 State-gov            1298
 Self-emp-inc         1116
 Federal-gov           960
 Without-pay            14
 Never-worked            7
Name: workclass, dtype: int64

#### If we group by 'workclass', what are some interesting questions and answers?
We'll use `.mean()` and `.median()` to see how each metric tells a different story for each `workclass`.

On average, the `Self-emp-inc` group makes the most money from selling assets (`capital-gain`), but what is going on when we use `.median()`?

The median `capital-gain` and `capital-loss` are both zero for every group. What does this tell us about our data?

In [13]:
df_census.groupby(by="workclass").mean(numeric_only=True)

workclass,age,fnlwgt,education-num,capital-gain,capital-loss,hours-per-week
Federal-gov,42.590625,185221.24375,10.973958,833.232292,112.26875,41.379167
Local-gov,41.751075,188639.712852,11.042045,880.20258,109.854276,40.9828
Never-worked,20.571429,225989.571429,7.428571,0.0,0.0,28.428571
Private,36.797585,192764.114734,9.879714,889.217792,80.008724,40.267096
Self-emp-inc,46.017025,175981.344086,11.137097,4875.693548,155.138889,48.8181
Self-emp-not-inc,44.969697,175608.64148,10.226289,1886.061787,116.631641,44.421881
State-gov,39.436055,184136.613251,11.375963,701.699538,83.256549,39.031587
Without-pay,47.785714,174267.5,9.071429,487.857143,0.0,32.714286


In [14]:
df_census.groupby(by="workclass").median(numeric_only=True)

workclass,age,fnlwgt,education-num,capital-gain,capital-loss,hours-per-week
Federal-gov,43.0,175771.0,10.0,0.0,0.0,40.0
Local-gov,41.0,179580.0,11.0,0.0,0.0,40.0
Never-worked,18.0,188535.0,7.0,0.0,0.0,35.0
Private,35.0,181091.0,10.0,0.0,0.0,40.0
Self-emp-inc,45.0,165667.0,10.0,0.0,0.0,50.0
Self-emp-not-inc,44.0,168109.0,10.0,0.0,0.0,40.0
State-gov,39.0,169402.5,10.0,0.0,0.0,40.0
Without-pay,57.0,171531.5,9.0,0.0,0.0,27.5
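
A group's median is zero whenever more than half of its values are zero, even if its mean is large. The sketch below shows this on a hypothetical two-group DataFrame (group labels and amounts are invented) and uses `.agg()` to put both measures side by side.

```python
import pandas as pd

# Hypothetical groups; each has two zeros and one positive value
toy = pd.DataFrame({
    "workclass": ["A", "A", "A", "B", "B", "B"],
    "capital-gain": [0, 0, 9000, 0, 0, 300],
})

# Compute mean and median per group in a single call
summary = toy.groupby("workclass")["capital-gain"].agg(["mean", "median"])

print(summary)
```

Both groups end up with a median of 0 while their means differ widely, the same pattern the two tables above show for the census data.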


#### How about for occupation?

In [15]:
df_census["occupation"].value_counts()

 Prof-specialty       4140
 Craft-repair         4099
 Exec-managerial      4066
 Adm-clerical         3770
 Sales                3650
 Other-service        3295
 Machine-op-inspct    2002
 Transport-moving     1597
 Handlers-cleaners    1370
 Farming-fishing       994
 Tech-support          928
 Protective-serv       649
 Priv-house-serv       149
 Armed-Forces            9
Name: occupation, dtype: int64

Grouping by occupation and comparing means, `Exec-managerial` and `Prof-specialty` both made more money through `capital-gain` than the other occupations.

The median still adds little to our analysis. This grouping suffers from the same problem as `workclass`: most rows in each group have 0 for `capital-gain`, which becomes apparent when we calculate the median.

In [None]:
df_census.groupby(by="occupation").mean(numeric_only=True)

In [None]:
df_census.groupby(by="occupation").median(numeric_only=True)
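
When the median collapses to zero, two follow-up checks can be more informative: the share of zeros in each group, and the median among nonzero rows only. This is a sketch on a hypothetical DataFrame (the occupation labels `X` and `Y` and all amounts are invented), not a result from the census data.

```python
import pandas as pd

# Hypothetical occupations with mostly zero capital gains
toy = pd.DataFrame({
    "occupation": ["X", "X", "X", "X", "Y", "Y", "Y"],
    "capital-gain": [0, 0, 0, 2000, 0, 0, 700],
})

# Fraction of rows per group where capital-gain is exactly zero
zero_share = toy.groupby("occupation")["capital-gain"].apply(
    lambda s: (s == 0).mean()
)

# Median per group, restricted to rows with a positive capital gain
nonzero_median = (
    toy[toy["capital-gain"] > 0]
    .groupby("occupation")["capital-gain"]
    .median()
)

print(zero_share)
print(nonzero_median)
```

Filtering before grouping changes the question from "what does a typical person gain?" to "what does a typical person with any gain earn?", so the two results should be reported separately.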