# Summation in Pandas and NumPy
Summation is a basic aggregation for exploring data whenever a question comes down to totals. This notebook uses the `census_income_data.csv` dataset to answer a few interesting questions with sums. We'll use the `groupby` method to total values per category.

To sum values in pandas, call the `.sum()` method on a DataFrame, a Series, or a GroupBy object to aggregate numerical values. As a reminder, pandas uses NumPy under the hood, so these calculations are fast.
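
As a minimal sketch of those three cases, the cell below calls `.sum()` on a Series, a DataFrame, and a GroupBy object. The `toy` DataFrame and its values are made up for illustration and only borrow column names from the census file; they are not part of the dataset.

```python
import numpy as np
import pandas as pd

# Toy data standing in for the census file (values are invented for illustration)
toy = pd.DataFrame({
    "workclass": ["Private", "Private", "State-gov"],
    "capital-gain": [100, 0, 50],
    "capital-loss": [0, 20, 10],
})

# Series -> a single scalar total
series_total = toy["capital-gain"].sum()

# DataFrame -> a Series with one total per column
frame_totals = toy[["capital-gain", "capital-loss"]].sum()

# GroupBy -> one total per group
group_totals = toy.groupby("workclass")["capital-gain"].sum()

# NumPy's sum over the underlying array gives the same scalar as the Series sum
numpy_total = np.sum(toy["capital-gain"].to_numpy())

print(series_total, numpy_total)
print(frame_totals)
print(group_totals)
```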

In [None]:
import pandas as pd

In [None]:
# Load the dataset
df_census = pd.read_csv('census_income_data.csv')

In [None]:
df_census.head()

## Using Sum on a DataFrame

#### How much capital was gained and lost in our dataset?
Let's use the `.sum()` method on our DataFrame to aggregate these totals at a high level.
Overall, people made more money than they lost, which is a good thing!
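
One way to put a number on "more gained than lost" is to subtract the two column totals. The sketch below does this on a small made-up DataFrame; the name `toy` and its values are assumptions for illustration, so the printed figures say nothing about the real dataset.

```python
import pandas as pd

# Invented values; only the column names mirror the census data
toy = pd.DataFrame({
    "capital-gain": [2000, 0, 500],
    "capital-loss": [0, 300, 0],
})

# Column-wise totals, returned as a Series indexed by column name
totals = toy[["capital-gain", "capital-loss"]].sum()

# Net capital change: total gained minus total lost
net = totals["capital-gain"] - totals["capital-loss"]

print(totals)
print("net:", net)
```

The same subtraction should work on `df_census`, assuming both columns are numeric.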

In [None]:
df_census[["capital-gain", "capital-loss"]].sum()

## Using Sum on a Group

#### What are the different workclass types?

In [None]:
df_census["workclass"].value_counts()

#### If we group by 'workclass', what are some interesting questions and answers?  
'Self-emp-inc' and 'State-gov' have similar total 'hours-per-week', so let's use that as a basis for comparing the two groups.  
Notice that their total 'capital-gain' is completely different! One possible explanation is that the Self-emp-inc group buys and sells assets more than the State-gov group.
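
To look at just the two groups being compared, one option is to sum per group and then select those rows with `.loc`. This is a sketch on invented data, chosen so the two groups have equal total hours. It also assumes the labels match exactly: if the labels in the real file carry extra whitespace, the `.loc` lookup would need the exact strings shown by `value_counts()`.

```python
import pandas as pd

# Invented rows; only the column names and labels mirror the census data
toy = pd.DataFrame({
    "workclass": ["Self-emp-inc", "Self-emp-inc", "State-gov", "State-gov", "Private"],
    "hours-per-week": [50, 45, 40, 55, 40],
    "capital-gain": [5000, 3000, 0, 200, 100],
})

# Total the two columns of interest within each workclass
totals = toy.groupby("workclass")[["hours-per-week", "capital-gain"]].sum()

# Keep only the two groups we want to compare side by side
comparison = totals.loc[["Self-emp-inc", "State-gov"]]

print(comparison)
```

Because these are sums, a larger group will tend to show larger totals, so group sizes are worth keeping in mind when reading the comparison.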

In [None]:
df_census.groupby(by="workclass").sum(numeric_only=True)

#### Let's do a similar comparison for occupation

In [None]:
df_census["occupation"].value_counts()

#### Group by and take the sum for occupation
We will sort by 'hours-per-week' to make comparisons easier.  
The top three occupations have similar total hours worked, but vastly different 'capital-gain'!
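
The group, sum, and sort steps can be chained, with `.head(3)` keeping only the top three rows. The sketch below uses a made-up DataFrame, so the ordering it prints is illustrative only and does not reflect the census results.

```python
import pandas as pd

# Invented rows; only the column names mirror the census data
toy = pd.DataFrame({
    "occupation": ["Sales", "Sales", "Exec-managerial",
                   "Craft-repair", "Craft-repair", "Tech-support"],
    "hours-per-week": [40, 45, 80, 40, 42, 30],
    "capital-gain": [0, 1000, 7000, 0, 50, 300],
})

# Sum numeric columns per occupation, then order by total hours, largest first
by_occupation = (
    toy.groupby("occupation")
    .sum(numeric_only=True)
    .sort_values(by="hours-per-week", ascending=False)
)

# Keep the three occupations with the most total hours
top_three = by_occupation.head(3)

print(top_three)
```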

In [None]:
df_census.groupby(by="occupation").sum(numeric_only=True).sort_values(by="hours-per-week", ascending=False)

#### Let's do a similar comparison for marital status

In [None]:
df_census["marital-status"].value_counts()

#### Group by and take the sum for marital-status
Marital status groups are a little harder to compare on 'hours-per-week'.
Even so, it is still interesting to see how the groups differ.

In [None]:
df_census.groupby(by="marital-status").sum(numeric_only=True).sort_values(by="hours-per-week", ascending=False)