# Ex - GroupBy

### Introduction:

GroupBy can be summarized as Split-Apply-Combine.


The framework is known as "split-apply-combine" because we:

Step 1: split the data into groups by creating a groupby object from the original DataFrame;

Step 2: apply a function, in this case an aggregation function that computes a summary statistic (you can also transform or filter your data in this step);

Step 3: combine the results into a new DataFrame.
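The three steps can be spelled out by hand to see what the one-line version does for us. This is only a sketch: `sample` is a hypothetical toy DataFrame built from four rows of the drinks preview shown later, not the full file.

```python
import pandas as pd

# Toy data: four rows copied from the drinks preview (an assumption for
# illustration, not the full dataset).
sample = pd.DataFrame({
    "country": ["Albania", "Algeria", "Andorra", "Angola"],
    "beer_servings": [89, 25, 245, 217],
    "continent": ["Europe", "Africa", "Europe", "Africa"],
})

# Step 1: split. Iterating over a groupby object yields (key, sub-DataFrame) pairs.
groups = {key: frame for key, frame in sample.groupby("continent")}

# Step 2: apply. Compute a summary statistic for each group.
means = {key: frame["beer_servings"].mean() for key, frame in groups.items()}

# Step 3: combine. Collect the per-group results into a single object.
by_hand = pd.Series(means, name="beer_servings")

# The usual one-liner performs all three steps at once.
one_liner = sample.groupby("continent")["beer_servings"].mean()

print(by_hand)
print(one_liner)
```

Both printouts should show the same two averages, one per continent.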

Pandas groupby-apply is an invaluable tool in a Python data scientist’s toolkit.

You can apply the groupby method to a flat table with a simple 1D index column. That doesn’t perform any operations on the table yet; it only returns a DataFrameGroupBy instance, so it needs to be chained to some kind of aggregation function (for example, sum, mean, min, max, etc.).

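To make the "nothing happens yet" point concrete, here is a small sketch. `sample` is again a hypothetical toy DataFrame made from a few rows of the drinks preview; the variable names are my own.

```python
import pandas as pd

# Toy data copied from the drinks preview (illustrative assumption).
sample = pd.DataFrame({
    "country": ["Albania", "Algeria", "Andorra", "Angola"],
    "beer_servings": [89, 25, 245, 217],
    "continent": ["Europe", "Africa", "Europe", "Africa"],
})

# groupby alone computes no statistics; it only returns a grouping object.
grouped = sample.groupby("continent")
print(type(grouped).__name__)

# Results appear only once an aggregation is chained on.
beer = grouped["beer_servings"]
totals = beer.sum()
largest = beer.max()
print(totals)
print(largest)
```

The same `grouped` object can be reused for several aggregations, which is handy when you want more than one summary of the same split.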
### The apply() method


Instead of using one of the stock functions provided by Pandas to operate on the groups, we can define our own custom function and run it on the table via the apply() method.

To write a custom function well, you need to understand how the two methods, groupby and apply, work with each other in the so-called Groupby-Split-Apply-Combine chain mechanism.

As I already mentioned, the first stage is creating a Pandas groupby object (DataFrameGroupBy), which provides an interface for the apply method and groups rows together according to the values of the specified column(s).

We split the groups transiently and loop over them via optimized Pandas internal code. We then pass each group to the specified function as either a Series or a DataFrame object.

The output of each function call is stored temporarily until all groups have been processed. In the last stage, all the results (one per function invocation) are combined into a single output.
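The mechanism can be sketched with two small custom functions, one receiving each group as a Series and one receiving it as a DataFrame. The data is a hypothetical toy sample taken from the drinks preview, and `beer_to_wine` is a made-up helper for illustration. I select the columns before calling apply so that the grouping column is not passed to the function, which newer pandas versions warn about.

```python
import pandas as pd

# Toy data copied from the drinks preview (illustrative assumption).
sample = pd.DataFrame({
    "country": ["Albania", "Algeria", "Andorra", "Angola"],
    "beer_servings": [89, 25, 245, 217],
    "wine_servings": [54, 14, 312, 45],
    "continent": ["Europe", "Africa", "Europe", "Africa"],
})

# Series case: with a single column selected, each group arrives as a Series.
spread = sample.groupby("continent")["beer_servings"].apply(
    lambda s: s.max() - s.min()
)

# DataFrame case: with several columns selected, each group arrives as a DataFrame.
def beer_to_wine(group):
    """Hypothetical helper: total beer servings divided by total wine servings."""
    return group["beer_servings"].sum() / group["wine_servings"].sum()

ratio = sample.groupby("continent")[["beer_servings", "wine_servings"]].apply(
    beer_to_wine
)

print(spread)
print(ratio)
```

Because each function returns one scalar per group, the combined result in both cases is a Series indexed by continent.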

### Step 2. Import the drinks dataset

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

pd.read_table('drinks.csv', sep=',')   # read_table is more general and defaults to tab-separated, so sep=',' is needed
pd.read_csv('drinks.csv')  # read_csv is specific to CSV and defaults to sep=','



,country,beer_servings,spirit_servings,wine_servings,total_litres_of_pure_alcohol,continent
0,Afghanistan,0,0,0,0.0,Asia
1,Albania,89,132,54,4.9,Europe
2,Algeria,25,0,14,0.7,Africa
3,Andorra,245,138,312,12.4,Europe
4,Angola,217,57,45,5.9,Africa
...,...,...,...,...,...,...
188,Venezuela,333,100,3,7.7,South America
189,Vietnam,111,2,1,2.0,Asia
190,Yemen,6,0,0,0.1,Asia
191,Zambia,32,19,4,2.5,Africa


### Step 3. Assign it to a variable called drinks.

In [2]:
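# Either call loads the same data; the second assignment simply overwrites the first.
drinks = pd.read_table('drinks.csv', sep=',')   # read_table is more general and needs sep=','
drinks = pd.read_csv('drinks.csv')  # read_csv is specific to CSV and defaults to sep=','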


### Step 4. Which continent drinks more beer on average?

In [3]:
drinks.groupby('continent').beer_servings.mean().sort_values(ascending=False)

continent
Europe           193.777778
South America    175.083333
North America    145.434783
Oceania           89.687500
Africa            61.471698
Asia              37.045455
Name: beer_servings, dtype: float64

### Step 5. For each continent print the statistics for wine consumption.

In [4]:
drinks.groupby('continent').wine_servings.describe()

continent,count,mean,std,min,25%,50%,75%,max
Africa,53.0,16.264151,38.846419,0.0,1.0,2.0,13.0,233.0
Asia,44.0,9.068182,21.667034,0.0,0.0,1.0,8.0,123.0
Europe,45.0,142.222222,97.421738,0.0,59.0,128.0,195.0,370.0
North America,23.0,24.521739,28.266378,1.0,5.0,11.0,34.0,100.0
Oceania,16.0,35.625,64.55579,0.0,1.0,8.5,23.25,212.0
South America,12.0,62.416667,88.620189,1.0,3.0,12.0,98.5,221.0


### Step 6. Print the mean alcohol consumption per continent for every column

In [5]:
drinks.groupby('continent').mean(numeric_only=True)  # skip the text column 'country'

continent,beer_servings,spirit_servings,wine_servings,total_litres_of_pure_alcohol
Africa,61.471698,16.339623,16.264151,3.007547
Asia,37.045455,60.840909,9.068182,2.170455
Europe,193.777778,132.555556,142.222222,8.617778
North America,145.434783,165.73913,24.521739,5.995652
Oceania,89.6875,58.4375,35.625,3.38125
South America,175.083333,114.75,62.416667,6.308333
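
A caveat for whole-table aggregations like this one: drinks also holds the text column country, and as far as I know pandas 2.0 and later raise a TypeError when mean() or median() meets a non-numeric column instead of silently dropping it. Two ways around that are sketched below on a hypothetical toy sample copied from the drinks preview.

```python
import pandas as pd

# Toy data copied from the drinks preview (illustrative assumption).
sample = pd.DataFrame({
    "country": ["Albania", "Algeria", "Andorra", "Angola"],
    "beer_servings": [89, 25, 245, 217],
    "wine_servings": [54, 14, 312, 45],
    "continent": ["Europe", "Africa", "Europe", "Africa"],
})

# Option 1: ask pandas to skip non-numeric columns.
means = sample.groupby("continent").mean(numeric_only=True)

# Option 2: select the numeric columns explicitly before aggregating.
same = sample.groupby("continent")[["beer_servings", "wine_servings"]].mean()

print(means)
print(same)
```

Option 2 is more explicit about which columns end up in the result, which can be easier to read when the table is wide.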


### Step 7. Print the median alcohol consumption per continent for every column

In [6]:
drinks.groupby('continent').median(numeric_only=True)  # skip the text column 'country'

continent,beer_servings,spirit_servings,wine_servings,total_litres_of_pure_alcohol
Africa,32.0,3.0,2.0,2.3
Asia,17.5,16.0,1.0,1.2
Europe,219.0,122.0,128.0,10.0
North America,143.0,137.0,11.0,6.3
Oceania,52.5,37.0,8.5,1.75
South America,162.5,108.5,12.0,6.85


### Step 8. Print the mean, min and max values for spirit consumption.
#### This time output a DataFrame

In [7]:
drinks.groupby('continent').spirit_servings.agg(['mean', 'min', 'max'])

continent,mean,min,max
Africa,16.339623,0,152
Asia,60.840909,0,326
Europe,132.555556,0,373
North America,165.73913,68,438
Oceania,58.4375,0,254
South America,114.75,25,302
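
Passing a list to agg names the output columns after the functions themselves. If you would rather choose the column names, pandas also supports named aggregation, sketched here on a hypothetical toy sample copied from the drinks preview; the output names `spirit_mean`, `spirit_min` and `spirit_max` are my own choices.

```python
import pandas as pd

# Toy data copied from the drinks preview (illustrative assumption).
sample = pd.DataFrame({
    "country": ["Albania", "Algeria", "Andorra", "Angola"],
    "spirit_servings": [132, 0, 138, 57],
    "continent": ["Europe", "Africa", "Europe", "Africa"],
})

# Named aggregation: output_name=(input_column, function).
summary = sample.groupby("continent").agg(
    spirit_mean=("spirit_servings", "mean"),
    spirit_min=("spirit_servings", "min"),
    spirit_max=("spirit_servings", "max"),
)

print(summary)
```

The result is still a DataFrame indexed by continent, with one column per named aggregation.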
