# Pivot table

- Similar to GroupBy, but aggregates across a two-dimensional grid (row keys and column keys) rather than along a single index.

**Comparison with GroupBy:**

- GroupBy splits and combines across a one-dimensional index.
- Pivot tables split and combine across a two-dimensional grid, providing a more complex summary.

In [1]:
import numpy as np 
import pandas as pd 
import seaborn as sns 

titanic = sns.load_dataset('titanic')

In [2]:
titanic.head()

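|   | survived | pclass | sex | age | sibsp | parch | fare | embarked | class | who | adult_male | deck | embark_town | alive | alone |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 3 | male | 22.0 | 1 | 0 | 7.25 | S | Third | man | True | NaN | Southampton | no | False |
| 1 | 1 | 1 | female | 38.0 | 1 | 0 | 71.2833 | C | First | woman | False | C | Cherbourg | yes | False |
| 2 | 1 | 3 | female | 26.0 | 0 | 0 | 7.925 | S | Third | woman | False | NaN | Southampton | yes | True |
| 3 | 1 | 1 | female | 35.0 | 1 | 0 | 53.1 | S | First | woman | False | C | Southampton | yes | False |
| 4 | 0 | 3 | male | 35.0 | 0 | 0 | 8.05 | S | Third | man | True | NaN | Southampton | no | True |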


# Pivot table by hand

- The Titanic passenger data is available through the Seaborn library.
- Use a simple groupby operation to get the mean survival rate by gender, then by gender and class.


In [7]:
titanic.groupby('sex')[['survived']].mean()

| sex | survived |
|---|---|
| female | 0.742038 |
| male | 0.188908 |


In [10]:
titanic.groupby(['sex', 'class'])['survived'].aggregate('mean').unstack()


| sex / class | First | Second | Third |
|---|---|---|---|
| female | 0.968085 | 0.921053 | 0.5 |
| male | 0.368852 | 0.157407 | 0.135447 |


**Drawbacks of GroupBy for Multidimensional Data**

1. Complexity: Chaining a multi-key GroupBy, an aggregation, and an unstack makes the code longer and harder to read.
2. Solution: Use pivot_table, which produces the same result as the multi-key GroupBy in a single, more readable call.
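The equivalence between the two routes can be checked on a small table. The sketch below uses made-up toy data rather than the Titanic dataset, and the column name `cls` is only an illustrative stand-in for `class`.

```python
import pandas as pd

# Made-up toy data (not the Titanic dataset); column names are illustrative
df = pd.DataFrame({
    "sex": ["female", "female", "male", "male", "male", "female"],
    "cls": ["First", "Third", "First", "Third", "Third", "First"],
    "survived": [1, 0, 0, 0, 1, 1],
})

# GroupBy route: group on both keys, aggregate, then unstack the inner key
via_groupby = df.groupby(["sex", "cls"])["survived"].mean().unstack()

# pivot_table route: name the value, the row key, and the column key
via_pivot = df.pivot_table("survived", index="sex", columns="cls")

print(via_groupby)
print(via_pivot)
```

Both calls should print the same sex-by-class grid of mean survival; the pivot_table call simply states the row and column keys directly.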

# Pivot table syntax

``` python

pd.pivot_table(data, values=None, index=None, columns=None)

```


In [14]:
titanic.pivot_table('survived', index= 'sex', columns='class')


| sex / class | First | Second | Third |
|---|---|---|---|
| female | 0.968085 | 0.921053 | 0.5 |
| male | 0.368852 | 0.157407 | 0.135447 |


In [15]:
titanic.pivot_table('survived', index=['sex', 'age'], columns='class')


| sex | age | First | Second | Third |
|---|---|---|---|---|
| female | 0.75 | NaN | NaN | 1.00 |
| female | 1.00 | NaN | NaN | 1.00 |
| female | 2.00 | 0.0 | 1.0 | 0.25 |
| female | 3.00 | NaN | 1.0 | 0.00 |
| female | 4.00 | NaN | 1.0 | 1.00 |
| ... | ... | ... | ... | ... |
| male | 70.00 | 0.0 | 0.0 | NaN |
| male | 70.50 | NaN | NaN | 0.00 |
| male | 71.00 | 0.0 | NaN | NaN |
| male | 74.00 | NaN | NaN | 0.00 |


## Multilevel Pivot Tables

- Pivot tables can use multiple levels of grouping, just as GroupBy can.
- Example: Adding age as a third dimension by binning it with pd.cut():

In [17]:
age = pd.cut(titanic['age'], [0,18,80])
age

0      (18.0, 80.0]
1      (18.0, 80.0]
2      (18.0, 80.0]
3      (18.0, 80.0]
4      (18.0, 80.0]
           ...     
886    (18.0, 80.0]
887    (18.0, 80.0]
888             NaN
889    (18.0, 80.0]
890    (18.0, 80.0]
Name: age, Length: 891, dtype: category
Categories (2, interval[int64, right]): [(0, 18] < (18, 80]]

In [18]:
titanic.pivot_table('survived', ['sex', age], columns='class')


| sex | age | First | Second | Third |
|---|---|---|---|---|
| female | (0, 18] | 0.909091 | 1.0 | 0.511628 |
| female | (18, 80] | 0.972973 | 0.9 | 0.423729 |
| male | (0, 18] | 0.8 | 0.6 | 0.215686 |
| male | (18, 80] | 0.375 | 0.071429 | 0.133663 |
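To make the binning step easier to follow, here is a minimal sketch of pd.cut() on a handful of made-up ages. The `labels` argument and the names 'child' and 'adult' are my own additions for readability; the notes above use the default interval labels.

```python
import pandas as pd

# Made-up ages (not from the Titanic data); None becomes NaN
ages = pd.Series([5, 18, 19, 40, 80, None], name="age")

# Bins are right-inclusive by default: (0, 18] and (18, 80]
# The labels 'child' and 'adult' are illustrative, not from the notes
age_group = pd.cut(ages, bins=[0, 18, 80], labels=["child", "adult"])

print(age_group)
```

Because the bins are closed on the right, an age of exactly 18 lands in the first bin, and a missing age stays missing, which matches the NaN at row 888 in the output of In [17].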


## Adding More Dimensions
- You can add more dimensions on the column side using additional variables such as fare.
- Example: Using pd.qcut() to split fare into two quantile-based bins:


In [21]:
fare = pd.qcut(titanic['fare'], 2)
fare

0       (-0.001, 14.454]
1      (14.454, 512.329]
2       (-0.001, 14.454]
3      (14.454, 512.329]
4       (-0.001, 14.454]
             ...        
886     (-0.001, 14.454]
887    (14.454, 512.329]
888    (14.454, 512.329]
889    (14.454, 512.329]
890     (-0.001, 14.454]
Name: fare, Length: 891, dtype: category
Categories (2, interval[float64, right]): [(-0.001, 14.454] < (14.454, 512.329]]

In [23]:
titanic.pivot_table('survived', index=['sex', age], columns=[fare, 'class'])


| sex | age | fare (-0.001, 14.454], First | fare (-0.001, 14.454], Second | fare (-0.001, 14.454], Third | fare (14.454, 512.329], First | fare (14.454, 512.329], Second | fare (14.454, 512.329], Third |
|---|---|---|---|---|---|---|---|
| female | (0, 18] | NaN | 1.0 | 0.714286 | 0.909091 | 1.0 | 0.318182 |
| female | (18, 80] | NaN | 0.88 | 0.444444 | 0.972973 | 0.914286 | 0.391304 |
| male | (0, 18] | NaN | 0.0 | 0.26087 | 0.8 | 0.818182 | 0.178571 |
| male | (18, 80] | 0.0 | 0.098039 | 0.125 | 0.391304 | 0.030303 | 0.192308 |


- Result: A four-dimensional table showing survival rates based on gender, age, fare, and class.
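The difference between quantile bins and equal-width bins is easy to miss, so the following sketch contrasts pd.qcut() with pd.cut() on made-up fares. The values and the 'low'/'high' labels are assumptions for illustration only.

```python
import pandas as pd

# Made-up, right-skewed fares (not from the Titanic data)
fares = pd.Series([5.0, 7.0, 8.0, 10.0, 30.0, 50.0, 80.0, 500.0], name="fare")

# qcut splits at quantiles, so each bin holds about the same number of rows
by_quantile = pd.qcut(fares, 2, labels=["low", "high"])

# cut splits the value range into equal-width intervals instead
by_width = pd.cut(fares, 2, labels=["low", "high"])

quantile_counts = by_quantile.value_counts()
width_counts = by_width.value_counts()
print(quantile_counts)
print(width_counts)
```

With skewed data like fares, qcut keeps the two groups balanced, whereas equal-width bins would put almost every passenger in the lower bin.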

## Additional Pivot Table Options

``` python

DataFrame.pivot_table(values=None, index=None, columns=None,
                      aggfunc='mean', fill_value=None, margins=False,
                      dropna=True, margins_name='All')
```

**Common Options:**

1. aggfunc:

- Specifies the aggregation function; the default is 'mean'.
- Accepts a string such as 'sum', 'count', 'min', or 'max', a function such as np.sum, or a dictionary mapping each column to its own aggregation (in which case values is omitted).


In [25]:
titanic.pivot_table(index='sex', columns='class',
                    aggfunc={'survived': 'sum', 'fare': 'mean'})


| sex | fare, First | fare, Second | fare, Third | survived, First | survived, Second | survived, Third |
|---|---|---|---|---|---|---|
| female | 106.125798 | 21.970121 | 16.11881 | 91 | 70 | 72 |
| male | 67.226127 | 19.741782 | 12.661633 | 45 | 17 | 47 |
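Two further options from the signature above, a list of aggregations and fill_value, can be sketched on made-up toy data. The frame below is not the Titanic dataset, and `cls` is an illustrative stand-in for `class`.

```python
import pandas as pd

# Made-up toy data with no male/Third rows, so one cell has nothing to count
df = pd.DataFrame({
    "sex": ["female", "female", "male", "male", "male"],
    "cls": ["First", "Third", "First", "First", "First"],
    "survived": [1, 0, 0, 1, 1],
})

# fill_value replaces the missing male/Third cell instead of leaving NaN
counts = df.pivot_table("survived", index="sex", columns="cls",
                        aggfunc="count", fill_value=0)

# A list of aggregations adds an outer column level, one block per function
both = df.pivot_table("survived", index="sex", columns="cls",
                      aggfunc=["sum", "mean"])

print(counts)
print(both)
```

Passing a list is a reasonable choice when the same value column needs several summaries; a dictionary, as in In [25], fits better when different columns need different functions.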


2. margins:

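- Adds an extra row and column, labeled "All", computed by applying aggfunc across each full row and column (here, the overall mean survival rate per sex, per class, and for all passengers).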

In [26]:
titanic.pivot_table('survived', index='sex', columns='class', margins=True)



| sex / class | First | Second | Third | All |
|---|---|---|---|---|
| female | 0.968085 | 0.921053 | 0.5 | 0.742038 |
| male | 0.368852 | 0.157407 | 0.135447 | 0.188908 |
| All | 0.62963 | 0.472826 | 0.242363 | 0.383838 |


3. margins_name:

- Changes the label for the margins (default is "All").
- Can customize the name for the aggregated total row/column.
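As a minimal sketch of margins_name, the example below relabels the margins on made-up toy data (not the Titanic dataset). The label 'Total' and the column name `cls` are assumptions chosen for illustration.

```python
import pandas as pd

# Made-up toy data; 'cls' stands in for the Titanic 'class' column
df = pd.DataFrame({
    "sex": ["female", "female", "male", "male", "male"],
    "cls": ["First", "Third", "First", "Third", "Third"],
    "survived": [1, 0, 0, 0, 1],
})

# With aggfunc='sum' the margins are true totals; margins_name relabels them
table = df.pivot_table("survived", index="sex", columns="cls",
                       aggfunc="sum", margins=True, margins_name="Total")

print(table)
```

The margins always use the same aggfunc as the body of the table, so a label such as 'Total' reads naturally with 'sum' but would be misleading with the default 'mean'.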

**Advantages of Pivot Tables**
- More readable and intuitive compared to multi-step GroupBy operations.
- Useful for multidimensional data analysis and summarization.
- Handles complex aggregations and presents hierarchical groupings as multi-level row and column indexes.