In [1]:
import pandas as pd

# `df.pivot()`

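- Allows you to **reshape** (pivot) a dataframe without aggregating it.

- `index` argument: column whose values become the row labels.

- `columns` argument: column whose unique values become the column labels of the new dataframe.

- `values` argument: column(s) whose values fill the cells of the resulting dataframe. If omitted, all remaining columns are used.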

![image.png](attachment:image.png)
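As a minimal sketch of how the three arguments fit together, the snippet below builds a small inline dataframe (hypothetical data modelled on the weather example, so no CSV file is needed) and pivots it so that each date is a row and each city is a column. The name `weather` is an assumption made for illustration.

```python
import pandas as pd

# Hypothetical inline data modelled on the weather example (no CSV needed).
weather = pd.DataFrame({
    "date": ["5/1/2017", "5/2/2017", "5/1/2017", "5/2/2017"],
    "city": ["new york", "new york", "mumbai", "mumbai"],
    "temperature": [65, 66, 75, 78],
    "humidity": [56, 58, 80, 83],
})

# One row per date, one column per city, cells filled with the temperature.
temps = weather.pivot(index="date", columns="city", values="temperature")
print(temps)
```

Passing `values` keeps the result flat. Leaving it out would produce one block of city columns for every remaining column of the input.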

In [2]:
df = pd.read_csv('data/weather_pivot_1.csv')
df

Unnamed: 0,date,city,temperature,humidity
0,5/1/2017,new york,65,56
1,5/2/2017,new york,66,58
2,5/3/2017,new york,68,60
3,5/1/2017,mumbai,75,80
4,5/2/2017,mumbai,78,83
5,5/3/2017,mumbai,82,85
6,5/1/2017,beijing,80,26
7,5/2/2017,beijing,77,30
8,5/3/2017,beijing,79,35


In [4]:
df.pivot(index='city', columns='temperature')

Unnamed: 0_level_0,date,date,date,date,date,date,date,date,date,humidity,humidity,humidity,humidity,humidity,humidity,humidity,humidity,humidity
temperature,65,66,68,75,77,78,79,80,82,65,66,68,75,77,78,79,80,82
city,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2,Unnamed: 12_level_2,Unnamed: 13_level_2,Unnamed: 14_level_2,Unnamed: 15_level_2,Unnamed: 16_level_2,Unnamed: 17_level_2,Unnamed: 18_level_2
beijing,,,,,5/2/2017,,5/3/2017,5/1/2017,,,,,,30.0,,35.0,26.0,
mumbai,,,,5/1/2017,,5/2/2017,,,5/3/2017,,,,80.0,,83.0,,,85.0
new york,5/1/2017,5/2/2017,5/3/2017,,,,,,,56.0,58.0,60.0,,,,,,


In [5]:
df.pivot(index='city', columns='humidity')

Unnamed: 0_level_0,date,date,date,date,date,date,date,date,date,temperature,temperature,temperature,temperature,temperature,temperature,temperature,temperature,temperature
humidity,26,30,35,56,58,60,80,83,85,26,30,35,56,58,60,80,83,85
city,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2,Unnamed: 12_level_2,Unnamed: 13_level_2,Unnamed: 14_level_2,Unnamed: 15_level_2,Unnamed: 16_level_2,Unnamed: 17_level_2,Unnamed: 18_level_2
beijing,5/1/2017,5/2/2017,5/3/2017,,,,,,,80.0,77.0,79.0,,,,,,
mumbai,,,,,,,5/1/2017,5/2/2017,5/3/2017,,,,,,,75.0,78.0,82.0
new york,,,,5/1/2017,5/2/2017,5/3/2017,,,,,,,65.0,66.0,68.0,,,
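One limitation worth keeping in mind: `pivot()` only reshapes, so it needs every (`index`, `columns`) pair to occur at most once. The sketch below, using hypothetical data with two readings for the same city and date, shows the `ValueError` that pandas raises in that situation. This is the case that `pivot_table()` is meant for.

```python
import pandas as pd

# Hypothetical data: two readings for the same (city, date) pair.
readings = pd.DataFrame({
    "date": ["5/1/2017", "5/1/2017", "5/2/2017"],
    "city": ["mumbai", "mumbai", "mumbai"],
    "temperature": [75, 78, 82],
})

try:
    readings.pivot(index="city", columns="date", values="temperature")
    pivot_failed = False
except ValueError as err:
    # pivot() has no rule for combining the duplicate readings.
    pivot_failed = True
    print("pivot failed:", err)
```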


## `df.pivot_table()`

- Allows you to summarize and aggregate tabular data.

![image.png](attachment:image.png)

- Consider the following dataset:

In [6]:
df = pd.read_csv('data/weather_pivot_2.csv')
df

Unnamed: 0,date,city,temperature,humidity
0,5/1/2017,new york,65,56
1,5/1/2017,new york,61,54
2,5/2/2017,new york,70,60
3,5/2/2017,new york,72,62
4,5/1/2017,mumbai,75,80
5,5/1/2017,mumbai,78,83
6,5/2/2017,mumbai,82,85
7,5/2/2017,mumbai,80,26


- Observe that for the same date, there are two records for the same city.

- We might want to aggregate them into a single record holding, for example, the average temperature and average humidity.

- `index` argument: column(s) whose values become the row labels.

- `columns` argument: column(s) whose unique values become the column labels of the new dataframe.

- `aggfunc` argument: aggregation function applied to each group of records; the default is the mean.
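The arguments above can be sketched on an inline copy of the dataset, so the example runs without the CSV file. The `values` argument, which `pivot_table()` also accepts, restricts the aggregation to a single column here. The name `readings` is an assumption made for illustration.

```python
import pandas as pd

# Inline copy of the readings shown above (two per city per date).
readings = pd.DataFrame({
    "date": ["5/1/2017", "5/1/2017", "5/2/2017", "5/2/2017"] * 2,
    "city": ["new york"] * 4 + ["mumbai"] * 4,
    "temperature": [65, 61, 70, 72, 75, 78, 82, 80],
    "humidity": [56, 54, 60, 62, 80, 83, 85, 26],
})

# Average temperature per city and date; duplicate readings are averaged.
avg_temp = readings.pivot_table(
    index="city", columns="date", values="temperature", aggfunc="mean"
)
print(avg_temp)
# mumbai on 5/1/2017 -> (75 + 78) / 2 = 76.5
```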

In [10]:
df.pivot_table(index='city', columns='date')

Unnamed: 0_level_0,humidity,humidity,temperature,temperature
date,5/1/2017,5/2/2017,5/1/2017,5/2/2017
city,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2
mumbai,81.5,55.5,76.5,81.0
new york,55.0,61.0,63.0,71.0


In [11]:
# using a different built-in aggregation function (minimum instead of mean)
df.pivot_table(index='city', columns='date', aggfunc='min')

Unnamed: 0_level_0,humidity,humidity,temperature,temperature
date,5/1/2017,5/2/2017,5/1/2017,5/2/2017
city,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2
mumbai,80,26,75,80
new york,54,60,61,70
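`aggfunc` also accepts a user-defined callable. As a hedged sketch, the hypothetical helper `value_range` below computes the spread between the highest and lowest reading in each group. Both the helper and the inline `readings` dataframe are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd

# Inline copy of the readings used in this section.
readings = pd.DataFrame({
    "date": ["5/1/2017", "5/1/2017", "5/2/2017", "5/2/2017"] * 2,
    "city": ["new york"] * 4 + ["mumbai"] * 4,
    "temperature": [65, 61, 70, 72, 75, 78, 82, 80],
})


def value_range(series):
    """Hypothetical helper: spread between the largest and smallest value."""
    return series.max() - series.min()


# Temperature spread per city and date.
temp_range = readings.pivot_table(
    index="city", columns="date", values="temperature", aggfunc=value_range
)
print(temp_range)
```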


### Grouper in `pivot_table()`

- Consider the following dataset:

In [12]:
df = pd.read_csv('data/weather_pivot_3.csv')
df

Unnamed: 0,date,city,temperature,humidity
0,5/1/2017,new york,65,56
1,5/2/2017,new york,61,54
2,5/3/2017,new york,70,60
3,12/1/2017,new york,30,50
4,12/2/2017,new york,28,52
5,12/3/2017,new york,25,51


- `pd.Grouper` can be used to aggregate records by a date frequency.

    - For example, monthly average temperature.

In [13]:
df['date'] = pd.to_datetime(df['date'])

In [14]:
# get monthly average temperature and humidity
# (freq='M' means month end; pandas 2.2 and later prefer the alias 'ME')
df.pivot_table(index=pd.Grouper(freq='M', key='date'), columns='city')

Unnamed: 0_level_0,humidity,temperature
city,new york,new york
date,Unnamed: 1_level_2,Unnamed: 2_level_2
2017-05-31,56.666667,65.333333
2017-12-31,51.0,27.666667
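Because the string alias for month-end grouping differs between pandas versions, a version-independent variant is to pass a `DateOffset` object as `freq`. The sketch below does this on an inline copy of the data and asks for the monthly maximum temperature instead of the mean. The names `readings` and `monthly` are assumptions made for illustration.

```python
import pandas as pd

# Inline copy of the dataset above, with the dates already parsed.
readings = pd.DataFrame({
    "date": pd.to_datetime(
        ["5/1/2017", "5/2/2017", "5/3/2017", "12/1/2017", "12/2/2017", "12/3/2017"]
    ),
    "city": ["new york"] * 6,
    "temperature": [65, 61, 70, 30, 28, 25],
    "humidity": [56, 54, 60, 50, 52, 51],
})

# A DateOffset object avoids depending on a particular string alias.
monthly = pd.Grouper(key="date", freq=pd.offsets.MonthEnd())

# Highest temperature recorded in each month, per city.
monthly_max = readings.pivot_table(
    index=monthly, columns="city", values="temperature", aggfunc="max"
)
print(monthly_max)
```

Each row is labelled with the last day of its month, matching the output shown above.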
