## Pandas Tutorial 10: Pivot Basics

In this tutorial, we explore how to use the `pivot()` and `pivot_table()` functions in Pandas to reshape and summarize DataFrames. These tools allow you to transform data into different formats and perform aggregations.

#### Topics covered:
* **What is a pivot?**
* **Using the `pivot()` function**
* **Understanding pivot tables**
* **Using `pivot_table()`**
* **Applying the `aggfunc` argument in `pivot_table()`**
* **Using `Grouper()` for aggregation**

This tutorial will help you effectively reshape and aggregate your data for analysis.

In [1]:
import pandas as pd
import numpy as np

In [2]:
df = pd.read_csv("weather.csv")
df

Unnamed: 0,date,city,temperature,humidity
0,5/1/2017,new york,65,56
1,5/2/2017,new york,66,58
2,5/3/2017,new york,68,60
3,5/1/2017,mumbai,75,80
4,5/2/2017,mumbai,78,83
5,5/3/2017,mumbai,82,85
6,5/1/2017,beijing,80,26
7,5/2/2017,beijing,77,30
8,5/3/2017,beijing,79,35


## Using `pivot()` to Reshape Data

The `pivot()` function reshapes the DataFrame by setting `city` as the index and spreading `date` across columns.

**Key Features:**
* `index`: Sets `city` as the index
* `columns`: Spreads `date` across columns

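Because no `values` argument is given, every remaining column (`temperature` and `humidity`) is pivoted, so the result has two-level column labels: the measurement on top and the date below.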

In [16]:
# Reshapes DataFrame with 'city' as index and 'date' as columns
df.pivot(index='city',columns='date')

Unnamed: 0_level_0,temperature,temperature,temperature,humidity,humidity,humidity
date,5/1/2017,5/2/2017,5/3/2017,5/1/2017,5/2/2017,5/3/2017
city,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2
beijing,80,77,79,26,30,35
mumbai,75,78,82,80,83,85
new york,65,66,68,56,58,60
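
The two-level column labels produced by a `pivot()` without `values` can be navigated by label. The following is a minimal sketch, assuming an inline DataFrame (`weather`, a stand-in for `weather.csv` built from the rows shown earlier) so that it runs without the file; the variable names are illustrative only.

```python
import pandas as pd

# Inline copy of the weather.csv rows shown above (assumption: same values as the file).
weather = pd.DataFrame({
    "date": ["5/1/2017", "5/2/2017", "5/3/2017"] * 3,
    "city": ["new york"] * 3 + ["mumbai"] * 3 + ["beijing"] * 3,
    "temperature": [65, 66, 68, 75, 78, 82, 80, 77, 79],
    "humidity": [56, 58, 60, 80, 83, 85, 26, 30, 35],
})

# No `values` argument: both remaining columns are pivoted, so the
# column labels form a two-level MultiIndex of (measurement, date).
wide = weather.pivot(index="city", columns="date")

# Select one measurement by its top-level label.
temperature_by_city = wide["temperature"]

# Select a single cell by row label and (measurement, date) column label.
mumbai_humidity = wide.loc["mumbai", ("humidity", "5/2/2017")]

print(temperature_by_city)
print(mumbai_humidity)  # 83
```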


## Using `pivot()` with the `values` Argument

The `pivot()` function reshapes the DataFrame by setting `city` as the index, `date` as the columns, and populating the table with `humidity` values.

**Key Features:**
* `index`: Uses `city` as the index.
* `columns`: Spreads `date` across columns.
* `values`: Fills the table with `humidity` values.

Restricting the pivot to one column yields a single-level table of `humidity` readings, with one row per city and one column per date.

In [4]:
# Reshapes the DataFrame with 'city' as index, 'date' as columns, and 'humidity' as the values
df.pivot(index='city',columns='date',values="humidity")

date,5/1/2017,5/2/2017,5/3/2017
city,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
beijing,26,30,35
mumbai,80,83,85
new york,56,58,60
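
Passing `values` gives the same numbers as pivoting everything and then selecting one measurement. A small sketch of that relationship, again assuming a hypothetical inline `weather` DataFrame in place of `weather.csv`:

```python
import pandas as pd

# Inline copy of the weather.csv rows shown above (assumption: same values as the file).
weather = pd.DataFrame({
    "date": ["5/1/2017", "5/2/2017", "5/3/2017"] * 3,
    "city": ["new york"] * 3 + ["mumbai"] * 3 + ["beijing"] * 3,
    "temperature": [65, 66, 68, 75, 78, 82, 80, 77, 79],
    "humidity": [56, 58, 60, 80, 83, 85, 26, 30, 35],
})

# Pivot only the humidity column: the result has plain, single-level columns.
humidity_direct = weather.pivot(index="city", columns="date", values="humidity")

# Pivot every column, then pick the humidity block from the MultiIndex columns.
humidity_selected = weather.pivot(index="city", columns="date")["humidity"]

# Compare the underlying numbers of the two tables.
same_numbers = (humidity_direct.to_numpy() == humidity_selected.to_numpy()).all()
print(humidity_direct)
print(same_numbers)
```

Using `values` up front is the more direct choice when only one measurement is needed.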


## Using `pivot()` with `date` as Index

The `pivot()` function reshapes the DataFrame by setting `date` as the index and `city` as the columns. With no `values` argument, the remaining columns (`temperature` and `humidity`) fill the table.

**Key Features:**
* `index`: Uses `date` as the index.
* `columns`: Spreads `city` across columns.

Each row now represents one date, so readings for different cities on the same day sit side by side.

In [5]:
# Reshapes DataFrame with 'date' as index and 'city' as columns
df.pivot(index='date',columns='city')

Unnamed: 0_level_0,temperature,temperature,temperature,humidity,humidity,humidity
city,beijing,mumbai,new york,beijing,mumbai,new york
date,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2
5/1/2017,80,75,65,26,80,56
5/2/2017,77,78,66,30,83,58
5/3/2017,79,82,68,35,85,60
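
With dates as rows and cities as columns, comparing cities day by day becomes simple column arithmetic. A minimal sketch, assuming a hypothetical inline `weather` DataFrame in place of `weather.csv`; the `gap` name is illustrative:

```python
import pandas as pd

# Inline copy of the weather.csv rows shown above (assumption: same values as the file).
weather = pd.DataFrame({
    "date": ["5/1/2017", "5/2/2017", "5/3/2017"] * 3,
    "city": ["new york"] * 3 + ["mumbai"] * 3 + ["beijing"] * 3,
    "temperature": [65, 66, 68, 75, 78, 82, 80, 77, 79],
    "humidity": [56, 58, 60, 80, 83, 85, 26, 30, 35],
})

# One row per date, one column per (measurement, city).
by_date = weather.pivot(index="date", columns="city")

# Take the temperature block and subtract one city's column from another.
temps = by_date["temperature"]
gap = temps["mumbai"] - temps["new york"]

print(gap)
```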


## Using `pivot()` with `humidity` as Index

The `pivot()` function reshapes the DataFrame by setting `humidity` as the index and `city` as the columns.

**Key Features:**
* `index`: Uses `humidity` as the index.
* `columns`: Spreads `city` across columns.

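Each humidity reading occurs in only one city, so every row holds values for a single city and `NaN` everywhere else. The `temperature` values are shown as floats because an integer column cannot hold `NaN`.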

In [6]:
# Reshapes DataFrame with 'humidity' as index and 'city' as columns
df.pivot(index='humidity',columns='city')

Unnamed: 0_level_0,date,date,date,temperature,temperature,temperature
city,beijing,mumbai,new york,beijing,mumbai,new york
humidity,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2
26,5/1/2017,,,80.0,,
30,5/2/2017,,,77.0,,
35,5/3/2017,,,79.0,,
56,,,5/1/2017,,,65.0
58,,,5/2/2017,,,66.0
60,,,5/3/2017,,,68.0
80,,5/1/2017,,,75.0,
83,,5/2/2017,,,78.0,
85,,5/3/2017,,,82.0,


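## Understanding Pivot Tables

A pivot table summarizes data: when several rows share the same index/column pair, `pivot_table()` combines them with an aggregation function (the mean by default). The `weather2.csv` file contains two readings per city per date, so `pivot()` cannot reshape it, but `pivot_table()` can.

In [8]: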
df = pd.read_csv("weather2.csv")
df

Unnamed: 0,date,city,temperature,humidity
0,5/1/2017,new york,65,56
1,5/1/2017,new york,61,54
2,5/2/2017,new york,70,60
3,5/2/2017,new york,72,62
4,5/1/2017,mumbai,75,80
5,5/1/2017,mumbai,78,83
6,5/2/2017,mumbai,82,85
7,5/2/2017,mumbai,80,26
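
The rows above contain two readings for every city/date pair. The sketch below, which assumes an inline DataFrame (`weather2`, a hypothetical stand-in for `weather2.csv`), shows what happens when `pivot()` is applied to such data: it raises a `ValueError` because it has no rule for combining duplicates.

```python
import pandas as pd

# Inline copy of the weather2.csv rows shown above (assumption: same values as the file).
weather2 = pd.DataFrame({
    "date": ["5/1/2017", "5/1/2017", "5/2/2017", "5/2/2017"] * 2,
    "city": ["new york"] * 4 + ["mumbai"] * 4,
    "temperature": [65, 61, 70, 72, 75, 78, 82, 80],
    "humidity": [56, 54, 60, 62, 80, 83, 85, 26],
})

# pivot() only rearranges values; duplicate (city, date) pairs make the
# target cell ambiguous, so pandas refuses to reshape.
try:
    weather2.pivot(index="city", columns="date", values="humidity")
    pivot_failed = False
except ValueError as err:
    pivot_failed = True
    print("pivot() failed:", err)
```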


In [9]:
# Pivots with 'city' as index and 'date' as columns;
# duplicate readings are averaged (the default aggfunc is the mean)
df.pivot_table(index="city", columns="date")

Unnamed: 0_level_0,humidity,humidity,temperature,temperature
date,5/1/2017,5/2/2017,5/1/2017,5/2/2017
city,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2
mumbai,81.5,55.5,76.5,81.0
new york,55.0,61.0,63.0,71.0
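
To make the default aggregation explicit, the same averages can be reproduced with `groupby()`. This is a sketch under the assumption that an inline `weather2` DataFrame mirrors `weather2.csv`; the `by_hand` name is illustrative.

```python
import pandas as pd

# Inline copy of the weather2.csv rows shown above (assumption: same values as the file).
weather2 = pd.DataFrame({
    "date": ["5/1/2017", "5/1/2017", "5/2/2017", "5/2/2017"] * 2,
    "city": ["new york"] * 4 + ["mumbai"] * 4,
    "temperature": [65, 61, 70, 72, 75, 78, 82, 80],
    "humidity": [56, 54, 60, 62, 80, 83, 85, 26],
})

# With no aggfunc, pivot_table() averages rows that share a (city, date) pair.
mean_table = weather2.pivot_table(index="city", columns="date")

# Equivalent by hand: group, average, then move 'date' into the columns.
by_hand = (
    weather2.groupby(["city", "date"])[["humidity", "temperature"]]
    .mean()
    .unstack("date")
)

print(mean_table)
print(by_hand.loc["mumbai", ("humidity", "5/2/2017")])  # 55.5
```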


## Using `pivot_table()` with Summation and Margins

The `pivot_table()` function creates a summarized table by calculating the sum of values, with `city` as the index and `date` as the columns. The `margins=True` argument adds row and column totals.

**Key Features:**
* `index`: Uses `city` as the index.
* `columns`: Spreads `date` across columns.
* `aggfunc="sum"`: Aggregates data using summation.
* `margins=True`: Adds an `All` row and an `All` column holding the totals for each column and row.

The margins make it easy to compare each city's total with the overall total.

In [18]:
# Creates a pivot table summing values, adding margins for totals
df.pivot_table(index="city", columns="date", aggfunc="sum", margins=True)

Unnamed: 0_level_0,humidity,humidity,humidity,temperature,temperature,temperature
date,5/1/2017,5/2/2017,All,5/1/2017,5/2/2017,All
city,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2
mumbai,163,111,274,153,162,315
new york,110,122,232,126,142,268
All,273,233,506,279,304,583
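
The `aggfunc` argument accepts a single function name, as above, or a dictionary that assigns a different aggregation to each column. The following sketch assumes an inline `weather2` DataFrame standing in for `weather2.csv`; the choice of `max` and `mean` is only an illustration.

```python
import pandas as pd

# Inline copy of the weather2.csv rows shown above (assumption: same values as the file).
weather2 = pd.DataFrame({
    "date": ["5/1/2017", "5/1/2017", "5/2/2017", "5/2/2017"] * 2,
    "city": ["new york"] * 4 + ["mumbai"] * 4,
    "temperature": [65, 61, 70, 72, 75, 78, 82, 80],
    "humidity": [56, 54, 60, 62, 80, 83, 85, 26],
})

# Sum humidity per city and date; margins=True appends an 'All' row and column.
humidity_totals = weather2.pivot_table(
    index="city", columns="date", values="humidity",
    aggfunc="sum", margins=True,
)

# A dict applies a different aggregation to each value column.
per_column = weather2.pivot_table(
    index="city",
    values=["temperature", "humidity"],
    aggfunc={"temperature": "max", "humidity": "mean"},
)

print(humidity_totals)
print(per_column)
```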


In [12]:
df = pd.read_csv("weather3.csv")
df

Unnamed: 0,date,city,temperature,humidity
0,5/1/2017,new york,65,56
1,5/2/2017,new york,61,54
2,5/3/2017,new york,70,60
3,12/1/2017,new york,30,50
4,12/2/2017,new york,28,52
5,12/3/2017,new york,25,51


In [14]:
df['date'] = pd.to_datetime(df['date'])

## Using `pivot_table()` with `Grouper()` for Monthly Grouping

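`pd.Grouper` groups data by a specified frequency. Here, it groups the `date` column by month (`'M'`), with `city` spread across columns. The `date` column must hold datetimes, which is why it was converted with `pd.to_datetime()` above. As with any pivot table, the values in each group are averaged by default. In pandas 2.2 and later, the `'M'` alias is deprecated for this purpose in favor of `'ME'` (month end).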

**Key Features:**
* `Grouper(freq='M', key='date')`: Groups the `date` column by month.
* `columns='city'`: Pivots data based on cities.

This is ideal for summarizing data by month across different cities.

In [15]:
# Groups data by month ('M'), pivots on 'city', and averages each month's readings
df.pivot_table(index=pd.Grouper(freq='M', key='date'), columns='city')

Unnamed: 0_level_0,humidity,temperature
city,new york,new york
date,Unnamed: 1_level_2,Unnamed: 2_level_2
2017-05-31,56.666667,65.333333
2017-12-31,51.0,27.666667
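
The monthly grouping can be reproduced without the CSV file. This sketch makes two assumptions that differ from the cell above: it builds an inline `weather3` DataFrame in place of `weather3.csv`, and it uses the month-start alias `'MS'` so that the same code runs on both older and newer pandas versions. With `'MS'`, each group is labeled with the first day of the month rather than the last.

```python
import pandas as pd

# Inline copy of the weather3.csv rows shown above (assumption: same values as the file).
weather3 = pd.DataFrame({
    "date": pd.to_datetime([
        "5/1/2017", "5/2/2017", "5/3/2017",
        "12/1/2017", "12/2/2017", "12/3/2017",
    ]),
    "city": ["new york"] * 6,
    "temperature": [65, 61, 70, 30, 28, 25],
    "humidity": [56, 54, 60, 50, 52, 51],
})

# Group the datetime column into calendar months and average each month's readings.
# 'MS' (month start) is an assumption made here for cross-version compatibility.
monthly = weather3.pivot_table(
    index=pd.Grouper(key="date", freq="MS"),
    columns="city",
    values=["temperature", "humidity"],
)

print(monthly)
```

The averages match the table above; only the index labels differ (start of month instead of end of month).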
