In [1]:
import pandas as pd

In [2]:
# looks so nice and clean!
df = pd.DataFrame(data=[[12, 10, 40], [9, 7, 12], [0, 14, 190]], 
                  columns=['Apple', 'Orange', 'Banana'],
                  index=['Texas', 'Arizona', 'Florida'])
df

         Apple  Orange  Banana
Texas       12      10      40
Arizona      9       7      12
Florida      0      14     190


In [5]:
# stacking the data into a Series
df.stack()

Texas    Apple      12
         Orange     10
         Banana     40
Arizona  Apple       9
         Orange      7
         Banana     12
Florida  Apple       0
         Orange     14
         Banana    190
dtype: int64

### Much Tidier
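With one command, the data is much closer to being tidy: each row now holds a single observation (one fruit weight for one state). The state and fruit labels, however, still live in the index rather than in regular columns.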

### A Multi-Level Index
The index of the returned Series now has two levels (a MultiIndex). The **`reset_index`** method moves these levels back out into regular DataFrame columns.
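The two index levels can be inspected directly before calling **`reset_index`**. A minimal sketch that rebuilds the same `df`; the variable name `stacked` is just for illustration:

```python
import pandas as pd

# Same fruit-weight data as in the cells above
df = pd.DataFrame(data=[[12, 10, 40], [9, 7, 12], [0, 14, 190]],
                  columns=['Apple', 'Orange', 'Banana'],
                  index=['Texas', 'Arizona', 'Florida'])

stacked = df.stack()

# Two levels: the original row labels, then the former column names
print(stacked.index.nlevels)  # 2

# Neither level has a name yet, so reset_index will have to invent column names
print(list(stacked.index.names))  # [None, None]

# get_level_values returns the labels of a single level, one entry per row
print(stacked.index.get_level_values(0).unique().tolist())
```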

In [7]:
df_tidy = df.stack().reset_index()
df_tidy

   level_0  level_1    0
0    Texas    Apple   12
1    Texas   Orange   10
2    Texas   Banana   40
3  Arizona    Apple    9
4  Arizona   Orange    7
5  Arizona   Banana   12
6  Florida    Apple    0
7  Florida   Orange   14
8  Florida   Banana  190


### Rename the columns
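The default column names after calling **`reset_index`** are not useful: the unnamed index levels become `level_0` and `level_1`, and the values of the unnamed Series end up in a column named `0`. Let's rename the columns directly with a list.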

In [8]:
df_tidy.columns = ['State', 'Fruit', 'Weight']
df_tidy

     State   Fruit  Weight
0    Texas   Apple      12
1    Texas  Orange      10
2    Texas  Banana      40
3  Arizona   Apple       9
4  Arizona  Orange       7
5  Arizona  Banana      12
6  Florida   Apple       0
7  Florida  Orange      14
8  Florida  Banana     190
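Assigning a list relies on the columns being in the expected order. Passing a dictionary to **`rename`** maps each old name to a new one explicitly instead. A sketch of that alternative, rebuilding the same `df`:

```python
import pandas as pd

# Same fruit-weight data as in the cells above
df = pd.DataFrame(data=[[12, 10, 40], [9, 7, 12], [0, 14, 190]],
                  columns=['Apple', 'Orange', 'Banana'],
                  index=['Texas', 'Arizona', 'Florida'])

df_tidy = df.stack().reset_index()

# Map each default name to a meaningful one; the values column is the integer 0
df_tidy = df_tidy.rename(columns={'level_0': 'State',
                                  'level_1': 'Fruit',
                                  0: 'Weight'})
print(list(df_tidy.columns))  # ['State', 'Fruit', 'Weight']
```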


In [9]:
# All steps together
df_tidy = df.stack().reset_index()
df_tidy.columns = ['State', 'Fruit', 'Weight']
df_tidy

     State   Fruit  Weight
0    Texas   Apple      12
1    Texas  Orange      10
2    Texas  Banana      40
3  Arizona   Apple       9
4  Arizona  Orange       7
5  Arizona  Banana      12
6  Florida   Apple       0
7  Florida  Orange      14
8  Florida  Banana     190


### Alternate way of renaming the levels and the Series before `reset_index`
It's possible to do the tidying and the column renaming in a single chain of method calls. When the **`rename_axis`** method is passed a list (or a scalar, for a single-level index), it renames the index levels. Let's see the result of this step.
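The list-versus-scalar distinction follows the number of index levels: one name per level. A short sketch, rebuilding the same `df`:

```python
import pandas as pd

# Same fruit-weight data as in the cells above
df = pd.DataFrame(data=[[12, 10, 40], [9, 7, 12], [0, 14, 190]],
                  columns=['Apple', 'Orange', 'Banana'],
                  index=['Texas', 'Arizona', 'Florida'])

# The original single-level index takes one scalar name
print(df.rename_axis('State').index.name)  # State

# The stacked Series has two index levels, so it takes a list of two names
stacked = df.stack().rename_axis(['State', 'Fruit'])
print(list(stacked.index.names))  # ['State', 'Fruit']
```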

In [3]:
df.stack().rename_axis(['State', 'Fruit'])

State    Fruit 
Texas    Apple      12
         Orange     10
         Banana     40
Arizona  Apple       9
         Orange      7
         Banana     12
Florida  Apple       0
         Orange     14
         Banana    190
dtype: int64

Notice the level names directly above each index level. We can give the Series itself a name by passing a string to the **`rename`** method.
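**`rename`** decides what to do from the type of its argument: a scalar such as a string sets the Series name, while a dict-like or a function relabels the index instead. A sketch, rebuilding the same `df` (`'TX'` is just an example label):

```python
import pandas as pd

# Same fruit-weight data as in the cells above
df = pd.DataFrame(data=[[12, 10, 40], [9, 7, 12], [0, 14, 190]],
                  columns=['Apple', 'Orange', 'Banana'],
                  index=['Texas', 'Arizona', 'Florida'])
stacked = df.stack()

# A string is a scalar: it sets the Series name and leaves the labels alone
named = stacked.rename('Weight')
print(named.name)  # Weight

# A dict is a label mapping: it is applied to the index labels of every level
relabeled = stacked.rename({'Texas': 'TX'})
print(relabeled.name)  # None
print(sorted(set(relabeled.index.get_level_values(0))))  # ['Arizona', 'Florida', 'TX']
```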

In [4]:
df.stack().rename_axis(['State', 'Fruit']).rename('Weight')

State    Fruit 
Texas    Apple      12
         Orange     10
         Banana     40
Arizona  Apple       9
         Orange      7
         Banana     12
Florida  Apple       0
         Orange     14
         Banana    190
Name: Weight, dtype: int64

Now the levels have names and the Series itself has a name. When we call **`reset_index`**, the level names become column names and the Series name becomes the name of the column holding the values.

In [5]:
(df.stack()
   .rename_axis(['State', 'Fruit'])
   .rename('Weight')
   .reset_index())

     State   Fruit  Weight
0    Texas   Apple      12
1    Texas  Orange      10
2    Texas  Banana      40
3  Arizona   Apple       9
4  Arizona  Orange       7
5  Arizona  Banana      12
6  Florida   Apple       0
7  Florida  Orange      14
8  Florida  Banana     190
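The separate **`rename`** call is not strictly required: `Series.reset_index` accepts a `name` argument that labels the values column. A sketch of that variant, rebuilding the same `df`:

```python
import pandas as pd

# Same fruit-weight data as in the cells above
df = pd.DataFrame(data=[[12, 10, 40], [9, 7, 12], [0, 14, 190]],
                  columns=['Apple', 'Orange', 'Banana'],
                  index=['Texas', 'Arizona', 'Florida'])

df_tidy = (df.stack()
             .rename_axis(['State', 'Fruit'])
             # name= labels the column that holds the Series values
             .reset_index(name='Weight'))
print(list(df_tidy.columns))  # ['State', 'Fruit', 'Weight']
```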


### `stack` vs `melt`
The primary purpose of both **`stack`** and **`melt`** is to take multiple columns and put their values into a single column. Think of columns being stacked on top of one another, or of columns literally melting their data down into one common place. Each value in this long column is labeled by its original column name.

The **`stack`** method takes every column of the DataFrame and stacks all the values into a single column; you do not get to choose a subset of columns. The column names move into the index, where they form the innermost level of a MultiIndex.
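Since **`stack`** has no parameter for picking columns, one workaround is to select the subset before stacking. A sketch, rebuilding the same `df`:

```python
import pandas as pd

# Same fruit-weight data as in the cells above
df = pd.DataFrame(data=[[12, 10, 40], [9, 7, 12], [0, 14, 190]],
                  columns=['Apple', 'Orange', 'Banana'],
                  index=['Texas', 'Arizona', 'Florida'])

# Keep only the columns to be stacked; Orange is dropped from the result entirely
subset = df[['Apple', 'Banana']].stack()
print(len(subset))  # 6
print(sorted(set(subset.index.get_level_values(1))))  # ['Apple', 'Banana']
```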

The **`melt`** method gives you more control: you choose which columns are stacked and which ones remain as labels. **`melt`** only works with columns, so any labels in the index must first be moved out with **`reset_index`** if they are going to be used.
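A sketch of the same tidying with **`melt`**, rebuilding the same `df`: the state labels are moved out of the index first, `id_vars` names the columns that stay as labels, and `value_vars` picks the columns to melt. Unlike **`stack`**, `melt` emits the rows column by column, so all the Apple rows come first.

```python
import pandas as pd

# Same fruit-weight data as in the cells above
df = pd.DataFrame(data=[[12, 10, 40], [9, 7, 12], [0, 14, 190]],
                  columns=['Apple', 'Orange', 'Banana'],
                  index=['Texas', 'Arizona', 'Florida'])

# melt does not use the index as a label, so turn it into a regular 'State' column
wide = df.rename_axis('State').reset_index()

long = wide.melt(id_vars='State',
                 value_vars=['Apple', 'Orange', 'Banana'],
                 var_name='Fruit',
                 value_name='Weight')
print(long.shape)  # (9, 3)
print(long.head(3))

# Melting a subset only requires listing fewer value_vars
bananas = wide.melt(id_vars='State', value_vars=['Banana'],
                    var_name='Fruit', value_name='Weight')
print(bananas.shape)  # (3, 3)
```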

**Terminology**: In this notebook, 'stacked' and 'melted' both refer to this same data operation. You will also hear it called **unpivoting**.

### Set the index before using `stack`
When using the **`stack`** method, all the column names get put into the index. The previous index gets 'pushed' one level out. Therefore the current index does not get stacked and it remains as a row identifier.

To tidy data without over-stacking it, put the identifying column(s) into the index first. For example, if you have a column like **`State`** that you don't want to stack, move it into the index before calling **`stack`**.

In [17]:
df3 = pd.DataFrame(data=[['Texas', 12, 10, 40], ['Arizona', 9, 7, 12], ['Florida', 0, 14, 190]], 
                   columns=['State', 'Apple', 'Orange', 'Banana'])

In [18]:
df3

     State  Apple  Orange  Banana
0    Texas     12      10      40
1  Arizona      9       7      12
2  Florida      0      14     190


If you don't put **State** in the index, the data becomes 'over-stacked': the state names are stacked right alongside the fruit weights.

In [19]:
df3.stack()

0  State       Texas
   Apple          12
   Orange         10
   Banana         40
1  State     Arizona
   Apple           9
   Orange          7
   Banana         12
2  State     Florida
   Apple           0
   Orange         14
   Banana        190
dtype: object
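One symptom of over-stacking is visible in the dtype and the length of the result. A sketch that rebuilds `df3` and compares both versions:

```python
import pandas as pd

df3 = pd.DataFrame(data=[['Texas', 12, 10, 40], ['Arizona', 9, 7, 12], ['Florida', 0, 14, 190]],
                   columns=['State', 'Apple', 'Orange', 'Banana'])

over = df3.stack()                     # State is stacked along with the weights
tidy = df3.set_index('State').stack()  # State stays a row identifier

# Mixing strings and integers in one column forces the generic object dtype
print(over.dtype)  # object
print(tidy.dtype)  # int64

# Over-stacking also adds one extra row per state
print(len(over), len(tidy))  # 12 9
```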

Put **State** in the index first and then stack.

In [20]:
df3.set_index('State').stack()

State          
Texas    Apple      12
         Orange     10
         Banana     40
Arizona  Apple       9
         Orange      7
         Banana     12
Florida  Apple       0
         Orange     14
         Banana    190
dtype: int64
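The same idea extends to several identifying columns: pass a list to **`set_index`**. A sketch in which `Year` is a hypothetical second identifier added to `df3` with a placeholder value, purely for illustration:

```python
import pandas as pd

df3 = pd.DataFrame(data=[['Texas', 12, 10, 40], ['Arizona', 9, 7, 12], ['Florida', 0, 14, 190]],
                   columns=['State', 'Apple', 'Orange', 'Banana'])

# Hypothetical: a placeholder Year column so that there are two identifiers
df3_years = df3.assign(Year=2020)

tidy = (df3_years.set_index(['State', 'Year'])
                 .stack()
                 # three levels now: State, Year, and the former column names
                 .rename_axis(['State', 'Year', 'Fruit'])
                 .reset_index(name='Weight'))
print(tidy.shape)  # (9, 4)
print(list(tidy.columns))  # ['State', 'Year', 'Fruit', 'Weight']
```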

### `unstack` method
The **`unstack`** method (available on both Series and DataFrames) inverts the operation of **`stack`** by moving an index level back into the column names.

In [7]:
df_stacked = df.stack()
df_stacked

Texas    Apple      12
         Orange     10
         Banana     40
Arizona  Apple       9
         Orange      7
         Banana     12
Florida  Apple       0
         Orange     14
         Banana    190
dtype: int64

In [8]:
df_stacked.unstack()

         Apple  Orange  Banana
Texas       12      10      40
Arizona      9       7      12
Florida      0      14     190
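A quick way to confirm that **`unstack`** undoes **`stack`** is to compare the round trip with the original using `DataFrame.equals`. A sketch, rebuilding the same `df`; the `reindex` call is a precaution that lines the labels up in the original order before comparing:

```python
import pandas as pd

# Same fruit-weight data as in the cells above
df = pd.DataFrame(data=[[12, 10, 40], [9, 7, 12], [0, 14, 190]],
                  columns=['Apple', 'Orange', 'Banana'],
                  index=['Texas', 'Arizona', 'Florida'])

restored = df.stack().unstack()

# Align row and column order with the original, then compare values and dtypes
print(restored.reindex(index=df.index, columns=df.columns).equals(df))  # True
```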


### Transposing a DataFrame with `stack` and `unstack`
A DataFrame can easily be transposed with the **`T`** attribute, but the same result can also be achieved by cleverly combining **`stack`** and **`unstack`**.

The **`unstack`** method defaults to unstacking the innermost (rightmost) level of the index. Index levels are numbered from 0, left to right, and the **`level`** parameter defaults to **`-1`**, meaning the rightmost level. Change this parameter to choose the exact level to unstack; it also accepts a level name, or a list to unstack more than one level.

In [5]:
# View original df
df

         Apple  Orange  Banana
Texas       12      10      40
Arizona      9       7      12
Florida      0      14     190


In [6]:
# Transpose the original dataframe by unstacking
df_stacked.unstack(level=0)

        Texas  Arizona  Florida
Apple      12        9        0
Orange     10        7       14
Banana     40       12      190


In [7]:
# done more efficiently with .T
df.T

        Texas  Arizona  Florida
Apple      12        9        0
Orange     10        7       14
Banana     40       12      190
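Once the levels are named, the level to unstack can be chosen by name rather than by position. A sketch, rebuilding the same `df`, that transposes it both ways and checks the result against `.T`:

```python
import pandas as pd

# Same fruit-weight data as in the cells above
df = pd.DataFrame(data=[[12, 10, 40], [9, 7, 12], [0, 14, 190]],
                  columns=['Apple', 'Orange', 'Banana'],
                  index=['Texas', 'Arizona', 'Florida'])
stacked = df.stack().rename_axis(['State', 'Fruit'])

# level=0 (the outer 'State' level) moves the states into the columns
by_position = stacked.unstack(level=0)
# The same level selected by its name instead of its position
by_name = stacked.unstack(level='State')

print(by_position.equals(by_name))  # True
print(by_position.reindex(index=df.columns, columns=df.index).equals(df.T))  # True
```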
