In [1]:
import numpy as np
import pandas as pd

# Tidy the Data

* One row per observation, one column per variable
* Use `melt` and `pivot` to transform your data into tidy data.
* Clean and/or parse individual values from columns as needed
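As a rough illustration of the last point, the sketch below splits a column that packs two variables into one string. The table, the column names (`patient`, `sex_age`), and the `_` separator are all made up for this example.

```python
import pandas as pd

# Hypothetical messy table: one column packs two variables together.
raw = pd.DataFrame({
    "patient": ["p1", "p2", "p3"],
    "sex_age": ["m_34", "f_51", "f_29"],
})

# Split the packed column into two columns, one per variable.
parts = raw["sex_age"].str.split("_", expand=True)
tidy = raw.drop(columns="sex_age").assign(
    sex=parts[0],
    age=pd.to_numeric(parts[1]),  # parse the numeric part from text
)
print(tidy)
```

After the split, each column holds exactly one variable, and `age` is numeric rather than text.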

#  From long to wide (pivot) and wide to long (melt)

As we discussed regarding "tidy data", the same dataset can be laid out in two forms:
* wide - where each attribute gets its own column
* long - where each $(value, attribute)$ pair gets its own row
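To make the two forms concrete, here is a small sketch that writes the same four measurements both ways. The dates, attribute names, and values are invented for illustration.

```python
import pandas as pd

# Hypothetical measurements, long form: one row per (date, attribute) pair.
long_example = pd.DataFrame({
    "date": ["2000-01-03", "2000-01-03", "2000-01-04", "2000-01-04"],
    "variable": ["A", "B", "A", "B"],
    "value": [1.0, 2.0, 3.0, 4.0],
})

# The same measurements, wide form: one column per attribute.
wide_example = pd.DataFrame({
    "date": ["2000-01-03", "2000-01-04"],
    "A": [1.0, 3.0],
    "B": [2.0, 4.0],
})

print(long_example.shape)  # 4 rows, 3 columns
print(wide_example.shape)  # 2 rows, 3 columns
```

Both frames carry the same information; only the layout differs.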

## create fake data in "long" format

In [2]:
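#  This example creates a long-form dataset with some built-in pandas fu.
#  You do not need to know how this works for this class.
#
#  adapted from:  https://pandas.pydata.org/pandas-docs/stable/reshaping.html
#  (the original used pandas.util.testing.makeTimeDataFrame, which is
#  deprecated and missing from recent pandas releases, so an equivalent
#  3x4 frame of random values is built by hand here)
def unpivot(frame):
    N, K = frame.shape
    data = {'value' : frame.values.ravel('F'),  # column-major: all of A, then B, ...
            'variable' : np.asarray(frame.columns).repeat(N),
            'date' : np.tile(np.asarray(frame.index), K)}
    return pd.DataFrame(data, columns=['date', 'variable', 'value'])

random_wide = pd.DataFrame(np.random.randn(3, 4),
                           index=pd.bdate_range('2000-01-03', periods=3),
                           columns=list('ABCD'))
long_df = unpivot(random_wide)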

In [3]:
long_df

,date,variable,value
0,2000-01-03,A,-0.43597
1,2000-01-04,A,1.299575
2,2000-01-05,A,-1.213263
3,2000-01-03,B,0.308231
4,2000-01-04,B,-0.077292
5,2000-01-05,B,-0.643754
6,2000-01-03,C,-1.941435
7,2000-01-04,C,0.021019
8,2000-01-05,C,0.254246
9,2000-01-03,D,-2.232026
10,2000-01-04,D,-2.13317
11,2000-01-05,D,0.721716


## long to wide using `pivot`

In [4]:
wide_df = long_df.pivot(index="date", columns="variable", values="value")
wide_df

variable,A,B,C,D
date,,,,
2000-01-03,-0.43597,0.308231,-1.941435,-2.232026
2000-01-04,1.299575,-0.077292,0.021019,-2.13317
2000-01-05,-1.213263,-0.643754,0.254246,0.721716
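One thing worth knowing about `pivot`: it needs each (index, columns) pair to appear only once. The sketch below uses a made-up table with a repeated pair to show the error, and then uses `pivot_table` to aggregate the duplicates. Taking the mean is an assumption for the example, not the only reasonable choice.

```python
import pandas as pd

# Hypothetical long table with a repeated (date, variable) pair.
dup_df = pd.DataFrame({
    "date": ["2000-01-03", "2000-01-03", "2000-01-03"],
    "variable": ["A", "A", "B"],
    "value": [1.0, 3.0, 5.0],
})

# pivot cannot decide which of the two "A" values to keep, so it raises.
try:
    dup_df.pivot(index="date", columns="variable", values="value")
except ValueError as err:
    print("pivot failed:", err)

# pivot_table resolves duplicates by aggregating them (mean here).
wide_mean = dup_df.pivot_table(index="date", columns="variable",
                               values="value", aggfunc="mean")
print(wide_mean)
```

If the duplicates are a data-entry mistake rather than repeated measurements, it is usually better to clean them up before reshaping.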


## wide to long using `melt`

References and further examples for `melt`:
* https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.melt.html
* https://www.geeksforgeeks.org/python-pandas-melt/
* https://dfrieds.com/data-analysis/melt-unpivot-python-pandas
* https://deparkes.co.uk/2016/10/28/reshape-pandas-data-with-melt/

In [5]:
wide_df.reset_index().melt(id_vars=['date'])

,date,variable,value
0,2000-01-03,A,-0.43597
1,2000-01-04,A,1.299575
2,2000-01-05,A,-1.213263
3,2000-01-03,B,0.308231
4,2000-01-04,B,-0.077292
5,2000-01-05,B,-0.643754
6,2000-01-03,C,-1.941435
7,2000-01-04,C,0.021019
8,2000-01-05,C,0.254246
9,2000-01-03,D,-2.232026
10,2000-01-04,D,-2.13317
11,2000-01-05,D,0.721716
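`melt` also lets you name the two columns it generates. The sketch below uses a small made-up wide table, and the names `series` and `reading` are arbitrary choices for the example. It then pivots the result back to check that the round trip recovers the wide layout.

```python
import pandas as pd

# Hypothetical wide table, shaped like a pivoted frame after reset_index().
wide_example = pd.DataFrame({
    "date": pd.to_datetime(["2000-01-03", "2000-01-04"]),
    "A": [1.0, 3.0],
    "B": [2.0, 4.0],
})

# id_vars stay as columns; every other column is stacked into rows.
long_example = wide_example.melt(id_vars=["date"],
                                 var_name="series",
                                 value_name="reading")
print(long_example)

# Pivoting the long table recovers the wide layout.
round_trip = long_example.pivot(index="date", columns="series", values="reading")
print(round_trip)
```

Without `var_name` and `value_name`, the generated columns fall back to the default names `variable` and `value`.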
