<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Import-pandas-and-load-the-stacked-and-melted-NLS-data" data-toc-modified-id="Import-pandas-and-load-the-stacked-and-melted-NLS-data-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Import pandas and load the stacked and melted NLS data</a></span></li><li><span><a href="#Stack-the-data-again" data-toc-modified-id="Stack-the-data-again-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Stack the data again</a></span></li><li><span><a href="#Melt-the-data-again" data-toc-modified-id="Melt-the-data-again-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Melt the data again</a></span></li><li><span><a href="#Use-unstack-to-convert-the-stacked-data-from-long-to-wide" data-toc-modified-id="Use-unstack-to-convert-the-stacked-data-from-long-to-wide-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Use unstack to convert the stacked data from long to wide</a></span></li><li><span><a href="#Use-pivot-to-convert-the-melted-data-from-long-to-wide" data-toc-modified-id="Use-pivot-to-convert-the-melted-data-from-long-to-wide-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>Use pivot to convert the melted data from long to wide</a></span></li></ul></div>

# Import pandas and load the stacked and melted NLS data

In [14]:
import pandas as pd

In [15]:
# pd.set_option('display.width', 200)
# pd.set_option('display.max_columns', 30)
# pd.set_option('display.max_rows', 200)
# pd.options.display.float_format = '{:,.0f}'.format

In [16]:
import watermark
%load_ext watermark

%watermark -n -i -iv

watermark: 2.1.0
pandas   : 1.2.1
json     : 2.0.9



In [17]:
nls97 = pd.read_csv('data/nls97f.csv')
nls97.set_index(['originalid'], inplace=True)

# Stack the data again

In [18]:
weeksworkedcols = [
    'weeksworked00', 'weeksworked01', 'weeksworked02', 'weeksworked03',
    'weeksworked04'
]

In [19]:
weeksworkedstacked = nls97[weeksworkedcols].stack(dropna=False)

In [20]:
weeksworkedstacked.loc[[1, 2]]

originalid               
1           weeksworked00    53.0
            weeksworked01    52.0
            weeksworked02     NaN
            weeksworked03    42.0
            weeksworked04    52.0
2           weeksworked00    51.0
            weeksworked01    52.0
            weeksworked02    44.0
            weeksworked03    45.0
            weeksworked04    52.0
dtype: float64

# Melt the data again

In [21]:
weeksworkedmelted = (
    nls97.reset_index()
    .loc[:, ['originalid'] + weeksworkedcols]
    .melt(id_vars=['originalid'],
          value_vars=weeksworkedcols,
          var_name='year',
          value_name='weeksworked')
)

In [22]:
weeksworkedmelted.loc[weeksworkedmelted['originalid'].isin(
    [1, 2])].sort_values(['originalid', 'year'])

       originalid           year  weeksworked
377             1  weeksworked00         53.0
9361            1  weeksworked01         52.0
18345           1  weeksworked02          NaN
27329           1  weeksworked03         42.0
36313           1  weeksworked04         52.0
8980            2  weeksworked00         51.0
17964           2  weeksworked01         52.0
26948           2  weeksworked02         44.0
35932           2  weeksworked03         45.0
44916           2  weeksworked04         52.0
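
The row labels in the output above jump by 8,984 from one year to the next (for example, 9361 - 377) because `melt` appends each source column as a whole block, so one person's rows are not adjacent until they are sorted. A minimal sketch of that behavior, assuming a two-person toy frame (the `toy` data and variable names are hypothetical stand-ins, not the NLS file):

```python
import pandas as pd

# Hypothetical two-person stand-in for the NLS weeks-worked columns
toy = pd.DataFrame({
    'originalid': [1, 2],
    'weeksworked00': [53.0, 51.0],
    'weeksworked01': [52.0, 52.0],
})

toymelted = toy.melt(id_vars=['originalid'],
                     value_vars=['weeksworked00', 'weeksworked01'],
                     var_name='year',
                     value_name='weeksworked')

# melt emits all weeksworked00 rows, then all weeksworked01 rows,
# so sorting is needed to bring each person's rows together
toysorted = toymelted.sort_values(['originalid', 'year'])
print(toysorted)
```

After sorting, the original row labels appear out of order, which mirrors what the notebook output shows for `originalid` 1 and 2.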


# Use unstack to convert the stacked data from long to wide

In [23]:
weeksworked = weeksworkedstacked.unstack()

In [24]:
weeksworked.loc[[1, 2]]

            weeksworked00  weeksworked01  weeksworked02  weeksworked03  weeksworked04
originalid
1                    53.0           52.0            NaN           42.0           52.0
2                    51.0           52.0           44.0           45.0           52.0
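
As a rough check that `unstack` reverses `stack`, the round trip can be tried on a small frame. This is a sketch under assumptions: the `toy` frame is hypothetical, and it has no missing values, so it calls plain `stack()` rather than the notebook's `stack(dropna=False)` (newer pandas releases treat missing values in `stack` differently, so the sketch sidesteps that option).

```python
import pandas as pd

# Hypothetical wide frame indexed by person, one column per year
toy = pd.DataFrame(
    {'weeksworked00': [53.0, 51.0], 'weeksworked01': [52.0, 52.0]},
    index=pd.Index([1, 2], name='originalid'))

# Wide to long: the column labels become the inner index level
toystacked = toy.stack()

# Long to wide: the inner index level moves back to the columns
roundtrip = toystacked.unstack()
print(roundtrip)
```

The stacked object is a Series with a two-level index, and unstacking it restores one row per `originalid`.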


# Use pivot to convert the melted data from long to wide

The `pivot` method needs three things from us: the index column (`originalid`), the
column whose values become the new column names (`year`), and the column holding
the values to spread across those new columns (`weeksworked`). Because `values` is passed
as a list here, `pivot` returns multilevel column names. We fix that by pulling from the
second level with `[col[1] for col in weeksworked.columns[1:]]`.

In [25]:
weeksworked = weeksworkedmelted.pivot(index='originalid',
                                      columns='year',
                                      values=['weeksworked']).reset_index()

In [26]:
weeksworked.head(2)

     originalid   weeksworked
year            weeksworked00 weeksworked01 weeksworked02 weeksworked03 weeksworked04
0             1          53.0          52.0           NaN          42.0          52.0
1             2          51.0          52.0          44.0          45.0          52.0


In [27]:
weeksworked.columns = ['originalid'] + [
    col[1] for col in weeksworked.columns[1:]
]

In [28]:
weeksworked.loc[weeksworked.originalid.isin([1, 2])].T

                  0     1
originalid      1.0   2.0
weeksworked00  53.0  51.0
weeksworked01  52.0  52.0
weeksworked02   NaN  44.0
weeksworked03  42.0  45.0
weeksworked04  52.0  52.0
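
One possible alternative to renaming the columns afterwards, sketched here on hypothetical toy data: passing `values` as a single string rather than a list makes `pivot` return single-level column names directly. The `toymelted` frame and the variable names are illustrative assumptions, not part of the NLS workflow above.

```python
import pandas as pd

# Hypothetical long-format data shaped like weeksworkedmelted
toymelted = pd.DataFrame({
    'originalid': [1, 1, 2, 2],
    'year': ['weeksworked00', 'weeksworked01'] * 2,
    'weeksworked': [53.0, 52.0, 51.0, 52.0],
})

# values as a list: two-level column names, as in the notebook
listpivot = toymelted.pivot(index='originalid', columns='year',
                            values=['weeksworked'])

# values as a string: single-level column names
strpivot = toymelted.pivot(index='originalid', columns='year',
                           values='weeksworked').reset_index()
strpivot.columns.name = None  # drop the leftover 'year' label

print(listpivot.columns.nlevels, list(strpivot.columns))
```

Either route ends with one row per `originalid` and one column per year; the string form simply skips the list comprehension step.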
