## Exercise 6.4
### 1. Setting & sorting a MultiIndex

The `sales` DataFrame you have been working with has been extended to include state information.

#### Instructions (2 points)

* Create a MultiIndex by setting the index to be the columns `['state', 'month']`.
* Sort the MultiIndex using the `.sort_index()` method.

In [1]:
import pandas as pd
sales = pd.read_csv('https://github.com/huangpen77/BUDT704/raw/main/Chapter07/sales2.csv')
sales.head()

  state  month  eggs  salt  spam
0    CA      1    47  12.0    17
1    CA      2   110  50.0    31
2    NY      1   221  89.0    72
3    NY      2    77  87.0    20
4    TX      1   132   NaN    52


In [2]:
# Set the index to be the columns ['state', 'month']: sales
sales = sales.set_index(['state', 'month'])

# Sort the MultiIndex: sales
sales = sales.sort_index()

# Print the sales DataFrame
sales

             eggs  salt  spam
state month                  
CA    1        47  12.0    17
      2       110  50.0    31
NY    1       221  89.0    72
      2        77  87.0    20
TX    1       132   NaN    52
      2       205  60.0    55


### 2. Extracting data with a MultiIndex

Extracting elements from the outermost level of a `MultiIndex` is just like in the case of a single-level `Index`. You can use the `.loc[]` accessor.

#### Instructions (2 points)

* Print the rows in `sales` with state 'CA' and 'TX'. Note how New York is excluded.
* Print the rows in `sales` with state 'CA':'TX'. Note how New York is included.


In [5]:
# Print the rows in `sales` with state 'CA' and 'TX'.
print(sales.loc[['CA', 'TX']], '\n')
# Print the rows in `sales` with state 'CA':'TX'.
print(sales.loc['CA':'TX'])

             eggs  salt  spam
state month                  
CA    1        47  12.0    17
      2       110  50.0    31
TX    1       132   NaN    52
      2       205  60.0    55 

             eggs  salt  spam
state month                  
CA    1        47  12.0    17
      2       110  50.0    31
NY    1       221  89.0    72
      2        77  87.0    20
TX    1       132   NaN    52
      2       205  60.0    55


### 3. Indexing multiple levels of a MultiIndex

Looking up indexed data is fast and efficient, and you have already seen that lookups on the outermost level of a `MultiIndex` work just like lookups on DataFrames that have a single-level `Index`.

Looking up data based on inner levels of a `MultiIndex` can be a bit trickier. In this exercise, you will use your `sales` DataFrame to do some increasingly complex lookups.

The trickiest lookups are those that target only the inner levels of the index. Inside the tuple passed to `.loc[]`, the usual `:` is not valid Python syntax, so you need to write `slice(None)` for the outermost level(s) instead, or use `pd.IndexSlice`. You can refer to the [pandas documentation](http://pandas.pydata.org/pandas-docs/stable/advanced.html) for more details.
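As a minimal sketch of the two spellings mentioned above, the snippet below rebuilds `sales` inline from the values displayed earlier (an assumption made so the example runs without downloading the CSV) and shows that the `slice(None)` tuple and `pd.IndexSlice` select the same rows. The names `idx`, `month2_tuple`, and `month2_idx` are illustrative only.

```python
import numpy as np
import pandas as pd

# Inline stand-in for the CSV, copied from the output shown earlier.
sales = pd.DataFrame(
    {
        "state": ["CA", "CA", "NY", "NY", "TX", "TX"],
        "month": [1, 2, 1, 2, 1, 2],
        "eggs": [47, 110, 221, 77, 132, 205],
        "salt": [12.0, 50.0, 89.0, 87.0, np.nan, 60.0],
        "spam": [17, 31, 72, 20, 52, 55],
    }
).set_index(["state", "month"]).sort_index()

# All states, month 2, written as a tuple with slice(None).
month2_tuple = sales.loc[(slice(None), 2), :]

# The same lookup with pd.IndexSlice, which accepts the ':' syntax.
idx = pd.IndexSlice
month2_idx = sales.loc[idx[:, 2], :]

print(month2_idx)
print(month2_tuple.equals(month2_idx))
```

`pd.IndexSlice` is mostly a readability aid: `idx[:, 2]` evaluates to the same tuple `(slice(None), 2)` that you would otherwise type by hand.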


#### Instructions (3 points)

* Look up data for New York (`'NY'`) in month `1`.
* Look up data for California and Texas (`'CA'`, `'TX'`) in month `2`.
* Look up data for all states in month `2`. Use `(slice(None), 2)` to extract all rows in month `2`.

In [7]:
# Look up data for NY in month 1: NY_month1
NY_month1 = sales.loc[('NY',1)]
print(NY_month1)

# Look up data for CA and TX in month 2: CA_TX_month2
CA_TX_month2 = sales.loc[(['CA', 'TX'], 2), :]
print('\n', CA_TX_month2)

# Look up data for all states in month 2: all_month2
all_month2 = sales.loc[(slice(None), 2), :]
print('\n', all_month2)

eggs    221.0
salt     89.0
spam     72.0
Name: (NY, 1), dtype: float64

              eggs  salt  spam
state month                  
CA    2       110  50.0    31
TX    2       205  60.0    55

              eggs  salt  spam
state month                  
CA    2       110  50.0    31
NY    2        77  87.0    20
TX    2       205  60.0    55


## Exercise 6.5
### 1. Pivoting a single variable

In this exercise, your job is to pivot `users` so that the focus is on `'visitors'`, with the columns indexed by `'city'` and the rows indexed by `'weekday'`.

Inspect `users` and make a note of which variable you want to use to index the rows (`'weekday'`), which variable you want to use to index the columns (`'city'`), and which variable will populate the values in the cells (`'visitors'`). Try to visualize what the result should be.

#### Instructions (2 points)

* Pivot the `users` DataFrame with the rows indexed by `'weekday'`, the columns indexed by `'city'`, and the values populated with `'visitors'`.

In [8]:
import pandas as pd
users = pd.read_csv('https://github.com/huangpen77/BUDT704/raw/main/Chapter07/users.csv', index_col=0)
users

  weekday    city  visitors  staff
0     Sun  Austin       139      7
1     Sun  Dallas       237     12
2     Mon  Austin       326      3
3     Mon  Dallas       456      5


In [10]:
# Pivot the users DataFrame: visitors_pivot
visitors_pivot = users.pivot(index='weekday',
                             columns='city',
                             values='visitors')
visitors_pivot

city     Austin  Dallas
weekday                
Mon         326     456
Sun         139     237


Notice how in the pivoted DataFrame, the index is labeled `'weekday'`, the columns are labeled `'city'`, and the values are populated by the number of `visitors`.

### 2. Pivoting all variables

If you do not select any particular variables, all of them will be pivoted. In this case - with the `users` DataFrame - both `'visitors'` and `'staff'` will be pivoted, creating hierarchical column labels.

You will explore this for yourself now in this exercise.

#### Instructions (2 points)
* Pivot the `users` DataFrame with both `'staff'` and `'visitors'` pivoted - that is, all the variables. This will happen automatically if you do not specify an argument for the `values` parameter of `.pivot()`.

In [11]:
# Pivot users pivoted by both staff and visitors: pivot
pivot = users.pivot(index='weekday', columns='city')
pivot

        visitors         staff       
city      Austin Dallas Austin Dallas
weekday                              
Mon          326    456      3      5
Sun          139    237      7     12
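Once all variables are pivoted, the columns form a two-level `MultiIndex`. The sketch below, which assumes an inline copy of the `users` rows shown earlier instead of the CSV, illustrates one way to pull a single variable or a single (variable, city) pair back out. The names `visitors` and `dallas_staff` are illustrative only.

```python
import pandas as pd

# Inline stand-in for the CSV, copied from the output shown earlier.
users = pd.DataFrame(
    {
        "weekday": ["Sun", "Sun", "Mon", "Mon"],
        "city": ["Austin", "Dallas", "Austin", "Dallas"],
        "visitors": [139, 237, 326, 456],
        "staff": [7, 12, 3, 5],
    }
)

# No `values` argument: every remaining column is pivoted.
pivot = users.pivot(index="weekday", columns="city")

# The column labels are now (variable, city) pairs.
print(pivot.columns.tolist())

# Select a whole variable via the outer level ...
visitors = pivot["visitors"]
# ... or one column via the full (variable, city) tuple.
dallas_staff = pivot[("staff", "Dallas")]

print(visitors)
print(dallas_staff)
```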


### 3. Setting up a pivot table

In this exercise, you will use the `.pivot_table()` method in combination with an aggregation function.

#### Instructions (2 points)

* Create a pivot table, `city_max`, that indexes the rows of the `users` DataFrame by `'city'`. This corresponds to the `index` parameter of `.pivot_table()`. Use the `'staff'` and `'visitors'` columns as the argument for the `values` parameter of `.pivot_table()`. In addition, specify the `aggfunc` as `'max'`.

In [13]:
# Create the DataFrame with the appropriate pivot table: city_max
city_max = users.pivot_table(index='city', values=['staff','visitors'], aggfunc='max')
city_max

        staff  visitors
city                   
Austin      7       326
Dallas     12       456
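Because each city appears on more than one row of `users`, `.pivot_table()` has to combine those rows, and `aggfunc` decides how. As a small sketch, assuming an inline copy of the `users` rows shown earlier, the same call with `'sum'` instead of `'max'` gives per-city totals. The name `city_sum` is illustrative only.

```python
import pandas as pd

# Inline stand-in for the CSV, copied from the output shown earlier.
users = pd.DataFrame(
    {
        "weekday": ["Sun", "Sun", "Mon", "Mon"],
        "city": ["Austin", "Dallas", "Austin", "Dallas"],
        "visitors": [139, 237, 326, 456],
        "staff": [7, 12, 3, 5],
    }
)

# Largest value per city (the exercise's answer).
city_max = users.pivot_table(index="city", values=["staff", "visitors"], aggfunc="max")

# Total per city: only the aggregation function changes.
city_sum = users.pivot_table(index="city", values=["staff", "visitors"], aggfunc="sum")

print(city_max)
print(city_sum)
```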


## Exercise 6.6
### 1: Stacking & unstacking I

You are now going to practice stacking and unstacking DataFrames. The `users` DataFrame you have been working with in this chapter has been pre-loaded for you, this time with a MultiIndex. Pay attention to the index: its levels are `['city', 'weekday']`, so `'weekday'`, the second entry, has position 1. This position is what the `level` parameter of `.stack()` and `.unstack()` accepts. Alternatively, you can pass the level name `'weekday'` instead of its position.

Your job in this exercise is to unstack `users` by `'weekday'`. You will then use `.stack()` on the unstacked DataFrame to see if you get back the original layout of `users`.
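To illustrate that the `level` parameter accepts either a position or a name, here is a minimal sketch. It assumes an inline copy of the `users` rows shown earlier, and the names `by_name` and `by_position` are illustrative only.

```python
import pandas as pd

# Inline stand-in for the CSV, indexed the same way as in this exercise.
users = pd.DataFrame(
    {
        "city": ["Austin", "Austin", "Dallas", "Dallas"],
        "weekday": ["Mon", "Sun", "Mon", "Sun"],
        "visitors": [326, 139, 456, 237],
        "staff": [3, 7, 5, 12],
    }
).set_index(["city", "weekday"]).sort_index()

# 'weekday' is the second index level, so its position is 1.
by_name = users.unstack(level="weekday")
by_position = users.unstack(level=1)

print(by_name.equals(by_position))
```

Using the level name tends to be more robust, since it keeps working if the order of the index levels changes.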

#### Instructions (2 points)

* Define a DataFrame `byweekday` with the `'weekday'` level of `users` unstacked.
* Stack `byweekday` by `'weekday'` and print it to check if you get the same layout as the original `users` DataFrame.

In [14]:
import pandas as pd
users = pd.read_csv('https://github.com/huangpen77/BUDT704/raw/main/Chapter07/users.csv', index_col=0)
users.set_index(['city', 'weekday'], inplace=True)
users = users.sort_index()
users

                visitors  staff
city   weekday                 
Austin Mon           326      3
       Sun           139      7
Dallas Mon           456      5
       Sun           237     12


In [15]:
# Unstack users by 'weekday': byweekday
byweekday = users.unstack(level='weekday')
byweekday

        visitors      staff    
weekday      Mon  Sun   Mon Sun
city                           
Austin       326  139     3   7
Dallas       456  237     5  12


In [19]:
# Stack byweekday by 'weekday' and print it
byweekday.stack(level='weekday')

                visitors  staff
city   weekday                 
Austin Mon           326      3
       Sun           139      7
Dallas Mon           456      5
       Sun           237     12


### 2: Stacking & unstacking II

You are now going to continue working with the `users` DataFrame. As always, first explore it to see the layout and note the index.

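Your job in this exercise is to unstack and then stack the `'city'` level, as you did previously for `'weekday'`. Note that you won't get the same DataFrame back, because the order of the index levels changes: `.stack()` puts the stacked level innermost, so the index becomes `['weekday', 'city']` instead of `['city', 'weekday']`.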
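The change in level order can be checked directly on the index names. The sketch below assumes an inline copy of the `users` rows shown earlier and only inspects the index. Depending on your pandas version, `.stack()` may emit a `FutureWarning` about its implementation, which does not affect the result here.

```python
import pandas as pd

# Inline stand-in for the CSV, indexed the same way as in this exercise.
users = pd.DataFrame(
    {
        "city": ["Austin", "Austin", "Dallas", "Dallas"],
        "weekday": ["Mon", "Sun", "Mon", "Sun"],
        "visitors": [326, 139, 456, 237],
        "staff": [3, 7, 5, 12],
    }
).set_index(["city", "weekday"]).sort_index()

# Move 'city' into the columns, then back into the index.
bycity_stack = users.unstack(level="city").stack(level="city")

# The round trip changes the order of the index levels.
print(list(users.index.names))
print(list(bycity_stack.index.names))
```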

#### Instructions (3 points)

* Define a DataFrame `bycity` with the `'city'` level of `users` unstacked.
* Stack `bycity` by `'city'` and store it in `bycity_stack`. Print it to check if you get the same layout as the original `users` DataFrame.
* Print out the index of `bycity_stack`. Also print out the index of `users` to compare them.

In [20]:
print(users)

                visitors  staff
city   weekday                 
Austin Mon           326      3
       Sun           139      7
Dallas Mon           456      5
       Sun           237     12


In [21]:
# Unstack users by 'city': bycity
bycity = users.unstack(level='city')
bycity

        visitors         staff       
city      Austin Dallas Austin Dallas
weekday                              
Mon          326    456      3      5
Sun          139    237      7     12


In [22]:
# Stack bycity by 'city' and print it
bycity_stack = bycity.stack(level='city')
bycity_stack

                visitors  staff
weekday city                   
Mon     Austin       326      3
        Dallas       456      5
Sun     Austin       139      7
        Dallas       237     12


In [25]:
print(bycity_stack.sort_index())
print('\n')
print(users.sort_index())

                visitors  staff
weekday city                   
Mon     Austin       326      3
        Dallas       456      5
Sun     Austin       139      7
        Dallas       237     12


                visitors  staff
city   weekday                 
Austin Mon           326      3
       Sun           139      7
Dallas Mon           456      5
       Sun           237     12


### 3: Restoring the index order

Continuing from the previous exercise, you will now use `.swaplevel(0, 1)` to flip the index levels. The swapped index will not be sorted, so you will have to follow up with `.sort_index()` to obtain the original DataFrame. Sorting matters beyond appearance: label-based slicing on an unsorted `MultiIndex` can fail.
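The following sketch illustrates why the sort step matters. It assumes a frame built inline with the same `['weekday', 'city']` layout as `bycity_stack`; the names `swapped`, `restored`, and `slice_ok` are illustrative only. The slice on the unsorted index is wrapped in `try`/`except` because pandas is expected to reject it with an `UnsortedIndexError`, a subclass of `KeyError`.

```python
import pandas as pd

# Inline stand-in with the same layout as `bycity_stack`.
bycity_stack = pd.DataFrame(
    {
        "weekday": ["Mon", "Mon", "Sun", "Sun"],
        "city": ["Austin", "Dallas", "Austin", "Dallas"],
        "visitors": [326, 456, 139, 237],
        "staff": [3, 5, 7, 12],
    }
).set_index(["weekday", "city"])

# Swapping the levels reorders the labels but not the rows.
swapped = bycity_stack.swaplevel(0, 1)
print(swapped.index.is_monotonic_increasing)

# Label slicing on the unsorted index is expected to be rejected.
try:
    swapped.loc["Austin":"Dallas"]
    slice_ok = True
except KeyError as exc:
    slice_ok = False
    print(type(exc).__name__)

# After sorting, the same slice works.
restored = swapped.sort_index()
print(restored.index.is_monotonic_increasing)
print(restored.loc["Austin":"Dallas"])
```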

#### Instructions (2 points)

* Swap the levels of the index of `bycity_stack`.
* Sort the index of `bycity_stack`.
* Assert that `bycity_stack` equals `users`. This has been done for you.

In [26]:
# Swap the levels of the index of bycity_stack: bycity_stack
bycity_stack = bycity_stack.swaplevel(0,1)
bycity_stack

                visitors  staff
city   weekday                 
Austin Mon           326      3
Dallas Mon           456      5
Austin Sun           139      7
Dallas Sun           237     12


In [27]:
# Sort the index of bycity_stack: bycity_stack
bycity_stack = bycity_stack.sort_index()
bycity_stack

                visitors  staff
city   weekday                 
Austin Mon           326      3
       Sun           139      7
Dallas Mon           456      5
       Sun           237     12


In [28]:
# Verify that the new DataFrame is equal to the original
bycity_stack.equals(users)

True