### DS102 | In Class Practice Week 2C - Categorical Analysis
<hr>
## Learning Objectives

### In Class Content
At the end of the lesson, you will be able to:

- use a `GroupBy` object to aggregate data, retrieving the `size()` of aggregated records

- create a `GroupBy` object, aggregating two columns

- use `rename()` to change the name of columns

- perform data cleaning using `replace()`

- use `apply()` with conversion functions like `int` and `float` (`numpy.int` and `numpy.float` were aliases for these and were removed in NumPy 1.24)

- use `drop()` to remove one column from a `DataFrame`

### Self Study Content
At the end of your self-study, you will be able to:

- create a pivot table using `pandas.pivot_table` to see aggregates in rows and columns

- update column names using `pandas.Index`

### Datasets Required for this In Class
1. `travel-expenses.csv`

#### import `pandas` and `numpy`

In [2]:
# Import the libraries

import pandas as pd
import numpy as np

#### read from CSV to `df`
Read the dataset `travel-expenses.csv` into a `df`.

In [3]:
# Exercise: Read the dataset into a df
df = pd.read_csv('travel-expenses.csv')

Use `df.info()` and `df.head()` to get key properties of the `df`.

In [4]:
# Exercise: Use df.info() to get key properties of the df
#
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 147 entries, 0 to 146
Data columns (total 11 columns):
 #   Column                                                                  Non-Null Count  Dtype 
---  ------                                                                  --------------  ----- 
 0   Name                                                                    147 non-null    object
 1   Start date of trip                                                      147 non-null    object
 2   Duration of Visit                                                       147 non-null    object
 3   Destination                                                             147 non-null    object
 4   Purpose of trip                                                         147 non-null    object
 5   Mode of transport                                                       147 non-null    object
 6   Class of travel                                                         147 non-null    ob

For easier manipulation, first `rename` the columns. Use `inplace=True` so that `df` itself is modified instead of a renamed copy being returned. `columns` takes a dictionary where each key is an existing column name and each value is the new column name.

In [5]:
# Rename the columns accordingly. 
display(df.head(1))
df.rename(inplace=True, columns={
 'Accomodation/Meals': 'Accommodation and Meals', 
 'Other (including hospitality given)' : 'Other',
 'Total cost, including all visas, accommodation, travel, meals etc. (£)' : 'Total cost',
})
# How many columns are being renamed?
# (Type your answer here) 3

# Exercise: What is the datatype of the columns parameter?
# dictionary
display(df.head(1))

Unnamed: 0,Name,Start date of trip,Duration of Visit,Destination,Purpose of trip,Mode of transport,Class of travel,Accomodation/Meals,Other (including hospitality given),"Total cost, including all visas, accommodation, travel, meals etc. (£)",Total Cost of Use of Official Secure Car
0,Gareth Bayley,2018-01-15,4,Bristol,To complete SAFE+ training,Rail,Economy,Nil Return,257.19,211.4,Nil Return


Unnamed: 0,Name,Start date of trip,Duration of Visit,Destination,Purpose of trip,Mode of transport,Class of travel,Accommodation and Meals,Other,Total cost,Total Cost of Use of Official Secure Car
0,Gareth Bayley,2018-01-15,4,Bristol,To complete SAFE+ training,Rail,Economy,Nil Return,257.19,211.4,Nil Return
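
The key-to-value mapping can be tried out on a toy frame first. The sketch below is only an illustration: the frame `toy`, its values and the extra `'No such column'` key are made up, and it deliberately omits `inplace=True` to show that `rename()` then returns a new `DataFrame`.

```python
import pandas as pd

# Toy frame (made-up values); only the column labels mirror the lesson.
toy = pd.DataFrame({
    'Accomodation/Meals': ['Nil Return'],
    'Other (including hospitality given)': ['257.19'],
})

mapping = {
    'Accomodation/Meals': 'Accommodation and Meals',
    'Other (including hospitality given)': 'Other',
    'No such column': 'Ignored',  # keys that match no column are skipped by default
}

# Without inplace=True, rename() returns a new DataFrame and leaves `toy` unchanged.
renamed = toy.rename(columns=mapping)

print(list(toy.columns))
print(list(renamed.columns))
```

Assigning the result to a new name, as here, keeps the original frame available for comparison.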


### Data Cleaning using `replace()`

Notice that some columns should be stored as `int`s or `float`s, but are shown to be stored as the `object` datatype. The next part will be data cleaning so that the data is ready for analysis.

Convert the `Duration of Visit` column to an `int` datatype so that numerical analysis can be performed. Applying an integer conversion such as `int` (or `np.int` in NumPy versions before 1.24, where that alias still exists) raises a `ValueError` here because some records have `Nil Return` as the value.

In [6]:
# Exercise: Find the unique values of the 'Duration of Visit' column
#
df['Duration of Visit'].unique()

array(['4', '1', '10', '2', '3', '5', '6', 'Nil Return', '17', '8', '11',
       '7'], dtype=object)

In [7]:
# Uncomment this line of code and run it. A ValueError will be raised.
# (np.int was removed in NumPy 1.24; the built-in int behaves the same way.)
# df['Duration of Visit'].apply(int)

To overcome this, use `replace()` to substitute `0` for every `Nil Return` value.

In [8]:
# Make a copy of this df and store this as df_cl
#
df_cl = df.copy()
# print() the dtypes
df_cl.dtypes

Name                                        object
Start date of trip                          object
Duration of Visit                           object
Destination                                 object
Purpose of trip                             object
Mode of transport                           object
Class of travel                             object
Accommodation and Meals                     object
Other                                       object
Total cost                                  object
Total Cost of Use of Official Secure Car    object
dtype: object

In [9]:
# Use replace('source', target) to replace values in the Series
#
df_cl['Duration of Visit'] = df_cl['Duration of Visit'].replace('Nil Return', 0)
# Now, int can be applied. The datatype of the column has also been updated.
df_cl['Duration of Visit'] = df_cl['Duration of Visit'].apply(int)

# Exercise: print the datatypes of the df using dtypes
df_cl.dtypes

Name                                        object
Start date of trip                          object
Duration of Visit                            int64
Destination                                 object
Purpose of trip                             object
Mode of transport                           object
Class of travel                             object
Accommodation and Meals                     object
Other                                       object
Total cost                                  object
Total Cost of Use of Official Secure Car    object
dtype: object
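
The failure and the fix can be reproduced on a small Series. This is a minimal sketch with made-up values; `pd.to_numeric(..., errors='coerce')` is shown as an alternative that the lesson itself does not use.

```python
import pandas as pd

# Made-up values shaped like the 'Duration of Visit' column (object dtype).
duration = pd.Series(['4', '1', 'Nil Return', '10'], dtype=object)

# Converting directly fails because 'Nil Return' is not a number.
try:
    duration.apply(int)
    failed = False
except ValueError:
    failed = True

# Option 1 (as in the lesson): replace the placeholder, then convert.
as_int = duration.replace('Nil Return', 0).astype(int)

# Option 2 (alternative): turn anything non-numeric into NaN, then fill and convert.
coerced = pd.to_numeric(duration, errors='coerce')
as_int_2 = coerced.fillna(0).astype(int)

print(failed, as_int.tolist(), as_int_2.tolist())
```

Option 2 does not need the exact spelling of the placeholder, but it also hides any other unexpected text, so inspecting `unique()` first is still worthwhile.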

Do the same for the `Accommodation and Meals`, `Other` and `Total cost` columns.

**Q:** Find all the unique values in the following columns: `Accommodation and Meals`, `Other` and `Total cost`

In [10]:
# Exercise: Find the unique values for the 'Accommodation and Meals', 'Other' and 'Total cost' column.
# 'Accommodation and Meals' column
#
print(df_cl['Accommodation and Meals'].unique())
# 'Other' column
#
print(df_cl['Other'].unique())
# 'Total cost' column
#
print(df_cl['Total cost'].unique())

['Nil Return' '119.17' '153.29' '180' '13' '348' '0' '5' '52' '215.72'
 '4.98' '63.83' '34.41' '243.39' '149.42' '116' '40' '269' '271' '22'
 '232' '14' '20' '504' '32' '155' '91.9' '2.7' '55.94' '186.5' '1543.71'
 '564.77' '2024.47' '142.51' '129' '237.74' '824.36' '243.41' '373.83'
 '843.57' '731.93' '326.19' '583.22' '85' '6' '341' '246' '34.61' '8.92'
 '88.75' '391.44' '200' '151.14' '65' '254' '281' '186' '271.8' '249.23'
 '43' '4' 'Nill return']
['257.19' '381.21' '3591.45' '4205.03' '3983.37' '238.56' '106.8'
 '1894.66' 'Nil Return' '335' '0' '13' '30' '43' '157.91' '93.6' '384.19'
 '1601.71' '11.61' '239.79' 'Nil Return ' '50' '104' '173.54' '36'
 '207.96' '9' '108.14' '72' '40' '41.9' '56.92' '46' '92' '20'
 'Nill return']
['211.4' '500.38' '3591.45' '4205.03' '3983.37' '391.85' '106.8' '1894.66'
 '2772' '630' '4008' '780' '183' '340' '440' '637' '82' '1920' '7108'
 '477' '5810' '113' '2720' '41' '373.91' '98.6' '447.83' '1636.12' '11.61'
 '482.79' '293.44' '327' '38' '5143' '

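To replace multiple values, use `replace(list, 0)`. The first parameter is now a `list` instead of a `str` as seen earlier, and every value in the list is replaced with `0`. Note that the placeholder is spelled inconsistently in the output above (`'Nil Return'`, `'Nill return'` and `'Nil Return '` with a trailing space), so each variant present in a column must be listed.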

In [11]:
# Perform the substitution for Accommodation and Meals
#
df_cl['Accommodation and Meals'] = df_cl['Accommodation and Meals'].replace(['Nill return', 'Nil Return'], 0)
df_cl['Accommodation and Meals'] = df_cl['Accommodation and Meals'].apply(float)
print(df_cl['Accommodation and Meals'].unique())

[   0.    119.17  153.29  180.     13.    348.      5.     52.    215.72
    4.98   63.83   34.41  243.39  149.42  116.     40.    269.    271.
   22.    232.     14.     20.    504.     32.    155.     91.9     2.7
   55.94  186.5  1543.71  564.77 2024.47  142.51  129.    237.74  824.36
  243.41  373.83  843.57  731.93  326.19  583.22   85.      6.    341.
  246.     34.61    8.92   88.75  391.44  200.    151.14   65.    254.
  281.    186.    271.8   249.23   43.      4.  ]


In [12]:
# Perform the substitution for 'Other'. Then convert the column to a float datatype using float.
#
df_cl['Other'] = df_cl['Other'].replace(['Nil Return ', 'Nill return', 'Nil Return'], 0)
df_cl['Other'] = df_cl['Other'].apply(float)
# print the unique values of this column after substitution
#
print(df_cl['Other'].unique())

[ 257.19  381.21 3591.45 4205.03 3983.37  238.56  106.8  1894.66    0.
  335.     13.     30.     43.    157.91   93.6   384.19 1601.71   11.61
  239.79   50.    104.    173.54   36.    207.96    9.    108.14   72.
   40.     41.9    56.92   46.     92.     20.  ]


In [13]:
# Exercise: Perform the substitution for 'Total cost'. Then convert the column to a float datatype.
df_cl['Total cost'] = df_cl['Total cost'].replace(['Nil Return'], 0)
df_cl['Total cost'] = df_cl['Total cost'].apply(float)
# print the unique values of this column after substitution
#
print(df_cl['Total cost'].unique())

[ 211.4   500.38 3591.45 4205.03 3983.37  391.85  106.8  1894.66 2772.
  630.   4008.    780.    183.    340.    440.    637.     82.   1920.
 7108.    477.   5810.    113.   2720.     41.    373.91   98.6   447.83
 1636.12   11.61  482.79  293.44  327.     38.   5143.   4167.    770.
 2371.   3781.    394.   2480.    104.   3856.    692.45   32.    310.
 2955.      0.    520.33  335.5   423.18  554.05  498.15  583.7  1158.33
 6912.55  584.66 3246.03 5740.47 5857.91   10.8  7093.88  390.01 7753.43
  226.1    35.    204.    343.5  5213.5   459.     46.    982.13  263.35
 5137.52 4200.68   37.     20.     39.     66.     86.5    68.03 5558.21
  658.    243.     18.7   483.    623.     20.4   290.81   26.1    67.2
   62.2    53.2  5586.71  502.   5110.    847.   3341.    217.   6771.41
  467.41   41.9  2775.99 4081.27  231.43  187.     18.     57.     36.98
 4107.99   32.25 3275.    427.49  329.   3455.5   323.   7506.     54.
  490.    378.    925.    514.51  270.    438.    338.    231.
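
Listing every spelling of the placeholder by hand is easy to get wrong. One possible alternative, sketched below on made-up values, is to normalise whitespace and case before comparing; the names `raw`, `normalised` and `is_placeholder` are illustrative only.

```python
import pandas as pd

# Made-up values imitating the placeholder spellings seen in the columns above.
raw = pd.Series(
    ['Nil Return', '119.17', 'Nill return', 'Nil Return ', '0', '13'],
    dtype=object,
)

# Strip surrounding spaces and lower-case the text so fewer variants need listing.
normalised = raw.str.strip().str.lower()
is_placeholder = normalised.isin(['nil return', 'nill return'])

# Put '0' where a placeholder was found, keep the rest, then convert to float.
cleaned = raw.mask(is_placeholder, '0').astype(float)

print(cleaned.tolist())
```

This still relies on knowing the misspelling `'nill return'`; it only removes the need to list case and spacing variants separately.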

Now, keep only the columns needed for analysis. Use `df.drop()` to remove a column by name. Since we are dropping a column rather than a row, specify `axis=1`.

In [14]:
# Drop the 'Total Cost of Use of Official Secure Car' column
#
display(df_cl.head(1))
df_cl.drop('Total Cost of Use of Official Secure Car', axis=1, inplace=True)
display(df_cl.head(1))

Unnamed: 0,Name,Start date of trip,Duration of Visit,Destination,Purpose of trip,Mode of transport,Class of travel,Accommodation and Meals,Other,Total cost,Total Cost of Use of Official Secure Car
0,Gareth Bayley,2018-01-15,4,Bristol,To complete SAFE+ training,Rail,Economy,0.0,257.19,211.4,Nil Return


Unnamed: 0,Name,Start date of trip,Duration of Visit,Destination,Purpose of trip,Mode of transport,Class of travel,Accommodation and Meals,Other,Total cost
0,Gareth Bayley,2018-01-15,4,Bristol,To complete SAFE+ training,Rail,Economy,0.0,257.19,211.4
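
As a small illustration of `axis=1`, the sketch below drops a column from a made-up frame in two equivalent ways. The `columns=` keyword is an alternative spelling that the lesson does not use; the frame and its shortened column name `'Secure Car'` are hypothetical.

```python
import pandas as pd

# Made-up frame with one column we do not want to keep.
toy = pd.DataFrame({
    'Name': ['A', 'B'],
    'Total cost': [211.4, 500.38],
    'Secure Car': ['Nil Return', 'Nil Return'],
})

# axis=1 tells drop() that the label refers to a column, not a row.
by_axis = toy.drop('Secure Car', axis=1)

# The columns= keyword expresses the same intent without an axis argument.
by_keyword = toy.drop(columns=['Secure Car'])

print(list(by_axis.columns))
```

Neither call uses `inplace=True`, so `toy` still has all three columns afterwards.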


Observe the `df` one more time after data cleaning and before analysis. Note that the datatypes have been updated.

In [15]:
# Exercise: Show the properties of the df using info()
#
df_cl.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 147 entries, 0 to 146
Data columns (total 10 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   Name                     147 non-null    object 
 1   Start date of trip       147 non-null    object 
 2   Duration of Visit        147 non-null    int64  
 3   Destination              147 non-null    object 
 4   Purpose of trip          147 non-null    object 
 5   Mode of transport        147 non-null    object 
 6   Class of travel          147 non-null    object 
 7   Accommodation and Meals  147 non-null    float64
 8   Other                    147 non-null    float64
 9   Total cost               147 non-null    float64
dtypes: float64(3), int64(1), object(6)
memory usage: 11.6+ KB


In [16]:
# Exercise: Show the first 20 records using head()
#
df_cl.head(20)

Unnamed: 0,Name,Start date of trip,Duration of Visit,Destination,Purpose of trip,Mode of transport,Class of travel,Accommodation and Meals,Other,Total cost
0,Gareth Bayley,2018-01-15,4,Bristol,To complete SAFE+ training,Rail,Economy,0.0,257.19,211.4
1,Gareth Bayley,2018-01-23,1,Berlin,Meetings with Federal Foreign Office officials...,Air,Economy,119.17,381.21,500.38
2,Gareth Bayley,2018-01-30,10,"Kabul, Islamabad, Karachi",Meetings with British Embassy and High Commiss...,Air,Economy/\nBusiness,0.0,3591.45,3591.45
3,Gareth Bayley,2018-02-20,2,Washington,Meetings with US Department of state officials...,Air,Economy/\nBusiness,0.0,4205.03,4205.03
4,Gareth Bayley,2018-02-27,3,Kabul,Attend Kabul Process II Conference,Air,Business,0.0,3983.37,3983.37
5,Gareth Bayley,2018-03-06,1,Brussels,Attend EU Asia-Oceania Working Party (COASI) C...,Rail,Economy,153.29,238.56,391.85
6,Gareth Bayley,2018-03-22,1,Cheltenham,Bilateral Meetings,Rail,Economy,0.0,106.8,106.8
7,Gareth Bayley,2018-03-27,2,Tashkent,Conference on Afghanistan Peace Process,Air,Business,0.0,1894.66,1894.66
8,Edward Hobart,2018-02-25,3,Nairobi,Overseas Security conference and estates visit,Air,Business,0.0,0.0,2772.0
9,Edward Hobart,2018-02-28,1,Paris,Visit British Embassy and Estate,Rail,Economy,180.0,0.0,630.0


### Data Aggregation with `.size()`

**Q:  How many delegates have taken $5$ or more trips?**

Use the `DataFrame.groupby()` method to first aggregate the data by the `Name` column. Then, since we are **counting** the number of trips made, we use the `size()` method to count the number of trips per delegate.

Finally, use `reset_index(name='No. of Trips')` to convert the result to a `DataFrame` and name the count column `'No. of Trips'`.

In [17]:
# Exercise: Make a copy of df_cl and store it as df_days
#
df_days = df_cl.copy()
# Exercise: For each delegate, count the number of trips taken.
#
df_count_aggregate = df_days.groupby('Name').size()
# Then perform df.reset_index() to change the column name to 'No. of Trips'
#
df_count_aggregate = df_count_aggregate.reset_index(name='No. of Trips')
# Observe the df here
df_count_aggregate

Unnamed: 0,Name,No. of Trips
0,Alastair McPhail,1
1,Andrew Noble,1
2,Andrew Sanderson,2
3,Andy Murdoch,4
4,Angus Lapsley,11
5,Ben Merrick,5
6,Caroline Wilson,7
7,Charles Hay,1
8,Colin Martin-Reynolds,5
9,Edward Hobart,4
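
The same `groupby()`, `size()` and `reset_index()` chain is sketched below on a made-up frame. It also contrasts `size()` with `count()`, which is an addition to the lesson: `size()` counts rows, while `count()` skips missing values.

```python
import pandas as pd

# Made-up trips; one destination is missing on purpose.
trips = pd.DataFrame({
    'Name': ['Ann', 'Ann', 'Bob', 'Ann', 'Bob', 'Cy'],
    'Destination': ['Paris', None, 'Rome', 'Oslo', 'Bonn', 'Kabul'],
})

grouped = trips.groupby('Name')

sizes = grouped.size()                   # rows per group, missing values included
counts = grouped['Destination'].count()  # non-missing values per group

# size() returns a Series indexed by Name; reset_index(name=...) turns it
# into a DataFrame and labels the count column.
trip_table = sizes.reset_index(name='No. of Trips')

print(trip_table)
```

For counting trips, `size()` is the safer choice because a row still represents a trip even when one of its fields is empty.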


Now that we have the required dataframe, filter for all records where the `No. of Trips` is $5$ or greater. Use `DataFrame['Name'].count()` after performing the filtering to get the number of delegates.

In [18]:
# Exercise: Complete the code to get delegates with 5 or more trips
#
df_count_aggregate[df_count_aggregate['No. of Trips']>=5]['Name'].count()

14
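
The filter-then-count step can be broken into parts, as in this sketch on made-up numbers. Summing the Boolean mask is shown as an equivalent shortcut that the lesson does not use.

```python
import pandas as pd

# Made-up aggregate table in the same shape as df_count_aggregate.
trip_table = pd.DataFrame({
    'Name': ['Ann', 'Bob', 'Cy', 'Di'],
    'No. of Trips': [11, 5, 1, 4],
})

# The comparison yields one True/False value per row.
mask = trip_table['No. of Trips'] >= 5

# Lesson approach: filter the rows, then count the names that remain.
n_by_count = trip_table[mask]['Name'].count()

# Shortcut: True counts as 1, so summing the mask gives the same number.
n_by_sum = int(mask.sum())

# The matching names themselves, selected with .loc.
names = trip_table.loc[mask, 'Name'].tolist()

print(n_by_count, n_by_sum, names)
```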

### More Data Aggregation with `.agg()`

**Q: Which delegates have made $5$ trips or more, but have travelled a total of less than $10$ days?**

**Worked Solution**: Make a copy of the `df` and isolate the two columns required for the analysis, `Name` and `Duration of Visit`. Remove all records where the value for `Duration of Visit` is `'Nil Return'`. Finally, convert the column to an `int` datatype.

In [19]:
# Exercise: Make a copy of the df and store it in df_trips
#
df_trips = df.copy()
# Then, only keep the required columns
#
df_trips = df_trips[['Name', 'Duration of Visit']]
# Filter out all records that have 'Nil Return' in the 'Duration of Visit' column
# (Perform this step if you are using the dataset that is not cleaned.)
#
df_trips = df_trips[~(df_trips['Duration of Visit'] == 'Nil Return')]
# Convert the 'Duration of Visit' column to an int datatype
#
df_trips['Duration of Visit'] = df_trips['Duration of Visit'].apply(int)
df_trips.head()

Unnamed: 0,Name,Duration of Visit
0,Gareth Bayley,4
1,Gareth Bayley,1
2,Gareth Bayley,10
3,Gareth Bayley,2
4,Gareth Bayley,3


Perform the aggregation using `groupby()`. Then, use the `agg()` method to perform multiple aggregations on the same column. `agg` takes a `dict` as a parameter where each **key** is the **name of a column** and each **value** is the `list` of aggregation functions to apply to that column.

In [20]:
# Use groupby() followed by agg(). Then, reset_index()
df_count_sum_aggregate = df_trips.groupby('Name').agg({
    'Duration of Visit' : ['sum', 'size']})
df_count_sum_aggregate = df_count_sum_aggregate.reset_index()
# print out the result
df_count_sum_aggregate.head()

Unnamed: 0_level_0,Name,Duration of Visit,Duration of Visit
Unnamed: 0_level_1,Unnamed: 1_level_1,sum,size
0,Andrew Sanderson,3,2
1,Andy Murdoch,12,4
2,Angus Lapsley,16,11
3,Ben Merrick,13,5
4,Caroline Wilson,16,7
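
To see what the `dict` form of `agg()` returns, the sketch below runs it on a made-up frame and prints the column labels. Each label is a tuple of (column name, function name), which is why the table above shows two header rows.

```python
import pandas as pd

# Made-up trips for two delegates.
trips = pd.DataFrame({
    'Name': ['Ann', 'Ann', 'Bob', 'Ann', 'Bob'],
    'Duration of Visit': [4, 1, 10, 2, 3],
})

# key = column to aggregate, value = list of functions to apply to it.
agg = trips.groupby('Name').agg({'Duration of Visit': ['sum', 'size']})

# Two-level column labels, one tuple per (column, function) pair.
column_labels = agg.columns.tolist()

# After reset_index(), 'Name' becomes a column whose second level is empty.
labels_after_reset = agg.reset_index().columns.tolist()

print(column_labels)
print(labels_after_reset)
```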


Observe that the column has multiple layers of labelling / indexing. Flatten the index to just 1 layer of columns. Follow [this StackOverflow answer](https://stackoverflow.com/questions/14507794/python-pandas-how-to-flatten-a-hierarchical-index-in-columns) to find more ways to flatten a multi-indexed `df`.

In [21]:
# Flatten the index using pd.Index and assigning them to the columns
df_count_sum_aggregate.columns = pd.Index(['Name', 'Number of Days', 'Number of Trips'])
df_count_sum_aggregate.head()

Unnamed: 0,Name,Number of Days,Number of Trips
0,Andrew Sanderson,3,2
1,Andy Murdoch,12,4
2,Angus Lapsley,16,11
3,Ben Merrick,13,5
4,Caroline Wilson,16,7
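
Two other ways of ending up with single-level column names are sketched below on made-up data. Neither is the method used in this lesson: the first joins the two label levels into one string, and the second uses named aggregation so that the two-level labels never appear.

```python
import pandas as pd

# Made-up trips for two delegates.
trips = pd.DataFrame({
    'Name': ['Ann', 'Ann', 'Bob', 'Ann', 'Bob'],
    'Duration of Visit': [4, 1, 10, 2, 3],
})

# Alternative 1: join the non-empty parts of each two-level label.
nested = trips.groupby('Name').agg({'Duration of Visit': ['sum', 'size']}).reset_index()
flat_labels = [' '.join(part for part in col if part) for col in nested.columns]

# Alternative 2: named aggregation, new_name=(column, function).
# The ** unpacking is needed only because the new names contain spaces.
summary = (
    trips.groupby('Name')
    .agg(**{
        'Number of Days': ('Duration of Visit', 'sum'),
        'Number of Trips': ('Duration of Visit', 'size'),
    })
    .reset_index()
)

print(flat_labels)
print(summary)
```

Assigning a fresh `pd.Index`, as the lesson does, is the shortest option but depends on listing the names in exactly the right order.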


Finally with the results, filter using 2 conditions: where the number of trips is 5 or more but the number of days is less than 10.

In [22]:
# Exercise: Filter using the 'Number of Trips' and 'Number of Days' columns
#
df_count_sum_aggregate[(df_count_sum_aggregate['Number of Trips'] >= 5) &
                      (df_count_sum_aggregate['Number of Days'] < 10)]

Unnamed: 0,Name,Number of Days,Number of Trips
5,Colin Martin-Reynolds,8,5
18,Karen Pierce,8,6
22,Lindsay Appleby,7,5
27,Peter Jones,8,6
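
When combining two conditions, each comparison needs its own parentheses because `&` binds more tightly than `>=` and `<`. The sketch below shows this on made-up numbers, together with `DataFrame.query()` as an alternative spelling that the lesson does not use.

```python
import pandas as pd

# Made-up summary table in the same shape as df_count_sum_aggregate.
summary = pd.DataFrame({
    'Name': ['Ann', 'Bob', 'Cy', 'Di'],
    'Number of Days': [16, 8, 7, 12],
    'Number of Trips': [11, 5, 5, 4],
})

# Parenthesise each condition, then combine them element-wise with &.
mask = (summary['Number of Trips'] >= 5) & (summary['Number of Days'] < 10)
selected = summary[mask]

# query() version: backticks are required around column names with spaces.
selected_q = summary.query('`Number of Trips` >= 5 and `Number of Days` < 10')

print(selected['Name'].tolist())
```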


### Self-Study - Drawing Pivot Tables

**Q:** Create a pivot table where:
1. each row represents one delegate
2. each column represents trip duration in days 
3. each value in the cell is the number of trips with that duration.

Perform this for all trips where the `Duration of Visit` is $5$ **days or less**.

First, create a copy of `df_trips` and store it in `df_pt_raw`. Filter for all trips lasting $5$ days or less.

In [23]:
# Create a copy of df_trips
#
df_pt_raw = df_trips.copy()
# Filter for all trips with 5 days or less
#
df_pt_raw = df_pt_raw[df_pt_raw['Duration of Visit'] <=5]
df_pt_raw.head()

Unnamed: 0,Name,Duration of Visit
0,Gareth Bayley,4
1,Gareth Bayley,1
3,Gareth Bayley,2
4,Gareth Bayley,3
5,Gareth Bayley,1


Then, create the table using the `pd.pivot_table()` function. Specify the `index`, `columns`, `values` and `aggfunc` accordingly.

In [26]:
# Create the pivot table using pd.pivot_table() 
df_pt = pd.pivot_table(df_pt_raw, index=['Name'], columns=['Duration of Visit'],
              values='Duration of Visit', aggfunc=len)
# Add the Name as 1 column in the df
df_pt['Name'] = df_pt.index
# reset_index
df_pt.reset_index(drop=True, inplace=True)
df_pt

Duration of Visit,1,2,3,4,5,Name
0,1.0,1.0,,,,Andrew Sanderson
1,1.0,1.0,1.0,,,Andy Murdoch
2,8.0,1.0,2.0,,,Angus Lapsley
3,,3.0,1.0,1.0,,Ben Merrick
4,1.0,3.0,3.0,,,Caroline Wilson
5,2.0,3.0,,,,Colin Martin-Reynolds
6,2.0,,1.0,1.0,,Edward Hobart
7,3.0,2.0,1.0,1.0,,Gareth Bayley
8,1.0,1.0,,1.0,,Helen Bower-Easton
9,,1.0,,,1.0,Hugh Elliott
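
A count table of this shape can also be built without `pd.pivot_table()`. The sketch below, on made-up data, uses `pd.crosstab()` and `groupby().size().unstack()`; both are alternatives to the lesson's approach and both produce `0` rather than `NaN` for combinations that never occur.

```python
import pandas as pd

# Made-up trips: Ann has two 1-day trips and one 3-day trip, and so on.
trips = pd.DataFrame({
    'Name': ['Ann', 'Ann', 'Ann', 'Bob', 'Bob'],
    'Duration of Visit': [1, 1, 3, 2, 5],
})

# crosstab counts how often each (Name, Duration) pair occurs.
by_crosstab = pd.crosstab(trips['Name'], trips['Duration of Visit'])

# Same table: count per pair, then pivot the durations into columns.
by_unstack = (
    trips.groupby(['Name', 'Duration of Visit'])
    .size()
    .unstack(fill_value=0)
)

print(by_crosstab)
```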


Convert the column labels to strings, reorder the columns, and finally convert every count in the numeric columns to an integer.

In [27]:
# Display the columns. Note that the numeric column labels are of type int, not string.
print(df_pt.columns)

Index([1, 2, 3, 4, 5, 'Name'], dtype='object', name='Duration of Visit')


In [30]:
# Change the datatype of the columns.
df_pt.columns = pd.Index(['1', '2', '3', '4', '5', 'Name'])
# Change the order of the columns
df_pt = df_pt[['Name', '1', '2', '3', '4', '5']]
print(df_pt.columns)

Index(['Name', '1', '2', '3', '4', '5'], dtype='object')


In [31]:
df_pt

Unnamed: 0,Name,1,2,3,4,5
0,Andrew Sanderson,1.0,1.0,,,
1,Andy Murdoch,1.0,1.0,1.0,,
2,Angus Lapsley,8.0,1.0,2.0,,
3,Ben Merrick,,3.0,1.0,1.0,
4,Caroline Wilson,1.0,3.0,3.0,,
5,Colin Martin-Reynolds,2.0,3.0,,,
6,Edward Hobart,2.0,,1.0,1.0,
7,Gareth Bayley,3.0,2.0,1.0,1.0,
8,Helen Bower-Easton,1.0,1.0,,1.0,
9,Hugh Elliott,,1.0,,,1.0


In [32]:
# Replace all NaN values with 0
df_pt[['1', '2', '3', '4', '5']] = df_pt[['1', '2', '3', '4', '5']].fillna(0)
# Convert all the columns to integers
for c in ['1', '2', '3', '4', '5']:
    df_pt[c] = df_pt[c].apply(int)
df_pt

Unnamed: 0,Name,1,2,3,4,5
0,Andrew Sanderson,1,1,0,0,0
1,Andy Murdoch,1,1,1,0,0
2,Angus Lapsley,8,1,2,0,0
3,Ben Merrick,0,3,1,1,0
4,Caroline Wilson,1,3,3,0,0
5,Colin Martin-Reynolds,2,3,0,0,0
6,Edward Hobart,2,0,1,1,0
7,Gareth Bayley,3,2,1,1,0
8,Helen Bower-Easton,1,1,0,1,0
9,Hugh Elliott,0,1,0,0,1
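
The clean-up steps (string labels, `NaN` to `0`, floats to integers) can be condensed, as in this sketch on a made-up two-row table. Using `astype()` with a `dict` instead of a loop over `apply()` is a substitution of my own, not the lesson's method.

```python
import numpy as np
import pandas as pd

# Made-up pivot output: integer column labels, NaN where no trip exists.
pt = pd.DataFrame({1: [1.0, np.nan], 2: [1.0, 3.0], 'Name': ['Ann', 'Bob']})

# Turn every column label into a string.
pt.columns = [str(c) for c in pt.columns]

day_cols = ['1', '2']

# Replace missing counts with 0, then convert only the count columns to int.
pt[day_cols] = pt[day_cols].fillna(0)
pt = pt.astype({c: int for c in day_cols})

print(pt)
```

The conversion has to come after `fillna(0)`, because `NaN` cannot be represented in a plain integer column.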


**Exercise**
Using `pokemon.csv`, create a pivot table where:
1. each row represents one type of Pokémon
2. one column represents Legendary Pokémon and another represents non-Legendary Pokémon
3. each cell holds the number of Pokémon of that type with that Legendary status

Perform this only for Pokémon whose ID is $151$ or less. Use `aggfunc=np.sum` for this aggregation.

In [33]:
# Read from CSV file
p_df = pd.read_csv('pokemon.csv')

# Isolate the Pokemon_ID, Type and Legendary column
p_type_df = p_df.copy()
p_type_df = p_type_df[['Pokemon_ID', 'Type', 'Legendary']]

# Filter for all Pokemon_ID <= 151
p_type_df = p_type_df[p_type_df['Pokemon_ID'] <= 151]

# Perform a groupby and count the number of Pokemon in each type, for each Legendary status
p_type_df_agg = p_type_df.groupby(['Type', 'Legendary']).size().reset_index(name='count')

# Rename the columns
p_type_df_agg.rename(columns={'count' : 'Number of Pokemon'}, inplace=True)

# Create the pivot table where the row is the Type and the column is the Legendary status
p_type_df_pt = pd.pivot_table(p_type_df_agg, index=['Type'], columns=['Legendary'],
              values='Number of Pokemon', aggfunc=np.sum)

# Fill all missing cells to 0
p_type_df_pt.fillna(0, inplace=True)

# Convert the cells datatype to int
p_type_df_pt = p_type_df_pt.astype(int)

# Update the column names
p_type_df_pt.columns = ['Not Legendary', 'Legendary']

# Move Type from the index back into a column and set the column order
p_type_df_pt.reset_index(inplace=True)
p_type_df_pt = p_type_df_pt[['Type', 'Not Legendary', 'Legendary']]

# Show the df
p_type_df_pt

Unnamed: 0,Type,Not Legendary,Legendary
0,Bug,14,0
1,Dragon,3,0
2,Electric,8,1
3,Fairy,2,0
4,Fighting,7,0
5,Fire,13,1
6,Ghost,4,0
7,Grass,13,0
8,Ground,8,0
9,Ice,1,1
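
Assigning the new column names as a list relies on the `False` column coming before the `True` column and on both being present. A more defensive variant, sketched below on made-up data, renames by key instead; `pd.crosstab()` is used here only to keep the example short and is not part of the worked solution above.

```python
import pandas as pd

# Made-up Pokémon rows; the real pokemon.csv is not needed for this sketch.
pokemon = pd.DataFrame({
    'Type': ['Fire', 'Fire', 'Ice', 'Bug'],
    'Legendary': [False, True, True, False],
})

# Count Pokémon per (Type, Legendary) pair; absent pairs become 0.
table = pd.crosstab(pokemon['Type'], pokemon['Legendary'])

# Rename by key so the labels cannot be swapped by accident.
table = table.rename(columns={False: 'Not Legendary', True: 'Legendary'})

# Move Type from the index back into an ordinary column.
table = table.reset_index()

print(table)
```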


**Credits**
- [data.gov.uk](https://www.europeandataportal.eu/data/en/dataset/travel-undertaken-by-fco-senior-staff) for the Foreign and Commonwealth Office's travel expenses dataset

- [Pokemon with stats, Kaggle](https://www.kaggle.com/abcsds/pokemon) for the Pokémon dataset
<hr>
`HWA-DS102-INCLASS-2C-201903`