# The Task

Your task is two-fold:

#### I. Clean the files and combine them into one final DataFrame.

This DataFrame should have the following columns:
- Hero (Just the name of the Hero)
- Publisher
- Gender
- Eye color
- Race
- Hair color
- Height (numeric)
- Skin color
- Alignment
- Weight (numeric)
- Plus one-hot-encoded columns for every power that appears in the dataset, e.g.:
    - Agility
    - Flight
    - Superspeed
    - etc.

Hint: The Height and Weight values contain a space between the number and the unit (e.g. "100 kg" or "52.5 cm").
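As a rough illustration of the one-hot-encoded power columns described above, the sketch below uses `Series.str.get_dummies` on a comma-separated string column. The two-row `powers` frame is a toy stand-in built from the sample rows of the powers file, and the names `power_dummies` and `powers_encoded` are placeholders rather than part of the assignment.

```python
import pandas as pd

# Toy stand-in for the powers file; the real data has many more rows
powers = pd.DataFrame({
    "hero_names": ["3-D Man", "Abin Sur"],
    "Powers": ["Agility,Super Strength,Stamina,Super Speed", "Lantern Power Ring"],
})

# One column per distinct power: 1 where the hero has it, 0 otherwise
power_dummies = powers["Powers"].str.get_dummies(sep=",")

# Put the hero name back next to the encoded columns
powers_encoded = pd.concat([powers[["hero_names"]], power_dummies], axis=1)
print(powers_encoded)
```

Because `get_dummies` splits on the exact separator, any stray whitespace around the commas in the real file would need to be stripped first.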



#### II. Use your combined DataFrame to answer the following questions.

1. Compare the average weight of superheroes who have Super Speed to those who do not.
2. What is the average height of heroes for each publisher?
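Both questions reduce to a `groupby` followed by a mean once the combined DataFrame exists. The sketch below assumes numeric `Height` and `Weight` columns and a 0/1 `Super Speed` column; the frame `df` and its values are made up purely for illustration.

```python
import pandas as pd

# Toy stand-in for the final combined DataFrame (values are illustrative only)
df = pd.DataFrame({
    "Hero": ["A", "B", "C", "D"],
    "Publisher": ["Marvel Comics", "Marvel Comics", "DC Comics", "DC Comics"],
    "Height": [200.0, 180.0, 190.0, 170.0],
    "Weight": [100.0, 80.0, 90.0, 60.0],
    "Super Speed": [1, 0, 1, 0],
})

# Question 1: mean weight of heroes with (1) and without (0) Super Speed
weight_by_speed = df.groupby("Super Speed")["Weight"].mean()

# Question 2: mean height per publisher
height_by_publisher = df.groupby("Publisher")["Height"].mean()

print(weight_by_speed)
print(height_by_publisher)
```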


# Import Libraries

In [153]:
# Imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

import os, json, math, time
import tmdbsimple as tmdb
from tqdm.notebook import tqdm_notebook

from matplotlib.ticker import StrMethodFormatter
sns.set_style('whitegrid')

# Load Data

In [154]:
super_info = pd.read_csv("Data/superhero_info - superhero_info.csv")
super_info.head()

Unnamed: 0,Hero|Publisher,Gender,Race,Alignment,Hair color,Eye color,Skin color,Measurements
0,A-Bomb|Marvel Comics,Male,Human,good,No Hair,yellow,Unknown,"{'Height': '203.0 cm', 'Weight': '441.0 kg'}"
1,Abe Sapien|Dark Horse Comics,Male,Icthyo Sapien,good,No Hair,blue,blue,"{'Height': '191.0 cm', 'Weight': '65.0 kg'}"
2,Abin Sur|DC Comics,Male,Ungaran,good,No Hair,blue,red,"{'Height': '185.0 cm', 'Weight': '90.0 kg'}"
3,Abomination|Marvel Comics,Male,Human / Radiation,bad,No Hair,green,Unknown,"{'Height': '203.0 cm', 'Weight': '441.0 kg'}"
4,Absorbing Man|Marvel Comics,Male,Human,bad,No Hair,blue,Unknown,"{'Height': '193.0 cm', 'Weight': '122.0 kg'}"


In [155]:
strength = pd.read_csv("Data/superhero_powers - superhero_powers.csv")
strength.head()

Unnamed: 0,hero_names,Powers
0,3-D Man,"Agility,Super Strength,Stamina,Super Speed"
1,A-Bomb,"Accelerated Healing,Durability,Longevity,Super..."
2,Abe Sapien,"Agility,Accelerated Healing,Cold Resistance,Du..."
3,Abin Sur,Lantern Power Ring
4,Abomination,"Accelerated Healing,Intelligence,Super Strengt..."


# Clean Data

In [156]:
## info is a method: without the parentheses pandas only echoes the bound-method repr
super_info.info()

In [157]:
strength.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 667 entries, 0 to 666
Data columns (total 2 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   hero_names  667 non-null    object
 1   Powers      667 non-null    object
dtypes: object(2)
memory usage: 10.5+ KB


### Replacing multiple characters at once within a string column

In [158]:
# Make a list of all characters to replace
to_replace = ['(',')']
# run a loop to replace all of the characters in the list at once
for char in to_replace:
    super_info['Measurements'] = super_info['Measurements'].str.replace(char,'',regex=False)
    
super_info['Measurements'].head()

0    {'Height': '203.0 cm', 'Weight': '441.0 kg'}
1     {'Height': '191.0 cm', 'Weight': '65.0 kg'}
2     {'Height': '185.0 cm', 'Weight': '90.0 kg'}
3    {'Height': '203.0 cm', 'Weight': '441.0 kg'}
4    {'Height': '193.0 cm', 'Weight': '122.0 kg'}
Name: Measurements, dtype: object

In [159]:
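## grab a single value to test on, then swap single quotes for double quotes (JSON requires double quotes)
Measurements = super_info.loc[0, 'Measurements']
Measurements = Measurements.replace("'",'"')
Measurements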

'{"Height": "203.0 cm", "Weight": "441.0 kg"}'

In [160]:
## now we can use json.loads
fixed_measurements = json.loads(Measurements)
print(type(fixed_measurements))
fixed_measurements

<class 'dict'>


{'Height': '203.0 cm', 'Weight': '441.0 kg'}

### Applying this to the entire column.

In [161]:
## use .str.replace to replace all single quotes (regex=False treats the quote as a literal character)
super_info['Measurements'] = super_info['Measurements'].str.replace("'",'"',regex=False)
## Apply the json.loads to the full column
super_info['Measurements'] = super_info['Measurements'].apply(json.loads)
super_info['Measurements'].head()

0    {'Height': '203.0 cm', 'Weight': '441.0 kg'}
1     {'Height': '191.0 cm', 'Weight': '65.0 kg'}
2     {'Height': '185.0 cm', 'Weight': '90.0 kg'}
3    {'Height': '203.0 cm', 'Weight': '441.0 kg'}
4    {'Height': '193.0 cm', 'Weight': '122.0 kg'}
Name: Measurements, dtype: object

In [162]:
## check a single value after transformation
test = super_info.loc[0, 'Measurements']
print(type(test))
test

<class 'dict'>


{'Height': '203.0 cm', 'Weight': '441.0 kg'}
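The quote swap works here because none of the measurement strings contain an apostrophe. As a possible alternative (not the approach used in this notebook), `ast.literal_eval` from the standard library parses Python-literal syntax directly, so the single quotes can stay as they are. The series `raw` below is a one-row stand-in for the original string column.

```python
import ast
import pandas as pd

# One-row stand-in for the raw Measurements column (still strings)
raw = pd.Series(["{'Height': '203.0 cm', 'Weight': '441.0 kg'}"])

# literal_eval safely evaluates the dict literal without any quote replacement
parsed = raw.apply(ast.literal_eval)

print(type(parsed[0]))
print(parsed[0])
```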

### Unpack a column of dictionaries into separate columns

In [163]:
height_weight = super_info['Measurements'].apply(pd.Series)
height_weight

Unnamed: 0,Height,Weight
0,203.0 cm,441.0 kg
1,191.0 cm,65.0 kg
2,185.0 cm,90.0 kg
3,203.0 cm,441.0 kg
4,193.0 cm,122.0 kg
...,...,...
458,183.0 cm,83.0 kg
459,165.0 cm,52.0 kg
460,66.0 cm,17.0 kg
461,170.0 cm,57.0 kg
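The task asks for numeric Height and Weight, and the hint points at the space between the number and the unit. One way to use that, sketched on a toy frame named `hw_demo` (a placeholder, not a variable from this notebook), is to split on the space, keep the first piece, and cast it to float.

```python
import pandas as pd

# Toy stand-in for the unpacked Height/Weight columns
hw_demo = pd.DataFrame({
    "Height": ["203.0 cm", "191.0 cm"],
    "Weight": ["441.0 kg", "65.0 kg"],
})

for col in ["Height", "Weight"]:
    # Keep the text before the space, then convert it to float
    hw_demo[col] = hw_demo[col].str.split(" ").str[0].astype(float)

print(hw_demo.dtypes)
```

This assumes every value follows the "number, space, unit" pattern; rows with a different unit or a missing value would need separate handling.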


In [164]:
# concat height_weight with the original dataframe
super_info = pd.concat((super_info, height_weight), axis = 1)
super_info.head(2)

Unnamed: 0,Hero|Publisher,Gender,Race,Alignment,Hair color,Eye color,Skin color,Measurements,Height,Weight
0,A-Bomb|Marvel Comics,Male,Human,good,No Hair,yellow,Unknown,"{'Height': '203.0 cm', 'Weight': '441.0 kg'}",203.0 cm,441.0 kg
1,Abe Sapien|Dark Horse Comics,Male,Icthyo Sapien,good,No Hair,blue,blue,"{'Height': '191.0 cm', 'Weight': '65.0 kg'}",191.0 cm,65.0 kg


### Drop column

In [165]:
super_info = super_info.drop(columns=['Measurements'])
super_info

Unnamed: 0,Hero|Publisher,Gender,Race,Alignment,Hair color,Eye color,Skin color,Height,Weight
0,A-Bomb|Marvel Comics,Male,Human,good,No Hair,yellow,Unknown,203.0 cm,441.0 kg
1,Abe Sapien|Dark Horse Comics,Male,Icthyo Sapien,good,No Hair,blue,blue,191.0 cm,65.0 kg
2,Abin Sur|DC Comics,Male,Ungaran,good,No Hair,blue,red,185.0 cm,90.0 kg
3,Abomination|Marvel Comics,Male,Human / Radiation,bad,No Hair,green,Unknown,203.0 cm,441.0 kg
4,Absorbing Man|Marvel Comics,Male,Human,bad,No Hair,blue,Unknown,193.0 cm,122.0 kg
...,...,...,...,...,...,...,...,...,...
458,Yellowjacket|Marvel Comics,Male,Human,good,Blond,blue,Unknown,183.0 cm,83.0 kg
459,Yellowjacket II|Marvel Comics,Female,Human,good,Strawberry Blond,blue,Unknown,165.0 cm,52.0 kg
460,Yoda|George Lucas,Male,Yoda's species,good,White,brown,green,66.0 cm,17.0 kg
461,Zatanna|DC Comics,Female,Human,good,Black,blue,Unknown,170.0 cm,57.0 kg


### Fix "Hero|Publisher" column