# Applying Advanced Transformations (Core)

## Task:

- Your task is two-fold:

1. Clean the files and combine them into one final DataFrame.

      - This dataframe should have the following columns:
           - Hero (Just the name of the Hero)
           - Publisher
           - Gender
           - Eye color
           - Race
           - Hair color
           - Height (numeric)
           - Skin color
           - Alignment
           - Weight (numeric)
           - Plus, one-hot-encoded columns for every power that appears in the dataset. E.g.:
                - Agility
                - Flight
                - Superspeed
                - etc.

 Hint: There is a space in "100 kg" or "52.5 cm"

    
2. Use your combined DataFrame to answer the following questions.

    - Compare the average weight of super heroes who have Super Speed to those who do not.
    - What is the average height of heroes for each publisher?

## Imports

In [1]:
import numpy as np
import pandas as pd
import json

In [2]:
info = pd.read_csv('https://docs.google.com/spreadsheets/d/e/2PACX-1vS1ZstYLwFgwhZnqDsPjtnlHYhJp_cmW55J8JD5mym0seRsaem3px7QBtuFF0LiI7z1PLCkVKAkdO7J/pub?output=csv')
powers = pd.read_csv('https://docs.google.com/spreadsheets/d/e/2PACX-1vSzdWOBaXOoz52vPmCFV5idNlDBohLY1Lsbc1IfZIZQ7cV_aNB2wYBfhF49uE1TaO1B5MQCGWiNrFfd/pub?output=csv')

In [3]:
info.head()


Unnamed: 0,Hero|Publisher,Gender,Race,Alignment,Hair color,Eye color,Skin color,Measurements
0,A-Bomb|Marvel Comics,Male,Human,good,No Hair,yellow,Unknown,"{'Height': '203.0 cm', 'Weight': '441.0 kg'}"
1,Abe Sapien|Dark Horse Comics,Male,Icthyo Sapien,good,No Hair,blue,blue,"{'Height': '191.0 cm', 'Weight': '65.0 kg'}"
2,Abin Sur|DC Comics,Male,Ungaran,good,No Hair,blue,red,"{'Height': '185.0 cm', 'Weight': '90.0 kg'}"
3,Abomination|Marvel Comics,Male,Human / Radiation,bad,No Hair,green,Unknown,"{'Height': '203.0 cm', 'Weight': '441.0 kg'}"
4,Absorbing Man|Marvel Comics,Male,Human,bad,No Hair,blue,Unknown,"{'Height': '193.0 cm', 'Weight': '122.0 kg'}"


In [4]:
powers.head()

Unnamed: 0,hero_names,Powers
0,3-D Man,"Agility,Super Strength,Stamina,Super Speed"
1,A-Bomb,"Accelerated Healing,Durability,Longevity,Super..."
2,Abe Sapien,"Agility,Accelerated Healing,Cold Resistance,Du..."
3,Abin Sur,Lantern Power Ring
4,Abomination,"Accelerated Healing,Intelligence,Super Strengt..."


In [5]:
print(info.info())
print(powers.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 463 entries, 0 to 462
Data columns (total 8 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   Hero|Publisher  463 non-null    object
 1   Gender          463 non-null    object
 2   Race            463 non-null    object
 3   Alignment       463 non-null    object
 4   Hair color      463 non-null    object
 5   Eye color       463 non-null    object
 6   Skin color      463 non-null    object
 7   Measurements    463 non-null    object
dtypes: object(8)
memory usage: 29.1+ KB
None
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 667 entries, 0 to 666
Data columns (total 2 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   hero_names  667 non-null    object
 1   Powers      667 non-null    object
dtypes: object(2)
memory usage: 10.5+ KB
None


In [6]:
info['Hero|Publisher'].head(3)

0            A-Bomb|Marvel Comics
1    Abe Sapien|Dark Horse Comics
2              Abin Sur|DC Comics
Name: Hero|Publisher, dtype: object

In [7]:
info['Hero|Publisher'].str.split('|', expand=True)

Unnamed: 0,0,1
0,A-Bomb,Marvel Comics
1,Abe Sapien,Dark Horse Comics
2,Abin Sur,DC Comics
3,Abomination,Marvel Comics
4,Absorbing Man,Marvel Comics
...,...,...
458,Yellowjacket,Marvel Comics
459,Yellowjacket II,Marvel Comics
460,Yoda,George Lucas
461,Zatanna,DC Comics


In [8]:
info[['Hero', 'Publisher']] = info['Hero|Publisher'].str.split('|', expand=True)
info.head(2)

Unnamed: 0,Hero|Publisher,Gender,Race,Alignment,Hair color,Eye color,Skin color,Measurements,Hero,Publisher
0,A-Bomb|Marvel Comics,Male,Human,good,No Hair,yellow,Unknown,"{'Height': '203.0 cm', 'Weight': '441.0 kg'}",A-Bomb,Marvel Comics
1,Abe Sapien|Dark Horse Comics,Male,Icthyo Sapien,good,No Hair,blue,blue,"{'Height': '191.0 cm', 'Weight': '65.0 kg'}",Abe Sapien,Dark Horse Comics


In [9]:
info= info.drop(columns=['Hero|Publisher'])
info.head(3)

Unnamed: 0,Gender,Race,Alignment,Hair color,Eye color,Skin color,Measurements,Hero,Publisher
0,Male,Human,good,No Hair,yellow,Unknown,"{'Height': '203.0 cm', 'Weight': '441.0 kg'}",A-Bomb,Marvel Comics
1,Male,Icthyo Sapien,good,No Hair,blue,blue,"{'Height': '191.0 cm', 'Weight': '65.0 kg'}",Abe Sapien,Dark Horse Comics
2,Male,Ungaran,good,No Hair,blue,red,"{'Height': '185.0 cm', 'Weight': '90.0 kg'}",Abin Sur,DC Comics


In [10]:
measure= info.loc[0, "Measurements"]
measure

"{'Height': '203.0 cm', 'Weight': '441.0 kg'}"

In [11]:
measure= measure.replace("'", '"')
measure

'{"Height": "203.0 cm", "Weight": "441.0 kg"}'

In [12]:
fixed_measure= json.loads(measure)
print(type(fixed_measure))
fixed_measure

<class 'dict'>


{'Height': '203.0 cm', 'Weight': '441.0 kg'}

In [13]:
info['Measurements']= info['Measurements'].str.replace("'",'"')

In [14]:
info['Measurements']= info['Measurements'].apply(json.loads)
info['Measurements'].head()

0    {'Height': '203.0 cm', 'Weight': '441.0 kg'}
1     {'Height': '191.0 cm', 'Weight': '65.0 kg'}
2     {'Height': '185.0 cm', 'Weight': '90.0 kg'}
3    {'Height': '203.0 cm', 'Weight': '441.0 kg'}
4    {'Height': '193.0 cm', 'Weight': '122.0 kg'}
Name: Measurements, dtype: object
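
The quote replacement works here because none of the values contains an apostrophe. As a hedged alternative, the standard-library `ast.literal_eval` parses Python-literal strings directly, so single quotes can stay as they are. The sketch below uses two rows copied from the output above; the names `raw` and `parsed` are illustrative only.

```python
import ast
import pandas as pd

# Two sample strings copied from the Measurements column shown above
raw = pd.Series([
    "{'Height': '203.0 cm', 'Weight': '441.0 kg'}",
    "{'Height': '191.0 cm', 'Weight': '65.0 kg'}",
])

# literal_eval understands Python literal syntax, so no quote swapping is needed
parsed = raw.apply(ast.literal_eval)

first = parsed.iloc[0]
print(type(first))
print(first['Height'])
```

Either approach yields one dictionary per row; `json.loads` is stricter about input format, while `literal_eval` tolerates the single-quoted form stored in this file.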

In [15]:
ht_wt= info['Measurements'].apply(pd.Series)
ht_wt

Unnamed: 0,Height,Weight
0,203.0 cm,441.0 kg
1,191.0 cm,65.0 kg
2,185.0 cm,90.0 kg
3,203.0 cm,441.0 kg
4,193.0 cm,122.0 kg
...,...,...
458,183.0 cm,83.0 kg
459,165.0 cm,52.0 kg
460,66.0 cm,17.0 kg
461,170.0 cm,57.0 kg
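
`apply(pd.Series)` builds one Series per row, which can be slow on larger frames. A minimal sketch of an equivalent expansion, assuming every cell is already a dictionary with the same keys, is to hand the list of dictionaries to the `pd.DataFrame` constructor. The name `measurements` below is a stand-in for `info['Measurements']`.

```python
import pandas as pd

# Stand-in for info['Measurements'] after json.loads has been applied
measurements = pd.Series([
    {'Height': '203.0 cm', 'Weight': '441.0 kg'},
    {'Height': '191.0 cm', 'Weight': '65.0 kg'},
])

# Build the two columns in one call; reuse the original index so that
# a later pd.concat(..., axis=1) lines the rows up correctly
ht_wt_demo = pd.DataFrame(measurements.tolist(), index=measurements.index)
print(ht_wt_demo)
```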


In [16]:
info = pd.concat((info, ht_wt), axis = 1)
info.head(2)

Unnamed: 0,Gender,Race,Alignment,Hair color,Eye color,Skin color,Measurements,Hero,Publisher,Height,Weight
0,Male,Human,good,No Hair,yellow,Unknown,"{'Height': '203.0 cm', 'Weight': '441.0 kg'}",A-Bomb,Marvel Comics,203.0 cm,441.0 kg
1,Male,Icthyo Sapien,good,No Hair,blue,blue,"{'Height': '191.0 cm', 'Weight': '65.0 kg'}",Abe Sapien,Dark Horse Comics,191.0 cm,65.0 kg


In [17]:
info= info.drop(columns= ['Measurements'])
info.head(1)

Unnamed: 0,Gender,Race,Alignment,Hair color,Eye color,Skin color,Hero,Publisher,Height,Weight
0,Male,Human,good,No Hair,yellow,Unknown,A-Bomb,Marvel Comics,203.0 cm,441.0 kg


In [32]:
info['Height'].str.split(' ', expand=True)

Unnamed: 0,0,1
0,203.0,cm
1,191.0,cm
2,185.0,cm
3,203.0,cm
4,193.0,cm
...,...,...
458,183.0,cm
459,165.0,cm
460,66.0,cm
461,170.0,cm


In [33]:
info[['Height(cm)', 'cm']] = info['Height'].str.split(' ', expand=True)
info.head()

Unnamed: 0,Gender,Race,Alignment,Hair color,Eye color,Skin color,Hero,Publisher,Height,Weight,Height(cm),cm
0,Male,Human,good,No Hair,yellow,Unknown,A-Bomb,Marvel Comics,203.0 cm,441.0 kg,203.0,cm
1,Male,Icthyo Sapien,good,No Hair,blue,blue,Abe Sapien,Dark Horse Comics,191.0 cm,65.0 kg,191.0,cm
2,Male,Ungaran,good,No Hair,blue,red,Abin Sur,DC Comics,185.0 cm,90.0 kg,185.0,cm
3,Male,Human / Radiation,bad,No Hair,green,Unknown,Abomination,Marvel Comics,203.0 cm,441.0 kg,203.0,cm
4,Male,Human,bad,No Hair,blue,Unknown,Absorbing Man,Marvel Comics,193.0 cm,122.0 kg,193.0,cm


In [34]:
info['Weight'].str.split(' ', expand=True)

Unnamed: 0,0,1
0,441.0,kg
1,65.0,kg
2,90.0,kg
3,441.0,kg
4,122.0,kg
...,...,...
458,83.0,kg
459,52.0,kg
460,17.0,kg
461,57.0,kg


In [35]:
info[['Weight(kg)', 'kg']] = info['Weight'].str.split(' ', expand=True)
info.head()

Unnamed: 0,Gender,Race,Alignment,Hair color,Eye color,Skin color,Hero,Publisher,Height,Weight,Height(cm),cm,Weight(kg),kg
0,Male,Human,good,No Hair,yellow,Unknown,A-Bomb,Marvel Comics,203.0 cm,441.0 kg,203.0,cm,441.0,kg
1,Male,Icthyo Sapien,good,No Hair,blue,blue,Abe Sapien,Dark Horse Comics,191.0 cm,65.0 kg,191.0,cm,65.0,kg
2,Male,Ungaran,good,No Hair,blue,red,Abin Sur,DC Comics,185.0 cm,90.0 kg,185.0,cm,90.0,kg
3,Male,Human / Radiation,bad,No Hair,green,Unknown,Abomination,Marvel Comics,203.0 cm,441.0 kg,203.0,cm,441.0,kg
4,Male,Human,bad,No Hair,blue,Unknown,Absorbing Man,Marvel Comics,193.0 cm,122.0 kg,193.0,cm,122.0,kg


In [36]:
info= info.drop(columns= ['Height', 'Weight', 'cm', 'kg'])
info.head(1)

Unnamed: 0,Gender,Race,Alignment,Hair color,Eye color,Skin color,Hero,Publisher,Height(cm),Weight(kg)
0,Male,Human,good,No Hair,yellow,Unknown,A-Bomb,Marvel Comics,203.0,441.0


In [38]:
info.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 463 entries, 0 to 462
Data columns (total 10 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   Gender      463 non-null    object
 1   Race        463 non-null    object
 2   Alignment   463 non-null    object
 3   Hair color  463 non-null    object
 4   Eye color   463 non-null    object
 5   Skin color  463 non-null    object
 6   Hero        463 non-null    object
 7   Publisher   463 non-null    object
 8   Height(cm)  463 non-null    object
 9   Weight(kg)  463 non-null    object
dtypes: object(10)
memory usage: 36.3+ KB


In [40]:
info = info.astype({'Height(cm)': float, 'Weight(kg)': float})
info.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 463 entries, 0 to 462
Data columns (total 10 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   Gender      463 non-null    object 
 1   Race        463 non-null    object 
 2   Alignment   463 non-null    object 
 3   Hair color  463 non-null    object 
 4   Eye color   463 non-null    object 
 5   Skin color  463 non-null    object 
 6   Hero        463 non-null    object 
 7   Publisher   463 non-null    object 
 8   Height(cm)  463 non-null    float64
 9   Weight(kg)  463 non-null    float64
dtypes: float64(2), object(8)
memory usage: 36.3+ KB
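
The split, drop, and `astype` steps above can also be collapsed into a single expression per column. The sketch below assumes every value has the form `"<number> <unit>"`, as the task hint states; `strip_unit` is a hypothetical helper name, and the sample values are taken from the outputs above.

```python
import pandas as pd

# Sample values in the same "<number> <unit>" form as the Height/Weight columns
heights = pd.Series(['203.0 cm', '191.0 cm', '66.0 cm'])
weights = pd.Series(['441.0 kg', '65.0 kg', '17.0 kg'])

def strip_unit(col: pd.Series) -> pd.Series:
    """Keep the text before the first space and convert it to float."""
    return col.str.split(' ').str[0].astype(float)

height_cm = strip_unit(heights)
weight_kg = strip_unit(weights)
print(height_cm.dtype, weight_kg.dtype)
```

This avoids creating the temporary `cm` and `kg` columns that later have to be dropped.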


In [18]:
powers.head()

Unnamed: 0,hero_names,Powers
0,3-D Man,"Agility,Super Strength,Stamina,Super Speed"
1,A-Bomb,"Accelerated Healing,Durability,Longevity,Super..."
2,Abe Sapien,"Agility,Accelerated Healing,Cold Resistance,Du..."
3,Abin Sur,Lantern Power Ring
4,Abomination,"Accelerated Healing,Intelligence,Super Strengt..."


In [19]:
powers.loc[1, 'Powers']

'Accelerated Healing,Durability,Longevity,Super Strength,Stamina,Camouflage,Self-Sustenance'

In [20]:
powers['powers_split']= powers['Powers'].str.replace("'",'"')


In [21]:
powers['powers_split'].value_counts()

Intelligence                                           8
Durability,Super Strength                              5
Agility,Stealth,Marksmanship,Weapons Master,Stamina    4
Marksmanship

In [22]:
exploded= powers.explode('powers_split')
exploded[['hero_names', 'Powers', 'powers_split']].head(20)

Unnamed: 0,hero_names,Powers,powers_split
0,3-D Man,"Agility,Super Strength,Stamina,Super Speed","Agility,Super Strength,Stamina,Super Speed"
1,A-Bomb,"Accelerated Healing,Durability,Longevity,Super...","Accelerated Healing,Durability,Longevity,Super..."
2,Abe Sapien,"Agility,Accelerated Healing,Cold Resistance,Du...","Agility,Accelerated Healing,Cold Resistance,Du..."
3,Abin Sur,Lantern Power Ring,Lantern Power Ring
4,Abomination,"Accelerated Healing,Intelligence,Super Strengt...","Accelerated Healing,Intelligence,Super Strengt..."
5,Abraxas,"Dimensional Awareness,Flight,Intelligence,Supe...","Dimensional Awareness,Flight,Intelligence,Supe..."
6,Absorbing Man,"Cold Resistance,Durability,Energy Absorption,S...","Cold Resistance,Durability,Energy Absorption,S..."
7,Adam Monroe,"Accelerated Healing,Immortality,Regeneration","Accelerated Healing,Immortality,Regeneration"
8,Adam Strange,"Durability,Stealth,Flight,Marksmanship,Weapons...","Durability,Stealth,Flight,Marksmanship,Weapons..."
9,Agent Bob,Stealth,Stealth
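
`explode` only expands list-like cells, so a comma-separated string passes through unchanged, as the identical `Powers` and `powers_split` columns above show. For the one-hot-encoded power columns the task asks for, one possible sketch is `Series.str.get_dummies(sep=',')`, which splits each string on commas and returns one 0/1 column per distinct power. The frame `powers_demo` below is a small stand-in built from rows shown above, not the full dataset.

```python
import pandas as pd

# Small stand-in for the powers DataFrame (rows copied from the output above)
powers_demo = pd.DataFrame({
    'hero_names': ['3-D Man', 'Abin Sur', 'Agent Bob'],
    'Powers': ['Agility,Super Strength,Stamina,Super Speed',
               'Lantern Power Ring',
               'Stealth'],
})

# One indicator column per power: 1 if the hero has it, 0 otherwise
one_hot = powers_demo['Powers'].str.get_dummies(sep=',')

# Attach the indicator columns to the hero names
powers_ohe = pd.concat([powers_demo[['hero_names']], one_hot], axis=1)
print(powers_ohe.columns.tolist())
```

An equivalent route is `powers['Powers'].str.split(',')` followed by `explode`, which gives one row per hero and power; `get_dummies` is shorter when the goal is indicator columns.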


In [42]:
merged= pd.merge(info, powers, left_on= 'Hero', right_on= 'hero_names', how= 'inner')
merged.head()

Unnamed: 0,Gender,Race,Alignment,Hair color,Eye color,Skin color,Hero,Publisher,Height(cm),Weight(kg),hero_names,Powers,powers_split
0,Male,Human,good,No Hair,yellow,Unknown,A-Bomb,Marvel Comics,203.0,441.0,A-Bomb,"Accelerated Healing,Durability,Longevity,Super...","Accelerated Healing,Durability,Longevity,Super..."
1,Male,Icthyo Sapien,good,No Hair,blue,blue,Abe Sapien,Dark Horse Comics,191.0,65.0,Abe Sapien,"Agility,Accelerated Healing,Cold Resistance,Du...","Agility,Accelerated Healing,Cold Resistance,Du..."
2,Male,Ungaran,good,No Hair,blue,red,Abin Sur,DC Comics,185.0,90.0,Abin Sur,Lantern Power Ring,Lantern Power Ring
3,Male,Human / Radiation,bad,No Hair,green,Unknown,Abomination,Marvel Comics,203.0,441.0,Abomination,"Accelerated Healing,Intelligence,Super Strengt...","Accelerated Healing,Intelligence,Super Strengt..."
4,Male,Human,bad,No Hair,blue,Unknown,Absorbing Man,Marvel Comics,193.0,122.0,Absorbing Man,"Cold Resistance,Durability,Energy Absorption,S...","Cold Resistance,Durability,Energy Absorption,S..."
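
An inner merge silently drops heroes that appear in only one of the two tables (`info` has 463 rows, `powers` has 667). A hedged way to see which names fail to match is an outer merge with `indicator=True`. The two small frames below are invented for illustration, and "Hypothetical Hero" is not a real entry in the dataset.

```python
import pandas as pd

# Invented stand-ins for the info and powers tables
info_demo = pd.DataFrame({
    'Hero': ['A-Bomb', 'Abe Sapien', 'Hypothetical Hero'],
    'Publisher': ['Marvel Comics', 'Dark Horse Comics', 'Unknown'],
})
powers_demo = pd.DataFrame({
    'hero_names': ['3-D Man', 'A-Bomb', 'Abe Sapien'],
    'Powers': ['Agility', 'Durability', 'Agility'],
})

# indicator=True adds a '_merge' column: 'both', 'left_only', or 'right_only'
check = pd.merge(info_demo, powers_demo,
                 left_on='Hero', right_on='hero_names',
                 how='outer', indicator=True)

# Rows that would be lost by how='inner'
unmatched = check[check['_merge'] != 'both']
print(unmatched[['Hero', 'hero_names', '_merge']])
```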


## Super Speed Weight
- 1. Compare the average weight of super heroes who have Super Speed to those who do not.


In [43]:
avg_wt= merged['Weight(kg)'].mean()

In [44]:
avg_wt

113.58963282937366

In [51]:
super_speed= merged[merged['Powers'].str.contains('Super Speed')]

In [54]:
# numeric_only=True limits the mean to the numeric columns (height and weight)
avg_ss_wt= super_speed.mean(numeric_only=True)
avg_ss_wt

Height(cm)    189.444444
Weight(kg)    129.404040
dtype: float64

In [57]:
no_speed= merged[~merged['Powers'].str.contains('Super Speed')]
no_speed

Unnamed: 0,Gender,Race,Alignment,Hair color,Eye color,Skin color,Hero,Publisher,Height(cm),Weight(kg),hero_names,Powers,powers_split
0,Male,Human,good,No Hair,yellow,Unknown,A-Bomb,Marvel Comics,203.0,441.0,A-Bomb,"Accelerated Healing,Durability,Longevity,Super...","Accelerated Healing,Durability,Longevity,Super..."
1,Male,Icthyo Sapien,good,No Hair,blue,blue,Abe Sapien,Dark Horse Comics,191.0,65.0,Abe Sapien,"Agility,Accelerated Healing,Cold Resistance,Du...","Agility,Accelerated Healing,Cold Resistance,Du..."
2,Male,Ungaran,good,No Hair,blue,red,Abin Sur,DC Comics,185.0,90.0,Abin Sur,Lantern Power Ring,Lantern Power Ring
4,Male,Human,bad,No Hair,blue,Unknown,Absorbing Man,Marvel Comics,193.0,122.0,Absorbing Man,"Cold Resistance,Durability,Energy Absorption,S...","Cold Resistance,Durability,Energy Absorption,S..."
6,Male,Human,good,Brown,brown,Unknown,Agent Bob,Marvel Comics,178.0,81.0,Agent Bob,Stealth,Stealth
...,...,...,...,...,...,...,...,...,...,...,...,...,...
456,Female,Mutant / Clone,good,Black,green,Unknown,X-23,Marvel Comics,155.0,50.0,X-23,"Agility,Accelerated Healing,Durability,Stealth...","Agility,Accelerated Healing,Durability,Stealth..."
457,Male,Unknown,good,Brown,blue,Unknown,X-Man,Marvel Comics,175.0,61.0,X-Man,"Flight,Telepathy,Astral Travel,Teleportation,T...","Flight,Telepathy,Astral Travel,Teleportation,T..."
458,Male,Human,good,Blond,blue,Unknown,Yellowjacket,Marvel Comics,183.0,83.0,Yellowjacket,"Size Changing,Animal Oriented Powers","Size Changing,Animal Oriented Powers"
459,Female,Human,good,Strawberry Blond,blue,Unknown,Yellowjacket II,Marvel Comics,165.0,52.0,Yellowjacket II,"Flight,Energy Blasts,Size Changing","Flight,Energy Blasts,Size Changing"


In [58]:
# numeric_only=True limits the mean to the numeric columns (height and weight)
avg_ns_wt= no_speed.mean(numeric_only=True)
avg_ns_wt

Height(cm)    186.376226
Weight(kg)    101.773585
dtype: float64

In [60]:
print(f'Average Super Speed Weight: {avg_ss_wt}')
print(f'Average Non-Super Speed Weight: {avg_ns_wt}')

Average Super Speed Weight: Height(cm)    189.444444
Weight(kg)    129.404040
dtype: float64
Average Non-Super Speed Weight: Height(cm)    186.376226
Weight(kg)    101.773585
dtype: float64


- The average weight of heroes with super speed is 129.4 kg
- The average weight of heroes without super speed is 101.8 kg
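
The same comparison can be made in one step by grouping on a boolean flag. This is a minimal sketch with invented weights (not the real dataset), assuming a `Powers` column of comma-separated strings and a numeric `Weight(kg)` column; `has_super_speed` is a hypothetical column name.

```python
import pandas as pd

# Invented rows for illustration only
demo = pd.DataFrame({
    'Hero': ['A', 'B', 'C', 'D'],
    'Weight(kg)': [100.0, 80.0, 60.0, 120.0],
    'Powers': ['Agility,Super Speed', 'Stealth', 'Super Speed', 'Flight,Durability'],
})

# Flag heroes whose power string contains the literal text 'Super Speed'
demo['has_super_speed'] = demo['Powers'].str.contains('Super Speed', regex=False)

# Mean weight for each group (False = no super speed, True = super speed)
avg_by_speed = demo.groupby('has_super_speed')['Weight(kg)'].mean()
print(avg_by_speed)
```

Selecting `'Weight(kg)'` before calling `mean()` returns only the weight averages, so the height values do not appear in the result.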

## Average Hero Height by Publisher
- 2. What is the average height of heroes for each publisher?

In [65]:
merged['Publisher'].unique()

array(['Marvel Comics', 'Dark Horse Comics', 'DC Comics', 'Team Epic TV',
       'George Lucas', 'Shueisha', 'Star Trek', 'Unknown', 'Image Comics'],
      dtype=object)

In [68]:
# numeric_only=True limits each mean to the numeric columns (height and weight)
marvel = merged[merged['Publisher'] == 'Marvel Comics'].mean(numeric_only=True)

darkhorse = merged[merged['Publisher'] == 'Dark Horse Comics'].mean(numeric_only=True)

dc = merged[merged['Publisher'] == 'DC Comics'].mean(numeric_only=True)

epic = merged[merged['Publisher'] == 'Team Epic TV'].mean(numeric_only=True)

gl = merged[merged['Publisher'] == 'George Lucas'].mean(numeric_only=True)

shue = merged[merged['Publisher'] == 'Shueisha'].mean(numeric_only=True)

trek = merged[merged['Publisher'] == 'Star Trek'].mean(numeric_only=True)

unknown = merged[merged['Publisher'] == 'Unknown'].mean(numeric_only=True)

image = merged[merged['Publisher'] == 'Image Comics'].mean(numeric_only=True)


In [70]:
print(f'Marvel Comics: {marvel}')
print(f'Dark Horse: {darkhorse}')
print(f'DC Comics: {dc}')
print(f'Epic TV: {epic}')
print(f'George Lucas: {gl}')
print(f'Shueisha: {shue}')
print(f'Star Trek: {trek}')
print(f'Unknown: {unknown}')
print(f'Image Comics: {image}')

Marvel Comics: Height(cm)    191.546128
Weight(kg)    119.579125
dtype: float64
Dark Horse: Height(cm)    176.909091
Weight(kg)    101.818182
dtype: float64
DC Comics: Height(cm)    181.923913
Weight(kg)    104.188406
dtype: float64
Epic TV: Height(cm)    180.75
Weight(kg)     72.00
dtype: float64
George Lucas: Height(cm)    159.6
Weight(kg)     77.4
dtype: float64
Shueisha: Height(cm)    171.5
Weight(kg)     64.5
dtype: float64
Star Trek: Height(cm)    181.5
Weight(kg)     79.0
dtype: float64
Unknown: Height(cm)    178.0
Weight(kg)     83.0
dtype: float64
Image Comics: Height(cm)    211.0
Weight(kg)    405.0
dtype: float64


- Marvel Comics
    - 191.5 cm
- Dark Horse
    - 176.9 cm
- DC Comics
    - 181.9 cm
- Epic TV
    - 180.8 cm
- George Lucas
    - 159.6 cm
- Shueisha
    - 171.5 cm
- Star Trek
    - 181.5 cm
- Unknown 
    - 178 cm
- Image Comics
    - 211 cm
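
The nine per-publisher filters can be replaced with a single `groupby`. The sketch below uses invented heights (not the real dataset) and assumes a `Publisher` column and a numeric `Height(cm)` column like those in `merged`.

```python
import pandas as pd

# Invented rows for illustration only
demo = pd.DataFrame({
    'Publisher': ['Marvel Comics', 'Marvel Comics', 'DC Comics'],
    'Height(cm)': [200.0, 180.0, 185.0],
})

# Mean height per publisher, rounded to one decimal and sorted tallest first
avg_height = (demo.groupby('Publisher')['Height(cm)']
                  .mean()
                  .round(1)
                  .sort_values(ascending=False))
print(avg_height)
```

This scales to any number of publishers without listing their names by hand.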