![rmotr](https://user-images.githubusercontent.com/7065401/52071918-bda15380-2562-11e9-828c-7f95297e4a82.png)
<hr style="margin-bottom: 40px;">

# Pandas DataFrame exercises


In [1]:
# Import the numpy package under the name np
import numpy as np

# Import the pandas package under the name pd
import pandas as pd

# Import the matplotlib package under the name plt
import matplotlib.pyplot as plt
%matplotlib inline

# Print the pandas version
print(pd.__version__)

1.3.4


![purple-divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)

## DataFrame creation

### Create an empty pandas DataFrame


In [4]:
a = pd.DataFrame(
    data=[None],
    index = [None],
    columns = [None])
a

Unnamed: 0,NaN
,


In [None]:
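# Calling the constructor with no arguments gives a frame with no rows and no columns;
# data=[None], index=[None], columns=[None] would instead build a 1x1 frame holding a missing value
pd.DataFrame()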
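
To see what "empty" means in pandas, the sketch below compares a frame built with no arguments against one built from `[None]` placeholders. The variable names `truly_empty` and `one_cell` are illustrative and not part of the exercise.

```python
import pandas as pd

# No arguments: zero rows and zero columns
truly_empty = pd.DataFrame()

# [None] placeholders: one row and one column holding a missing value
one_cell = pd.DataFrame(data=[None], index=[None], columns=[None])

# .empty is True only when at least one axis has length 0
print(truly_empty.empty, truly_empty.shape)
print(one_cell.empty, one_cell.shape)
```

The `empty` attribute is a convenient check when a later step needs to know whether any data was loaded at all.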

<img width=400 src="https://cdn.dribbble.com/users/4678/screenshots/1986600/avengers.png"></img>

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

### Create a `marvel_df` pandas DataFrame with the given marvel data


In [5]:
marvel_data = [
    ['Spider-Man', 'male', 1962],
    ['Captain America', 'male', 1941],
    ['Wolverine', 'male', 1974],
    ['Iron Man', 'male', 1963],
    ['Thor', 'male', 1963],
    ['Thing', 'male', 1961],
    ['Mister Fantastic', 'male', 1961],
    ['Hulk', 'male', 1962],
    ['Beast', 'male', 1963],
    ['Invisible Woman', 'female', 1961],
    ['Storm', 'female', 1975],
    ['Namor', 'male', 1939],
    ['Hawkeye', 'male', 1964],
    ['Daredevil', 'male', 1964],
    ['Doctor Strange', 'male', 1963],
    ['Hank Pym', 'male', 1962],
    ['Scarlet Witch', 'female', 1964],
    ['Wasp', 'female', 1963],
    ['Black Widow', 'female', 1964],
    ['Vision', 'male', 1968]
]

In [7]:
marvel_dt = pd.DataFrame(data=marvel_data)
marvel_dt

,0,1,2
0,Spider-Man,male,1962
1,Captain America,male,1941
2,Wolverine,male,1974
3,Iron Man,male,1963
4,Thor,male,1963
5,Thing,male,1961
6,Mister Fantastic,male,1961
7,Hulk,male,1962
8,Beast,male,1963
9,Invisible Woman,female,1961


In [None]:
marvel_df = pd.DataFrame(data=marvel_data)

marvel_df

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

### Add column names to the `marvel_df`
 

In [9]:
marvel_dt.columns = ["name", "sex", "year"]
marvel_dt

,name,sex,year
0,Spider-Man,male,1962
1,Captain America,male,1941
2,Wolverine,male,1974
3,Iron Man,male,1963
4,Thor,male,1963
5,Thing,male,1961
6,Mister Fantastic,male,1961
7,Hulk,male,1962
8,Beast,male,1963
9,Invisible Woman,female,1961


In [None]:
col_names = ['name', 'sex', 'first_appearance']

marvel_df.columns = col_names
marvel_df
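
Building the frame and then assigning `columns` can also be collapsed into one step by passing `columns=` to the constructor. A small sketch with a three-row subset of the data follows; `sample` and `df` are illustrative names.

```python
import pandas as pd

sample = [
    ['Spider-Man', 'male', 1962],
    ['Captain America', 'male', 1941],
    ['Storm', 'female', 1975],
]

# Column labels are supplied at construction time
df = pd.DataFrame(sample, columns=['name', 'sex', 'first_appearance'])
print(df.columns.tolist())
```

Either form ends with the same labels; assigning `columns` afterwards is mainly useful when the frame already exists.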

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

### Add index names to the `marvel_df` (use the character name as index)


In [12]:
marvel_dt.index = marvel_dt["name"]
marvel_dt

name,name,sex,year
Spider-Man,Spider-Man,male,1962
Captain America,Captain America,male,1941
Wolverine,Wolverine,male,1974
Iron Man,Iron Man,male,1963
Thor,Thor,male,1963
Thing,Thing,male,1961
Mister Fantastic,Mister Fantastic,male,1961
Hulk,Hulk,male,1962
Beast,Beast,male,1963
Invisible Woman,Invisible Woman,female,1961


In [None]:
marvel_df.index = marvel_df['name']
marvel_df

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

### Drop the name column as it's now the index

In [16]:
marvel_dt.drop(columns="name")  # returns a new frame; marvel_dt itself keeps the name column

name,sex,year
Spider-Man,male,1962
Captain America,male,1941
Wolverine,male,1974
Iron Man,male,1963
Thor,male,1963
Thing,male,1961
Mister Fantastic,male,1961
Hulk,male,1962
Beast,male,1963
Invisible Woman,female,1961


In [None]:
#marvel_df = marvel_df.drop(columns=['name'])
marvel_df = marvel_df.drop(['name'], axis=1)
marvel_df
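
Assigning the index and then dropping the column can be done in a single call with `DataFrame.set_index`, which by default removes the column it promotes. A minimal sketch on a small subset (the names `sample` and `df` are illustrative):

```python
import pandas as pd

sample = pd.DataFrame(
    [['Spider-Man', 'male', 1962], ['Storm', 'female', 1975]],
    columns=['name', 'sex', 'first_appearance'],
)

# set_index moves 'name' into the index and drops it from the columns;
# it returns a new frame, so `sample` is left untouched
df = sample.set_index('name')
print(df)
```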

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

### Drop 'Namor' and 'Hank Pym' rows


In [18]:
marvel_dt.drop(["Namor", "Hank Pym"], axis=0)  # drop rows by index label

name,name,sex,year
Spider-Man,Spider-Man,male,1962
Captain America,Captain America,male,1941
Wolverine,Wolverine,male,1974
Iron Man,Iron Man,male,1963
Thor,Thor,male,1963
Thing,Thing,male,1961
Mister Fantastic,Mister Fantastic,male,1961
Hulk,Hulk,male,1962
Beast,Beast,male,1963
Invisible Woman,Invisible Woman,female,1961


In [None]:
marvel_df = marvel_df.drop(['Namor', 'Hank Pym'], axis=0)
marvel_df
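
`drop` returns a new frame rather than modifying the original, and it raises `KeyError` for labels that are no longer present, which matters if the cell is run twice. A hedged sketch of both behaviours on a toy frame (`df`, `trimmed` and `trimmed_again` are illustrative names):

```python
import pandas as pd

df = pd.DataFrame(
    {'sex': ['male', 'male', 'female'], 'first_appearance': [1939, 1962, 1975]},
    index=['Namor', 'Hank Pym', 'Storm'],
)

# drop returns a new frame; df keeps all three rows until it is reassigned
trimmed = df.drop(['Namor', 'Hank Pym'], axis=0)

# Dropping the same labels again would raise KeyError unless errors='ignore' is passed
trimmed_again = trimmed.drop(['Namor', 'Hank Pym'], errors='ignore')
print(trimmed_again.index.tolist())
```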

![purple-divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)

## DataFrame selection, slicing and indexation

### Show the first 5 rows of `marvel_df`
 

In [26]:
marvel_dt.iloc[0:5,]  # the stop position is exclusive, so 0:5 yields 5 rows

name,name,sex,year
Spider-Man,Spider-Man,male,1962
Captain America,Captain America,male,1941
Wolverine,Wolverine,male,1974
Iron Man,Iron Man,male,1963
Thor,Thor,male,1963


In [None]:
#marvel_df.loc[['Spider-Man', 'Captain America', 'Wolverine', 'Iron Man', 'Thor'], :] # bad!
#marvel_df.loc['Spider-Man': 'Thor', :]
#marvel_df.iloc[0:5, :]
#marvel_df.iloc[0:5,]
marvel_df.iloc[:5,]
#marvel_df.head()
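
The alternatives listed in this cell differ in how the end of the slice is treated: `iloc` excludes the stop position, while `loc` includes the stop label. A sketch on a toy frame (`df` and the result names are illustrative):

```python
import pandas as pd

df = pd.DataFrame(
    {'first_appearance': [1962, 1941, 1974, 1963, 1963, 1961]},
    index=['Spider-Man', 'Captain America', 'Wolverine', 'Iron Man', 'Thor', 'Thing'],
)

by_position = df.iloc[0:5]               # positions 0..4, stop is exclusive
by_label = df.loc['Spider-Man':'Thor']   # stop label 'Thor' is included
four_rows = df.iloc[0:4]                 # stops before 'Thor'

print(len(by_position), len(by_label), len(four_rows))
```

This is why a positional slice for "the first 5" needs `0:5` (or `:5`) rather than `0:4`.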

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

### Show the last 5 rows of `marvel_df`


In [None]:
# your code goes here


In [None]:
#marvel_df.loc[['Hank Pym', 'Scarlet Witch', 'Wasp', 'Black Widow', 'Vision'], :] # bad!
#marvel_df.loc['Hank Pym':'Vision', :]
marvel_df.iloc[-5:,]
#marvel_df.tail()

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

### Show just the sex of the first 5 rows of `marvel_df`

In [None]:
# your code goes here


In [None]:
#marvel_df.iloc[:5,]['sex'].to_frame()
marvel_df.iloc[:5,].sex.to_frame()
#marvel_df.head().sex.to_frame()

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

### Show the `first_appearance` of all middle rows of `marvel_df` (every row except the first and last)

In [None]:
# your code goes here


In [None]:
marvel_df.iloc[1:-1,].first_appearance.to_frame()
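
Attribute access such as `.first_appearance` returns a Series, which is why the solution calls `to_frame()` afterwards. Selecting with a list of column labels keeps the result a DataFrame from the start. A sketch on a toy frame (`df`, `chained` and `direct` are illustrative names):

```python
import pandas as pd

df = pd.DataFrame(
    {'sex': ['male', 'male', 'female', 'male'],
     'first_appearance': [1962, 1941, 1975, 1968]},
    index=['Spider-Man', 'Captain America', 'Storm', 'Vision'],
)

# Attribute access returns a Series; to_frame() turns it back into a DataFrame
chained = df.iloc[1:-1].first_appearance.to_frame()

# A list of column labels keeps the result a DataFrame from the start
direct = df.iloc[1:-1][['first_appearance']]

print(direct)
```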

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

### Show the first and last rows of `marvel_df`


In [None]:
# your code goes here


In [None]:
#marvel_df.iloc[[0, -1],][['sex', 'first_appearance']]
marvel_df.iloc[[0, -1],]

![purple-divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)

## DataFrame manipulation and operations

### Modify the `first_appearance` of 'Vision' to year 1964

In [None]:
# your code goes here


In [None]:
marvel_df.loc['Vision', 'first_appearance'] = 1964

marvel_df
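
A single `.loc[row, column]` call addresses the cell directly, so the assignment is applied to the frame itself. The sketch below repeats the idea on a two-row toy frame and reads the value back with `.at`, the scalar-only accessor (`df` is an illustrative name):

```python
import pandas as pd

df = pd.DataFrame(
    {'first_appearance': [1962, 1968]},
    index=['Spider-Man', 'Vision'],
)

# One .loc call addresses row label and column label together
df.loc['Vision', 'first_appearance'] = 1964

# .at reads (or writes) exactly one cell by label
print(df.at['Vision', 'first_appearance'])
```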

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

### Add a new column to `marvel_df` called 'years_since' with the years since `first_appearance`


In [None]:
# your code goes here


In [None]:
marvel_df['years_since'] = 2018 - marvel_df['first_appearance']

marvel_df
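
The solution hard-codes 2018 as the reference year. If the column should instead follow the current date, one option is to take the year from the standard library. This is an assumption about intent rather than something the exercise asks for, and `reference_year` is an illustrative name.

```python
import datetime

import pandas as pd

df = pd.DataFrame(
    {'first_appearance': [1962, 1941]},
    index=['Spider-Man', 'Captain America'],
)

# Assumption: "years since" is measured from today's calendar year
reference_year = datetime.date.today().year

# Scalar minus Series is applied element-wise
df['years_since'] = reference_year - df['first_appearance']
print(df)
```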

![purple-divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)

## DataFrame boolean arrays (also called masks)

### Given the `marvel_df` pandas DataFrame, make a mask showing the female characters


In [None]:
# your code goes here


In [None]:
mask = marvel_df['sex'] == 'female'

mask

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

### Given the `marvel_df` pandas DataFrame, get the male characters


In [None]:
# your code goes here


In [None]:
mask = marvel_df['sex'] == 'male'

marvel_df[mask]

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

### Given the `marvel_df` pandas DataFrame, get the characters with `first_appearance` after 1970


In [None]:
# your code goes here


In [None]:
mask = marvel_df['first_appearance'] > 1970

marvel_df[mask]

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

### Given the `marvel_df` pandas DataFrame, get the female characters with `first_appearance` after 1970

In [None]:
# your code goes here


In [None]:
mask = (marvel_df['sex'] == 'female') & (marvel_df['first_appearance'] > 1970)

marvel_df[mask]
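
The parentheses around each comparison are required because `&` binds more tightly than `==` and `>`. The sketch below shows the combined mask next to an equivalent `DataFrame.query` call on a toy frame (`df`, `by_mask` and `by_query` are illustrative names):

```python
import pandas as pd

df = pd.DataFrame(
    {'sex': ['male', 'female', 'female', 'male'],
     'first_appearance': [1974, 1961, 1975, 1968]},
    index=['Wolverine', 'Invisible Woman', 'Storm', 'Vision'],
)

# Each comparison yields a boolean Series; & combines them element-wise
mask = (df['sex'] == 'female') & (df['first_appearance'] > 1970)
by_mask = df[mask]

# query expresses the same filter as a string
by_query = df.query("sex == 'female' and first_appearance > 1970")

print(by_mask.index.tolist())
```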

![purple-divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)

## DataFrame summary statistics

### Show basic statistics of `marvel_df`

In [None]:
# your code goes here


In [None]:
marvel_df.describe()

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

### Given the `marvel_df` pandas DataFrame, show the mean value of `first_appearance`

In [None]:
# your code goes here


In [None]:

#np.mean(marvel_df.first_appearance)
marvel_df.first_appearance.mean()

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

### Given the `marvel_df` pandas DataFrame, show the min value of `first_appearance`


In [None]:
# your code goes here


In [None]:
#np.min(marvel_df.first_appearance)
marvel_df.first_appearance.min()

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

### Given the `marvel_df` pandas DataFrame, get the characters with the min value of `first_appearance`

In [None]:
# your code goes here


In [None]:
mask = marvel_df['first_appearance'] == marvel_df.first_appearance.min()
marvel_df[mask]
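
The mask returns every row that shares the minimum. When only one row is wanted, `Series.idxmin` gives the index label of the first minimum instead. The sketch below contrasts the two on a toy frame; the row labelled 'Hypothetical Hero' is made up to force a tie and is not part of the exercise data.

```python
import pandas as pd

df = pd.DataFrame(
    {'first_appearance': [1962, 1939, 1941, 1939]},
    # 'Hypothetical Hero' is an invented row used only to create a tie
    index=['Spider-Man', 'Namor', 'Captain America', 'Hypothetical Hero'],
)

# Mask: every row that shares the minimum value
all_min = df[df['first_appearance'] == df['first_appearance'].min()]

# idxmin: only the label of the first occurrence of the minimum
first_min_label = df['first_appearance'].idxmin()

print(all_min.index.tolist(), first_min_label)
```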

![purple-divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)

## DataFrame basic plotting

### Reset index names of `marvel_df`


In [None]:
# your code goes here


In [None]:
marvel_df = marvel_df.reset_index()

marvel_df

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

### Plot the values of `first_appearance`


In [None]:
# your code goes here


In [None]:
#plt.plot(marvel_df.index, marvel_df.first_appearance)
marvel_df.first_appearance.plot()

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

### Plot a histogram (plot.hist) with values of `first_appearance`


In [None]:
# your code goes here


In [None]:

plt.hist(marvel_df.first_appearance)
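
The heading mentions `plot.hist`, which is the pandas accessor that wraps matplotlib's `hist`. A minimal sketch with a handful of values follows; the `Agg` backend line is only there so the snippet runs outside a notebook, and `bins=5` is an arbitrary choice for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display

import pandas as pd

first_appearance = pd.Series(
    [1962, 1941, 1974, 1963, 1963, 1961], name='first_appearance'
)

# Series.plot.hist returns the matplotlib Axes it drew on
ax = first_appearance.plot.hist(bins=5)
ax.set_xlabel('first_appearance')
```

Inside the notebook, `%matplotlib inline` already takes care of rendering, so the backend line can be left out there.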

![purple-divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)
