# Introduction

Complete this pandas exercise to learn how to create and work with tabular data in Python.

In [None]:
import pandas as pd

## Creating and Using DataFrames

### 1a. Create a DataFrame from the `data` list shown in the cell. 

In [None]:
# Run this cell to generate the data list
data = [['Hippocampus',3.5],['Amygdala',2.1],['Cerebellum',100.4]]

<details>
  <summary>Click on the dropdown to hide/unhide the answer!</summary>
  
  ### Answer
  <pre>
      df = pd.DataFrame(data)
      df.head()
  </pre>
</details>

### 1b. Use the `.columns` attribute to rename the `DataFrame` columns to "Brain Region" and "Volume (cc)", then display the `DataFrame`.

<details>
  <summary>Click on the dropdown to hide/unhide the answer!</summary>
  
  ### Answer
  <pre>
      df.columns = ["Brain Region", "Volume (cc)"]
      df.head()
  </pre>
</details>
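
As a side note to the answer above, column names can also be supplied when the DataFrame is built, and `rename` can change a single column without retyping the whole list. The sketch below assumes the same `data` list from 1a; the name "Volume (mL)" is a hypothetical label used only for illustration.

```python
import pandas as pd

data = [['Hippocampus', 3.5], ['Amygdala', 2.1], ['Cerebellum', 100.4]]

# Name the columns at construction time instead of assigning df.columns later
df = pd.DataFrame(data, columns=["Brain Region", "Volume (cc)"])

# rename() returns a new DataFrame; only the mapped column changes.
# "Volume (mL)" is a made-up name for this example.
df_renamed = df.rename(columns={"Volume (cc)": "Volume (mL)"})
print(df_renamed.columns.tolist())
```

Assigning to `df.columns` replaces every name at once, so the list must match the number of columns, whereas `rename` leaves unmapped columns untouched.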

### 1c. Use the `iloc` indexer to display the hippocampus's volume.

<details>
  <summary>Click on the dropdown to hide/unhide the answer!</summary>
  
  ### Answer
  <pre>
      df.iloc[0,1]
  </pre>
</details>
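
For comparison, here is a minimal sketch of the same lookup done three ways: by position with `iloc`, by label with `loc`, and by matching the region name. It rebuilds the DataFrame from 1a and 1b so that it runs on its own; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [['Hippocampus', 3.5], ['Amygdala', 2.1], ['Cerebellum', 100.4]],
    columns=["Brain Region", "Volume (cc)"],
)

# iloc: integer positions (row 0, column 1)
by_position = df.iloc[0, 1]

# loc: index label 0 and the column name
by_label = df.loc[0, "Volume (cc)"]

# loc with a boolean mask: select by region name, then take the first match
by_name = df.loc[df["Brain Region"] == "Hippocampus", "Volume (cc)"].iloc[0]

print(by_position, by_label, by_name)
```

The name-based lookup keeps working if the rows are reordered, which the position-based one does not.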

### 1d. Create a new column containing the number of neurons in each brain region. Assign an arbitrary number of your choice to each brain region.

<details>
  <summary>Click on the dropdown to hide/unhide the answer!</summary>
  
  ### Answer
  <pre>
      df['Neuron Count (millions)'] = [100, 50, 10000]
      df.head()
  </pre>
</details>

### 1e. Write a boolean mask that is True where the brain region's volume is less than 5 cc and False otherwise.

<details>
  <summary>Click on the dropdown to hide/unhide the answer!</summary>
  
  ### Answer
  <pre>
      bool_a= df['Volume (cc)']<5
      bool_a
  </pre>
</details>

### 1f. Display the entire row of the brain region whose volume is over 5 cc.

<details>
  <summary>Click on the dropdown to hide/unhide the answer!</summary>
  
  ### Answer
  <pre>
      df.loc[df['Volume (cc)']>5]
      # or, equivalently:
      df[df['Volume (cc)']>5]
  </pre>
</details>
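
Boolean masks from 1e and 1f can also be inverted and combined. The following is a small sketch using the DataFrame from 1a and 1b; the variable names are hypothetical. Note that pandas uses `&`, `|`, and `~` for element-wise logic, and each comparison needs its own parentheses.

```python
import pandas as pd

df = pd.DataFrame(
    [['Hippocampus', 3.5], ['Amygdala', 2.1], ['Cerebellum', 100.4]],
    columns=["Brain Region", "Volume (cc)"],
)

# Mask from 1e: True where the volume is under 5 cc
small = df["Volume (cc)"] < 5

# ~ inverts the mask, giving volumes of 5 cc or more (not strictly over 5)
not_small = df[~small]

# & combines two conditions; the parentheses are required
in_range = df[(df["Volume (cc)"] > 2) & (df["Volume (cc)"] < 5)]

print(not_small)
print(in_range)
```

Python's `and`/`or` keywords do not work on a Series or DataFrame; they raise a `ValueError` because the truth value of a whole column is ambiguous.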

### 1g. Load the `brain_regions.csv` file into a second DataFrame called `df_brain_regions` and print the DataFrame.

<details>
  <summary>Click on the dropdown to hide/unhide the answer!</summary>
  
  ### Answer
  <pre>
      df_brain_regions = pd.read_csv('brain_regions.csv')
      print(df_brain_regions)
  </pre>
</details>

### 1h. Determine the dimensions of `df_brain_regions`.

<details>
  <summary>Click on the dropdown to hide/unhide the answer!</summary>
  
  ### Answer
  <pre>
      df_brain_regions.shape
  </pre>
</details>

### 1i. Display all rows that have a neuron count over 1000.

<details>
  <summary>Click on the dropdown to hide/unhide the answer!</summary>
  
  ### Answer
  <pre>
      df_brain_regions[df_brain_regions['Neuron Count (millions)']>1000]
  </pre>
</details>

### 1j. Display only the Brain Region and Type columns for rows with a Neuron Count over 1000.

<details>
  <summary>Click on the dropdown to hide/unhide the answer!</summary>
  
  ### Answer
  <pre>
      df_brain_regions[df_brain_regions['Neuron Count (millions)'] > 1000][['Brain Region', 'Type']]
  </pre>
</details>
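
The row filter and column selection in 1j can also be written as a single `.loc` call. Because `brain_regions.csv` is not reproduced here, the sketch below uses a small stand-in DataFrame named `demo`; its values (including the `Type` labels) are placeholders assumed for illustration, not the contents of the real file.

```python
import pandas as pd

# Stand-in for df_brain_regions; all values are illustrative placeholders
demo = pd.DataFrame({
    'Brain Region': ['Hippocampus', 'Amygdala', 'Cerebellum'],
    'Volume (cc)': [3.5, 2.1, 100.4],
    'Neuron Count (millions)': [100, 50, 10000],
    'Type': ['Subcortex', 'Subcortex', 'Hindbrain'],
})

# .loc[row_mask, column_list] filters rows and picks columns in one step
result = demo.loc[demo['Neuron Count (millions)'] > 1000, ['Brain Region', 'Type']]
print(result)
```

Both forms return the same rows; the single `.loc` call avoids creating an intermediate DataFrame and is the safer pattern if values are later assigned to the selection.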

## Aggregating and Grouping Data

### 2a. Calculate the average volume of the brain regions and assign it to a variable called `average_volume`.

<details>
  <summary>Click on the dropdown to hide/unhide the answer!</summary>
  
  ### Answer
  <pre>
      average_volume = df_brain_regions['Volume (cc)'].mean()
      print(average_volume)
  </pre>
</details>

### 2b. Calculate the average volume of the brain regions, grouped by the `Type` column. Assign this to a variable called `grouped_df`.

<details>
  <summary>Click on the dropdown to hide/unhide the answer!</summary>
  
  ### Answer
  <pre>
      grouped_df=df_brain_regions.groupby('Type')['Volume (cc)'].mean()
      print(grouped_df)
  </pre>
</details>
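
A grouped calculation can return several statistics at once by passing a list to `agg`. This sketch again uses a stand-in DataFrame called `demo` with placeholder values, since the actual contents of `brain_regions.csv` are not shown in this notebook.

```python
import pandas as pd

# Stand-in for df_brain_regions; values and Type labels are placeholders
demo = pd.DataFrame({
    'Brain Region': ['Hippocampus', 'Amygdala', 'Cerebellum'],
    'Volume (cc)': [3.5, 2.1, 100.4],
    'Type': ['Subcortex', 'Subcortex', 'Hindbrain'],
})

# One row per Type, one column per statistic
summary = demo.groupby('Type')['Volume (cc)'].agg(['mean', 'count'])
print(summary)
```

Including `count` next to `mean` shows how many rows contributed to each average, which helps when some groups are very small.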

### 2c. Populate the missing values in the DataFrame below with the average response time for each mouse.

In [None]:
# Run this cell to import numpy
import numpy as np

In [None]:
# Generate the df_response_times DataFrame
df_response_times = pd.DataFrame({
    'mouse1': [20.5, 22.1, np.nan, 19.8, 21.3],
    'mouse2': [18.9, 17.2, np.nan, 15, np.nan],
    'mouse3': [16.4, 25, np.nan, np.nan, 22.3]
})

<details>
  <summary>Click on the dropdown to hide/unhide the answer!</summary>
  
  ### Answer
  <pre>
      df_response_times.fillna(df_response_times.mean(), inplace=True)
      df_response_times
  </pre>
</details>
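
As an alternative to `inplace=True`, `fillna` can return a new DataFrame so that the original data stays available for comparison. The sketch below repeats the `df_response_times` definition so it runs on its own; `col_means` and `filled` are hypothetical names.

```python
import numpy as np
import pandas as pd

df_response_times = pd.DataFrame({
    'mouse1': [20.5, 22.1, np.nan, 19.8, 21.3],
    'mouse2': [18.9, 17.2, np.nan, 15, np.nan],
    'mouse3': [16.4, 25, np.nan, np.nan, 22.3]
})

# mean() works column by column and skips NaN values by default
col_means = df_response_times.mean()

# Passing a Series to fillna fills each column with its own mean
filled = df_response_times.fillna(col_means)

print(df_response_times.isna().sum())  # missing values per mouse, before
print(filled.isna().sum())             # missing values per mouse, after
```

Checking `isna().sum()` before and after is a quick way to confirm that every gap was filled and that the original DataFrame was left unchanged.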

## Merging and Concatenating DataFrames

### 3a. Execute the following steps:
1. Create a new DataFrame, called `df2`, from the `additional_features` dictionary shown in the cell below. Note that you will learn more about dictionaries in the next lesson.
2. Merge `df2` with `df_brain_regions` using the `Brain Region` column.
3. Display the merged DataFrame.

In [None]:
# Run this cell to create the additional_features dictionary
additional_features = {'Brain Region': ['Hippocampus', 'Amygdala', 'Cerebellum'],
                   'Function': ['Memory', 'Emotion', 'Motor Control']}

In [None]:
# Create your new DataFrame from the additional_features dictionary in this cell


In [None]:
# Merge the DataFrames and display the newly merged DataFrame in this cell


<details>
  <summary>Click on the dropdown to hide/unhide the answer!</summary>
  
  ### Answer
  <pre>
      df2 = pd.DataFrame(additional_features)
      merged_df = pd.merge(df_brain_regions, df2, on='Brain Region')
      merged_df
  </pre>
</details>
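
By default `pd.merge` performs an inner join, so rows whose key appears in only one DataFrame are dropped. The sketch below illustrates this with two small frames named `left` and `right` (hypothetical names); `right` deliberately includes one extra region that `left` lacks.

```python
import pandas as pd

left = pd.DataFrame({
    'Brain Region': ['Hippocampus', 'Amygdala', 'Cerebellum'],
    'Volume (cc)': [3.5, 2.1, 100.4],
})
right = pd.DataFrame({
    'Brain Region': ['Hippocampus', 'Amygdala', 'Cerebellum', 'Hypothalamus'],
    'Function': ['Memory', 'Emotion', 'Motor Control', 'Homeostasis'],
})

# Default how='inner': only regions present in both frames survive
inner = pd.merge(left, right, on='Brain Region')

# how='outer' keeps every region; indicator=True adds a _merge column
# recording whether each row came from left, right, or both
outer = pd.merge(left, right, on='Brain Region', how='outer', indicator=True)

print(inner)
print(outer)
```

In the outer result, the region found only in `right` has `NaN` for `Volume (cc)` and is flagged `right_only` in the `_merge` column.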

### 3b. Execute the following steps:
1. Create a new DataFrame named `df3` using the `additional_samples` dictionary shown in the cell below. 
2. Concatenate `df3` with `merged_df`, then update `merged_df` with the combined data.
3. Display the updated `merged_df`.

In [None]:
# Run this cell to create the additional_samples dictionary
additional_samples = {'Brain Region': ['Nucleus Accumbens', 'Hypothalamus'],
                      'Volume (cc)': [1.5, 0.25],
                      'Neuron Count (millions)': [25.0, 10.0],
                      'Type': ['Subcortex', 'Subcortex'],
                      'Function': ['Reward', 'Homeostasis']
                     }

In [None]:
# Create your new DataFrame from the additional_samples dictionary


In [None]:
# Concatenate the DataFrames and display the updated merged_df


<details>
  <summary>Click on the dropdown to hide/unhide the answer!</summary>
  
  ### Answer
  <pre>
      df3 = pd.DataFrame(additional_samples)
      merged_df = pd.concat([merged_df, df3])
      merged_df
  </pre>
</details>
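
Two details of `pd.concat` are worth checking after stacking DataFrames: the row index and the column names. This is a minimal sketch with two small frames, `a` and `b` (hypothetical names); the column "Volume (CC)" is a deliberately mismatched name used only to show the effect.

```python
import pandas as pd

a = pd.DataFrame({'Brain Region': ['Hippocampus', 'Amygdala'],
                  'Volume (cc)': [3.5, 2.1]})
b = pd.DataFrame({'Brain Region': ['Nucleus Accumbens', 'Hypothalamus'],
                  'Volume (cc)': [1.5, 0.25]})

# By default each frame keeps its own index, so labels 0 and 1 repeat
stacked = pd.concat([a, b])

# ignore_index=True renumbers the rows 0..n-1
renumbered = pd.concat([a, b], ignore_index=True)

# Column names are matched exactly, including capitalization.
# A mismatched name produces a separate column padded with NaN.
b_mismatched = b.rename(columns={'Volume (cc)': 'Volume (CC)'})
mismatched = pd.concat([a, b_mismatched], ignore_index=True)

print(stacked.index.tolist())
print(renumbered.index.tolist())
print(mismatched)
```

If the combined DataFrame has more columns than expected, comparing `df.columns` across the inputs is usually the quickest way to find a spelling or capitalization difference.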