**Part 1: Pandas**
In this section we will learn how to create Pandas dataframes, filter them, and sort them to find specific subsets of our data.
       
**Part 2: Matplotlib**
In this section we will learn how to create basic Matplotlib plots to visualize data that we're interested in. These will include line graphs, bar charts, and scatter plots.
      

# Part 1: Pandas

![Screen%20Shot%202020-11-13%20at%208.01.06%20AM.png](attachment:Screen%20Shot%202020-11-13%20at%208.01.06%20AM.png)

The **Pandas Library** is a fast, powerful, and flexible library used for data manipulation and analysis.

In [None]:
# Import Pandas Library
import pandas as pd

## The DataFrame

![Screen%20Shot%202020-11-12%20at%2011.05.43%20AM.png](attachment:Screen%20Shot%202020-11-12%20at%2011.05.43%20AM.png)

Pandas is built around a **data structure called a "Dataframe"**. It's basically just a table.

Consider this table:

| Fruit | Color  |
|-------|--------|
| Apple | Red    |
| Lemon | Yellow |


There are a few different ways to create a dataframe from scratch. We are going to explore two methods below.


One way is from a dictionary, where each key is a column name and each value is a list of that column's entries.

In [None]:
# initialize the data as a dictionary (one key per column)
fruit_data = {
    "Fruit": ["Apple", "Lemon"],
    "Color": ["Red", "Yellow"]
}

# create the dataframe
df1 = pd.DataFrame(fruit_data)

# print the dataframe
df1

Another way to create a dataframe is from a list of lists, where each inner list is one row.


In [None]:
# initialize data as a list of lists
fruit_data = [
    ['Apple', 'Red'], 
    ['Lemon', 'Yellow']
]
 
# Create the dataframe
df2 = pd.DataFrame(fruit_data, columns = ['Fruit', 'Color'])
 
# print dataframe
df2
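
To check that the two methods really describe the same table, here is a minimal sketch that builds the fruit table both ways and compares the results. The variable names `from_dict` and `from_rows` are made up for this example.

```python
import pandas as pd

# Method 1: a dictionary maps each column name to that column's values
from_dict = pd.DataFrame({
    "Fruit": ["Apple", "Lemon"],
    "Color": ["Red", "Yellow"],
})

# Method 2: a list of lists holds one inner list per row
from_rows = pd.DataFrame(
    [["Apple", "Red"], ["Lemon", "Yellow"]],
    columns=["Fruit", "Color"],
)

# .equals() compares the contents of two dataframes
same_table = from_dict.equals(from_rows)
print(same_table)

# .shape gives (number of rows, number of columns)
print(from_dict.shape)
```

The dictionary is organized by column and the list of lists is organized by row, so pick whichever matches how your data is already laid out.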

Let's practice creating a dataframe. 

### 💻 **Using whatever method you'd prefer, create a dataframe of the members of your family and their ages.** 💻


In [None]:
### 💻 YOUR CODE GOES HERE 💻  ###





## Importing

![Screen%20Shot%202021-02-01%20at%2010.13.40%20AM.png](attachment:Screen%20Shot%202021-02-01%20at%2010.13.40%20AM.png)

**The power of Pandas becomes evident when you work with large datasets.** It has a built-in function to easily create a dataframe from a `.csv` file. 

> The Pandas `read_csv()` function takes a path input and returns a Pandas Dataframe.

Why does the path have "data_files/" before the name of the file?

In [None]:
# store the path of csv file in a variable
movie_path = "data_files/movie_scores.csv"

# create dataframe from csv file
movie_df = pd.read_csv(movie_path)
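
If you want to see what `read_csv()` does without needing a file on disk, here is a small sketch that reads CSV text from memory instead. The CSV contents and the names `csv_text` and `sample_df` are invented for illustration and are not part of the movie dataset.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for the contents of a file
csv_text = "FILM,Score\nMovie A,71\nMovie B,94\n"

# read_csv accepts a file path or a file-like object such as StringIO
sample_df = pd.read_csv(io.StringIO(csv_text))

# The first line of the CSV becomes the column names
print(sample_df.columns.tolist())

# Numbers are parsed as numbers, so we can do math on them right away
print(sample_df["Score"].sum())
```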

Now, let's take a look at `movie_df`.

**Let's use `.head()` to view a snapshot of the dataframe.**

In [None]:
movie_df.head()

**To display a different number of rows, pass an integer as an argument.**

In [None]:
movie_df.head(2)

### 💻 **Display the first 7 rows of the `movie_df` below:** 💻

In [None]:
### 💻 YOUR CODE GOES HERE 💻  ###





What if we only want to view the film and the Rotten Tomatoes score? You can **index into the dataframe with a list of column names to keep only those columns.**
> *Note how `.head()` is used to only show a sample of the results*

In [None]:
movie_df[["FILM","RottenTomatoes"]].head()
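
Notice the double square brackets in the cell above. As a rough illustration of why they matter, the sketch below uses a tiny made-up dataframe (`sample_df` and its values are assumptions, not the real movie data) to compare single and double brackets.

```python
import pandas as pd

# Hypothetical data, made up for illustration
sample_df = pd.DataFrame({
    "FILM": ["Movie A", "Movie B", "Movie C"],
    "RottenTomatoes": [71, 94, 38],
    "IMDB": [6.9, 8.1, 5.4],
})

# Single brackets with one column name return a Series (a single column)
one_column = sample_df["RottenTomatoes"]

# Double brackets (a list of column names) return a DataFrame
two_columns = sample_df[["FILM", "RottenTomatoes"]]

print(type(one_column).__name__)
print(type(two_columns).__name__)
print(two_columns.columns.tolist())
```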

### 💻 Create a dataframe that looks like this: 💻

> Be sure to only display the first 10 rows

![Screen%20Shot%202020-11-16%20at%2011.01.12%20AM.png](attachment:Screen%20Shot%202020-11-16%20at%2011.01.12%20AM.png)

In [None]:
### 💻 YOUR CODE GOES HERE 💻  ###





## Querying

![Screen%20Shot%202020-11-12%20at%2011.06.26%20AM.png](attachment:Screen%20Shot%202020-11-12%20at%2011.06.26%20AM.png)

**If we have a question that only applies to some of our data it can be useful to filter our DataFrame.** In Pandas, there are many ways to filter data. 




**For right now, let's focus on `query()`.**

What if we only want to see the scores for 'Ant-Man'? 

In [None]:
movie_df.query("FILM == 'Ant-Man (2015)'")

Or, what if we only wanted to see movies that scored higher than 88 on Metacritic? 

In [None]:
movie_df.query("Metacritic > 88")
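
Since there are many ways to filter data in Pandas, it can help to see `query()` next to one of the alternatives. The sketch below, using made-up scores in a hypothetical `sample_df`, compares `query()` with a boolean mask (a column of `True`/`False` values used to pick rows).

```python
import pandas as pd

# Hypothetical scores, made up for illustration
sample_df = pd.DataFrame({
    "FILM": ["Movie A", "Movie B", "Movie C"],
    "Metacritic": [55, 91, 73],
})

# Option 1: describe the condition as a string
with_query = sample_df.query("Metacritic > 60")

# Option 2: build a True/False mask and use it to select rows
mask = sample_df["Metacritic"] > 60
with_mask = sample_df[mask]

print(with_query)
print(with_query.equals(with_mask))
```

Both options keep the same rows. This lab sticks with `query()` because the condition reads almost like a sentence.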

**💻 Use `query()` to find which movies have a RottenTomatoes score below 10. 💻** 


In [None]:
### 💻 YOUR CODE GOES HERE 💻  ###





**Queries can also be chained together for more complex searches.**

First let's look at an `and` condition.

For example, what if we only want to see movies with a *Metacritic score of over 90 and an IMDB score of over 7.0*?
> *Note the `&` between the two conditions to represent `and`*

In [None]:
movie_df.query('Metacritic > 90 & IMDB > 7.0')

**💻 Use `query()` to find which movies have a Fandango_Stars rating less than or equal to 3 and a RottenTomatoes score below 50. 💻** 


In [None]:
### 💻 YOUR CODE GOES HERE 💻  ###





Now let's look at an `or` condition. 

For example, what if we wanted to look at movies that have a *RottenTomatoes score over 95 or a Metacritic score over 95*?
> *Note the `|` between the two conditions to represent `or`*

In [None]:
movie_df.query('RottenTomatoes > 95 | Metacritic > 95').head()
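
To compare the two kinds of conditions side by side, here is a minimal sketch on a made-up `sample_df` (the films and scores are assumptions for illustration). Inside a `query()` string, the words `and` and `or` can be used in place of `&` and `|`.

```python
import pandas as pd

# Hypothetical scores, made up for illustration
sample_df = pd.DataFrame({
    "FILM": ["Movie A", "Movie B", "Movie C", "Movie D"],
    "Metacritic": [95, 92, 60, 91],
    "IMDB": [8.2, 6.5, 6.0, 7.4],
})

# `and` condition: a row must pass BOTH tests
both_symbol = sample_df.query("Metacritic > 90 & IMDB > 7.0")
both_word = sample_df.query("Metacritic > 90 and IMDB > 7.0")

# `or` condition: a row must pass AT LEAST ONE test
either_symbol = sample_df.query("Metacritic > 90 | IMDB > 7.0")
either_word = sample_df.query("Metacritic > 90 or IMDB > 7.0")

print(both_symbol["FILM"].tolist())
print(either_symbol["FILM"].tolist())
```

An `or` condition never returns fewer rows than the matching `and` condition, because every row that passes both tests also passes at least one.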

**💻 Use `query()` to find which movies have an IMDB score less than or equal to 5.0 or a Metacritic score less than or equal to 20. 💻** 


In [None]:
### 💻 YOUR CODE GOES HERE 💻  ###





## Summary Functions (Max, Min, Mean, Count, Sum)

In the previous lab, we wrote algorithms to calculate summary statistics. **Now, we can use Pandas to calculate summary statistics on dataframes.** 

**`.max()` will return the maximum value in a column**

In [None]:
# Find maximum value for Rotten Tomatoes scores
max_rt_val = movie_df["RottenTomatoes"].max()
max_rt_val

**To find the movies with the max Rotten Tomatoes score, we can use the `query()` function.** 
> *Note the `@` to tell Pandas to refer to the variable `max_rt_val`*

In [None]:
# Find movies with maximum Rotten Tomatoes score
movie_df.query('RottenTomatoes == @max_rt_val')
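
Here is a small self-contained sketch of the same `@` pattern on made-up data (`sample_df`, `top_score`, and the scores are assumptions for illustration). It also shows why the result can contain more than one movie: every row tied for the maximum is returned.

```python
import pandas as pd

# Hypothetical scores, made up for illustration (two movies tie for first)
sample_df = pd.DataFrame({
    "FILM": ["Movie A", "Movie B", "Movie C"],
    "RottenTomatoes": [100, 87, 100],
})

# Store the maximum score in an ordinary Python variable
top_score = sample_df["RottenTomatoes"].max()

# @top_score tells query() to use the value of that Python variable
top_movies = sample_df.query("RottenTomatoes == @top_score")

print(top_score)
print(top_movies)
```

Without the `@`, `query()` would look for a column named `top_score` instead of the variable.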

### **💻 Find the movies with the maximum Metacritic score. 💻**

In [None]:
### 💻 YOUR CODE GOES HERE 💻  ###





**`.min()` will return the minimum value in a column**

In [None]:
# Find minimum value for Rotten Tomatoes scores
min_rt_val = movie_df["RottenTomatoes"].min()

### **💻 Find the movies with the minimum IMDB score. 💻**

In [None]:
### 💻 YOUR CODE GOES HERE 💻  ###





**`.mean()` will return the mean value of a column**

In [None]:
# Find mean value for Rotten Tomatoes scores (int() drops the decimal part)
mean_rt_val = int(movie_df["RottenTomatoes"].mean())

### **💻 Find the movies with the mean Fandango_Stars score. 💻**

In [None]:
### 💻 YOUR CODE GOES HERE 💻  ###





**`.count()` will return the number of non-missing cells in a column**

In [None]:
movie_df["RottenTomatoes"].count()
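
One detail worth knowing: `.count()` skips cells that are empty (missing). The sketch below uses a made-up column with two missing values to show the difference between `.count()` and `len()`. The data is an assumption for illustration only.

```python
import pandas as pd

# Hypothetical column with two missing scores (None becomes NaN)
sample_df = pd.DataFrame({
    "RottenTomatoes": [90, None, 75, None, 60],
})

# .count() only counts cells that actually hold a value
filled_cells = sample_df["RottenTomatoes"].count()

# len() counts every row, including rows with missing values
total_rows = len(sample_df["RottenTomatoes"])

print(filled_cells)
print(total_rows)
```

If the two numbers match, the column has no missing values.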

### **💻 Find the number of cells in the 'IMDB' column. 💻**

In [None]:
### 💻 YOUR CODE GOES HERE 💻  ###





**`.sum()` will return the sum of cells in a column**

In [None]:
movie_df["RottenTomatoes"].sum()

### **💻 Find the sum of the cells in the 'Metacritic' column. 💻**

In [None]:
### 💻 YOUR CODE GOES HERE 💻  ###





## Advanced Chaining

**The summary functions can be chained to things that you already know!**

For example, what if we wanted to see the maximum value in each column among movies with a RottenTomatoes score under 80?

In [None]:
movie_df.query('RottenTomatoes < 80').max()

What if we wanted to see the mean RottenTomatoes and Metacritic score for movies that scored less than 50 on RottenTomatoes and less than 50 on Metacritic?

In [None]:
movie_df.query('RottenTomatoes < 50 & Metacritic < 50')[["RottenTomatoes","Metacritic"]].mean()
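
A long chain like the one above can be easier to read when it is split into steps. The sketch below does this on a made-up `sample_df` (the films, scores, and intermediate variable names are assumptions for illustration) and confirms that the step-by-step version gives the same answer as the chain.

```python
import pandas as pd

# Hypothetical scores, made up for illustration
sample_df = pd.DataFrame({
    "FILM": ["Movie A", "Movie B", "Movie C", "Movie D"],
    "RottenTomatoes": [30, 40, 90, 45],
    "Metacritic": [20, 40, 85, 70],
})

# Step 1: keep only the rows that match both conditions
low_scoring = sample_df.query("RottenTomatoes < 50 & Metacritic < 50")

# Step 2: keep only the columns we want to summarize
score_columns = low_scoring[["RottenTomatoes", "Metacritic"]]

# Step 3: calculate the mean of each remaining column
step_by_step = score_columns.mean()

# The same three steps written as one chain
chained = sample_df.query("RottenTomatoes < 50 & Metacritic < 50")[["RottenTomatoes", "Metacritic"]].mean()

print(step_by_step)
print(step_by_step.equals(chained))
```

Storing intermediate results in variables also makes it possible to reuse them, for example by passing a stored mean into a later `query()` with `@`.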

### **💻 Find the mean Rotten Tomatoes score for movies with a Metacritic score of over 85. 💻**

In [None]:
### 💻 YOUR CODE GOES HERE 💻  ###





### 💻 Find the mean IMDB score for movies with an above-average Metacritic score. 

*Hint: You'll probably want to break this into two parts.*

In [None]:
### 💻 YOUR CODE GOES HERE 💻  ###





# Part 2: Matplotlib

![Screen%20Shot%202021-02-24%20at%209.08.09%20AM.png](attachment:Screen%20Shot%202021-02-24%20at%209.08.09%20AM.png)

Organizing and filtering data is useful, but it's not the whole picture! In this section we will cover how to **visualize data in a number of ways.**

In this section, **we will be analyzing ISF weather data from January 2021.**

Let's start by importing everything we need...

In [None]:
# Import Matplotlib 
import matplotlib.pyplot as plt

# Import dataset
weather_path = "data_files/weather_data.csv"
weather_df = pd.read_csv(weather_path)

# Convert the Time column from text to a datetime type
weather_df.Time = pd.to_datetime(weather_df.Time) 

# Display dataset
weather_df.head()
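
To see what `pd.to_datetime()` changes, here is a minimal sketch on a few made-up timestamps (the values and the names `times` and `parsed` are assumptions, not rows from the weather file). Before the conversion the values are plain text. Afterwards Pandas understands them as dates, so we can ask for parts such as the day.

```python
import pandas as pd

# Hypothetical timestamps stored as text, made up for illustration
times = pd.Series(["2021-01-01 00:00", "2021-01-01 12:00", "2021-01-02 00:00"])

# Convert the text into real datetime values
parsed = pd.to_datetime(times)

print(times.dtype)   # text before the conversion
print(parsed.dtype)  # a datetime type after the conversion

# Datetime columns expose date parts through the .dt accessor
print(parsed.dt.day.tolist())
```

Real datetime values also let Matplotlib space the points correctly along a time axis.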

## Plot syntax

All of the plots provided by Matplotlib have similar syntax.

It looks like this:

![Screen%20Shot%202021-03-01%20at%2011.43.45%20AM.png](attachment:Screen%20Shot%202021-03-01%20at%2011.43.45%20AM.png)
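
As a rough sketch of that shared pattern in code (the values and labels below are placeholders, not taken from the weather data):

```python
import matplotlib.pyplot as plt

# Placeholder data, made up for illustration
x_values = [1, 2, 3, 4]
y_values = [10, 20, 15, 30]

# 1. Choose a plot type and pass in the data
#    (plt.bar and plt.scatter take their data in the same way)
plt.plot(x_values, y_values)

# 2. Label the axes and give the plot a title
plt.xlabel("x-axis label")
plt.ylabel("y-axis label")
plt.title("Plot title")

# 3. Display the finished plot
plt.show()
```

Most plots in this lab follow these three steps, with only the plot function and the data changing.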

## Line plots

This type of plot is great for comparing two variables, or looking at how a variable changes over time.

For example, what if we were trying to answer the question: **How did the temperature change over the course of the month?** 

To plot temperature over time we use this syntax:

In [None]:
plt.plot(weather_df["Time"],weather_df["Temperature"])  

# Adds a label to the x-axis
plt.xlabel('Time')

# Rotate the x-axis labels
plt.xticks(rotation=45)

# Adds a label to the y-axis
plt.ylabel('Temperature')

# Adds a title to the entire plot
plt.title('Temperature Change Over Time')

# Displays Plot
plt.show()

### 💻 Create a plot to visualize the following question: *How did the humidity change over the course of the month?*

In [None]:
### 💻 YOUR CODE GOES HERE 💻  ###





## Bar charts

Bar charts are great for looking at categorical data.

In [None]:
weather_df.head()

Let's take a look at the average level of each pollutant over the month.

For example, what if we were trying to answer the question: **What was the average level of each pollutant in January 2021?** 

In [None]:
# First, we must find the averages for each pollutant. 

NO_average = weather_df["NO"].mean()
NO2_average = weather_df["NO2"].mean()
CO_average = weather_df["CO"].mean()
O3_average = weather_df["O3"].mean()
SO2_average = weather_df["SO2"].mean()

pollutant_names = ["NO","NO2","CO","O3","SO2"]
pollutant_means = [NO_average, NO2_average, CO_average, O3_average, SO2_average]

In [None]:
# Using the averages, we can create a bar graph. 

plt.bar(pollutant_names,pollutant_means)

# Adds a label to the x-axis
plt.xlabel('Pollutants')

# Adds a label to the y-axis
plt.ylabel('Level')

# To add a title to the plot
plt.title('Average Level of Each Pollutant')

# Displays Plot
plt.show()
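
The five separate `.mean()` calls used earlier can also be written as a single call by selecting several columns at once. This is a sketch on a tiny made-up `sample_df` with only three pollutant columns. The readings are invented for illustration and are not from the weather file.

```python
import pandas as pd

# Hypothetical pollutant readings, made up for illustration
sample_df = pd.DataFrame({
    "NO": [1.0, 3.0],
    "NO2": [10.0, 20.0],
    "CO": [0.25, 0.75],
})

pollutant_names = ["NO", "NO2", "CO"]

# Selecting a list of columns and calling .mean() averages each column
pollutant_means = sample_df[pollutant_names].mean()

print(pollutant_means)

# .tolist() turns the result into a plain list, ready to pass to plt.bar
means_list = pollutant_means.tolist()
print(means_list)
```

The same idea should work with other summary functions such as `.max()` or `.min()`.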

### 💻 Create a bar chart to visualize the question: *What was the maximum level of each pollutant in January 2021?*

*Hint: copy and paste is your friend!*

In [None]:
### 💻 YOUR CODE GOES HERE 💻  ###





## Scatter plots

A scatter plot is great for looking at relationships between variables. 

For example, what if we were trying to answer the question: **Do temperature levels correlate with humidity levels?** 

In [None]:
plt.scatter(weather_df["Temperature"],weather_df["Humidity"])
plt.xlabel('Temperature')
plt.ylabel('Humidity')
plt.title('Relationship Between Temperature and Humidity')

plt.show()
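
A scatter plot shows a relationship visually. If you also want a single number to describe it, Pandas can calculate a correlation with `.corr()`. The sketch below uses made-up readings where humidity falls steadily as temperature rises. The values are assumptions for illustration, not the real weather data.

```python
import pandas as pd

# Hypothetical readings, made up for illustration
sample_df = pd.DataFrame({
    "Temperature": [10, 15, 20, 25],
    "Humidity": [80, 70, 60, 50],
})

# .corr() returns a value between -1 and 1:
#   near  1 -> both variables rise together
#   near -1 -> one rises while the other falls
#   near  0 -> no clear straight-line relationship
correlation = sample_df["Temperature"].corr(sample_df["Humidity"])

print(correlation)
```

Here the result is very close to -1 because the made-up points lie on a perfectly straight downward line. Real data will rarely be this tidy.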

### 💻 Create a scatter plot to visualize the question: *Do temperature levels correlate with solar radiation levels?*


In [None]:
### 💻 YOUR CODE GOES HERE 💻  ###




## Pie Charts

A pie chart is great for looking at the composition of data. 

For example, what if we were trying to answer the question: **What is the distribution of pollutants?** 

In [None]:
pollutant_names = ["NO","NO2","CO","O3","SO2"]
pollutant_means = [NO_average, NO2_average, CO_average, O3_average, SO2_average]

plt.pie(pollutant_means,            # sets data
        labels = pollutant_names,   # sets labels
        autopct='%1.1f%%')          # displays values

plt.show()

### 💻 Create a pie chart to visualize the question: *What was the distribution of temperatures?*

Create a pie chart to show the distribution of days where the temperature was in the following bands:

- <10ºC
- 10-15ºC
- 15-20ºC
- 20-25ºC
- ≥25ºC

*Hint: You'll need to combine your pandas filtering skills with your matplotlib plotting skills to figure this one out!*

In [None]:
### 💻 YOUR CODE GOES HERE 💻  ###





# Part 3: Extension

Matplotlib is an incredibly large Python library. We have only scratched the surface with this lab. 

Explore the official documentation and what's possible: https://matplotlib.org/stable/gallery/index.html

### 💻 Create a chart of your choosing.


In [None]:
### 💻 YOUR CODE GOES HERE 💻  ###



