In [None]:
from IPython.display import display, HTML
display(HTML("<style>.container { width:100% !important; }</style>"))

# Today's lecture: pie, bar, and scatter plots
- matplotlib's plotting interface is similar to MATLAB's
- matplotlib integrates with pandas, just like sqlite3 integrates with pandas
- Series.plot.PLOT_FN(...)
- DataFrame.plot.PLOT_FN(...)
- Example PLOT_FNs: pie, scatter, bar, line

In [None]:
# import statements
import pandas as pd
from pandas import DataFrame, Series
import sqlite3
import os


In [None]:
# matplotlib font size settings


### Let's create a pandas Series
1. A pandas Series can be created from a list or a dictionary.
2. A pandas Series has both an index (similar to a dictionary key) and an integer position (similar to a list index).
3. When a Series is created from a list, the index and the integer position are the same.
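
To make the index versus integer position distinction concrete, here is a minimal sketch. The Series contents (`"x"`, `"y"`, `"z"` and their values) are made up for illustration.

```python
from pandas import Series

# A Series built from a list gets the default index 0, 1, 2, ...
from_list = Series([10, 20, 30])

# A Series built from a dictionary uses the keys as its index
from_dict = Series({"x": 10, "y": 20, "z": 30})

# .loc looks up by index label, .iloc looks up by integer position
by_label = from_dict.loc["y"]
by_position = from_dict.iloc[1]
print(list(from_list.index), list(from_dict.index), by_label, by_position)
```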

In [None]:
s = Series([5000000, 3000000, 2000000])
s

## Pie plot
- gives you a sense of ratio
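
A minimal sketch of a pie plot, assuming the list-based Series from the cell above. With no extra arguments, the wedges are labeled with the Series index.

```python
from pandas import Series

s = Series([5000000, 3000000, 2000000])

# One wedge per value; wedge labels come from the index (0, 1, 2 here)
ax = s.plot.pie()
```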

### What's wrong with the above plot?

- The labels are all wrong.
- Where do the labels 0, 1, and 2 come from? ---> let's fix just this
- It is difficult to read the actual numbers: we can only see the relative portions, not the absolute amounts
- It says "None" to the left.
- The font is tiny.
- No indication of what is being plotted here.

In [None]:
s = Series({"Police": 5000000, "Fire": 3000000, "Schools": 2000000})
s

## Bar plot
- A bar plot is often a better choice than a pie plot
- You can read absolute numbers off a bar plot

### How can we set the x-axis, y-axis labels, and title?
- PLOT_FN(...) returns an object called an AxesSubplot

What is the type returned by a plot function?
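
One possible way to answer both questions is sketched below. The label and title strings are hypothetical, since the notes do not say what the numbers represent. Depending on the matplotlib version, the printed type may be named `AxesSubplot` or `Axes`.

```python
from pandas import Series

s = Series({"Police": 5000000, "Fire": 3000000, "Schools": 2000000})

# The plot function returns the axes object it drew on
ax = s.plot.bar()
print(type(ax))

# Label and title text below is an assumption made for illustration
ax.set_xlabel("Department")
ax.set_ylabel("Dollars")
ax.set_title("Spending by department")
```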

### What is this 1e6? Can we make the y-axis values more readable?

Recall that you can easily apply element-wise operations to a Series.
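
A minimal sketch of that idea: divide every value by one million before plotting, so the axis no longer needs the 1e6 multiplier. The y-label text is an assumption.

```python
from pandas import Series

s = Series({"Police": 5000000, "Fire": 3000000, "Schools": 2000000})

# Element-wise division: every value is divided by one million
s_millions = s / 1e6

ax = s_millions.plot.bar()
ax.set_ylabel("Dollars (millions)")  # hypothetical label text
```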

### The x-axis tick labels are difficult to read. Can we rotate them to make them more readable?

How can we extract the indices from a Series?
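
One way to do this, sketched under the assumption that we re-apply the index values as tick labels with a rotation. The 45-degree angle is an arbitrary choice.

```python
from pandas import Series

s = Series({"Police": 5000000, "Fire": 3000000, "Schools": 2000000})
ax = s.plot.bar()

# s.index holds the labels; pass them back with the rotation we want
ax.set_xticklabels(list(s.index), rotation=45)
```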

### How can we change the font size inside the figure?
- We need to import matplotlib
- Then set matplotlib.rcParams["font.size"] = ????
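
A minimal sketch of the setting. The value 16 is an arbitrary choice, not one prescribed by these notes. The setting applies to plots created after it is changed.

```python
import matplotlib

# 16 is an arbitrary example value; pick whatever reads well
matplotlib.rcParams["font.size"] = 16
```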

### How can we change the figure size?
- Pass the figsize argument to PLOT_FN(...)
- The figsize value should be a tuple with two values: (width, height)
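
A short sketch of the figsize argument. The sizes are in inches, and the values (6, 3) are arbitrary.

```python
from pandas import Series

s = Series({"Police": 5000000, "Fire": 3000000, "Schools": 2000000})

# figsize is (width, height) in inches
ax = s.plot.bar(figsize=(6, 3))
```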

### How can we make the bars horizontal?
- We have to swap the width and height in figsize
- We have to set the x-label instead of the y-label
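
A sketch of the horizontal version, assuming a vertical plot that used figsize=(6, 3) and a y-label of "Dollars" (both assumptions). The plot function changes from bar to barh.

```python
from pandas import Series

s = Series({"Police": 5000000, "Fire": 3000000, "Schools": 2000000})

# barh draws horizontal bars; width and height are swapped to (3, 6)
ax = s.plot.barh(figsize=(3, 6))

# The values now run along the x-axis, so the label moves there
ax.set_xlabel("Dollars")  # hypothetical label text
```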

### How can we change bar color?
- color parameter in PLOT_FN(...)
    - 3 choices for the argument:
        - full name of color
        - single letter representation of the color
        - grayscale (string value between "0" and "1")
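
The three color styles can be compared side by side. This sketch draws on three axes created with plt.subplots so that each call gets its own plot. The specific colors are arbitrary.

```python
from matplotlib import pyplot as plt
from pandas import Series

s = Series({"Police": 5000000, "Fire": 3000000, "Schools": 2000000})

# Three side-by-side axes, one per color style
fig, axes = plt.subplots(1, 3, figsize=(9, 3))

s.plot.bar(ax=axes[0], color="green")  # full name of the color
s.plot.bar(ax=axes[1], color="k")      # single letter ("k" is black)
s.plot.bar(ax=axes[2], color="0.5")    # grayscale string between "0" and "1"
```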

### How can we mark gridlines?
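
One option is the grid method of the axes object returned by the plot function, sketched here with the same example Series.

```python
from pandas import Series

s = Series({"Police": 5000000, "Fire": 3000000, "Schools": 2000000})
ax = s.plot.bar()

# Turn gridlines on for this plot
ax.grid(True)
```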

### How can we remove the top and right-hand borders (spines)?
- ax.spines ---> gives a dictionary-like collection of spines, keyed by "top", "right", "bottom", and "left"
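
A minimal sketch of hiding two of the border lines. Each border is looked up in ax.spines by its name.

```python
from pandas import Series

s = Series({"Police": 5000000, "Fire": 3000000, "Schools": 2000000})
ax = s.plot.bar()

# Hide the top and right border lines; left and bottom stay visible
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
```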

### How can we capture subplots? 
- from matplotlib import pyplot as plt
- plt.subplots(...) returns a tuple of (figure, AxesSubplot)
- we can use it to write a function that applies all the plot add-on aspects for all the plots in a report that we are writing
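
The idea can be sketched as follows. The helper name `make_clean_ax` and the fixed width of 6 inches are assumptions made for this example. It is one possible shape for such a function, not the only one.

```python
from matplotlib import pyplot as plt
from pandas import Series

def make_clean_ax(height=3):
    """Hypothetical helper: create axes with the shared styling applied."""
    # plt.subplots returns (figure, axes); width of 6 is an assumption
    fig, ax = plt.subplots(figsize=(6, height))
    ax.spines["top"].set_visible(False)
    ax.spines["right"].set_visible(False)
    return ax

s = Series({"Police": 5000000, "Fire": 3000000, "Schools": 2000000})

# Pass the prepared axes to the plot function with the ax argument
ax = make_clean_ax(1.5)
s.plot.bar(ax=ax)
```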

Let's refactor the bar plot code ...

In [None]:
def get_ax(height=3):
    # Step 1: Call plt.subplots, make sure to set figsize
    # Steps 2 & 3: Set visibility of top and right spines to False
    # Step 4: Return the AxesSubplot
    pass

ax = get_ax(1.5)

## bus.db examples

In [None]:
path = "bus.db"
assert os.path.exists(path)
conn = sqlite3.connect(path)

### Recap on exploring SQL database
- pd.read_sql(QUERY, CONNECTION)
- QUERY: SELECT * FROM sqlite_master
- QUERY: SELECT * FROM boarding
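
Here is a self-contained sketch of the same two queries that runs without bus.db. It builds a tiny in-memory database whose `boarding` table has made-up columns (`Route`, `DailyBoardings`) and made-up rows. The real schema of bus.db may differ.

```python
import sqlite3
import pandas as pd

# In-memory stand-in for bus.db; columns and rows are hypothetical
conn_demo = sqlite3.connect(":memory:")
conn_demo.execute("CREATE TABLE boarding (Route INTEGER, DailyBoardings REAL)")
conn_demo.executemany(
    "INSERT INTO boarding VALUES (?, ?)",
    [(80, 10.5), (80, 4.0), (2, 7.25)],
)
conn_demo.commit()

# sqlite_master lists the tables (and their CREATE statements)
tables = pd.read_sql("SELECT * FROM sqlite_master", conn_demo)

# Each query result comes back as a DataFrame
rows = pd.read_sql("SELECT * FROM boarding", conn_demo)
print(tables[["type", "name"]])
print(rows)
```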

In [None]:
pd.read_sql("""
SELECT * FROM sqlite_master
""", conn)

In [None]:
pd.read_sql("""
SELECT * FROM boarding
""", conn)

### What are the top routes, and how many people ride them daily?

In [None]:
df = pd.read_sql("""
SELECT ???
FROM ???
GROUP BY ???
""", conn)
df

#### Let's take the daily column out as a Series ...

### Oops, too much data. Let's filter down to top 5 routes. How can we do that in SQL?
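
A sketch of the general pattern, using an in-memory table with hypothetical columns (`Route`, `DailyBoardings`) and made-up rows rather than bus.db. It keeps the top 2 routes instead of 5 because the demo table is tiny.

```python
import sqlite3
import pandas as pd

# In-memory stand-in for bus.db; columns and rows are hypothetical
conn_demo = sqlite3.connect(":memory:")
conn_demo.execute("CREATE TABLE boarding (Route INTEGER, DailyBoardings REAL)")
conn_demo.executemany(
    "INSERT INTO boarding VALUES (?, ?)",
    [(80, 10.5), (80, 4.0), (2, 7.25), (2, 1.0), (6, 3.0)],
)
conn_demo.commit()

# Sum per route, sort largest first, keep only the first 2 rows
top = pd.read_sql("""
SELECT Route, SUM(DailyBoardings) AS daily
FROM boarding
GROUP BY Route
ORDER BY daily DESC
LIMIT 2
""", conn_demo)
print(top)
```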

In [None]:
df = pd.read_sql("""
SELECT ???
FROM ???
GROUP BY ???
ORDER BY ???
LIMIT ???
""", conn)
df

#### Huh, what exactly is route 0? Where is that coming from?
- Oops, it is coming from dataframe row index!
- Let's fix that: we can use df.set_index(...)
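
A small sketch of the difference, using a made-up DataFrame with the same column names as the query result (`Route`, `daily`).

```python
from pandas import DataFrame

# Made-up rows standing in for the query result
df_demo = DataFrame({"Route": [80, 2, 6], "daily": [14.5, 8.25, 3.0]})

# Default row index is 0, 1, 2, so a bar plot would label the bars 0, 1, 2
s_default = df_demo["daily"]

# After set_index, the route numbers become the index (and the bar labels)
s_routes = df_demo.set_index("Route")["daily"]
ax = s_routes.plot.bar()
```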

### Wouldn't it be nice to have an "other" bar to represent other routes?
- we now have to drop the LIMIT clause
- we have to handle the other routes using pandas

In [None]:
df = pd.read_sql("""
SELECT ???
FROM ???
GROUP BY ???
ORDER BY ???
""", conn)

df = df.set_index("Route")
s = df["daily"]
df.head()

#### We are back to plotting all route bars ...

### How can we slice a pandas dataframe?
- Recall that .iloc allows us to do slicing.
- To reproduce the previous 5-route plot, we just need to take the first 5 rows and store them in a series s.
- For the "other" part, we want all the rows after the first 5 summed together.
- What should start and end be in start:end to get these two slices?
- Once we compute "other" count, we can add that back to the series s.
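
The steps above can be sketched as follows, using a made-up DataFrame that is already sorted by `daily` in descending order. The route numbers and counts are invented for illustration.

```python
from pandas import DataFrame

# Made-up data, already sorted from largest to smallest
df_demo = DataFrame({
    "Route": [80, 2, 6, 10, 3, 4, 5],
    "daily": [100.0, 90.0, 80.0, 70.0, 60.0, 20.0, 10.0],
}).set_index("Route")

# First 5 rows: positions 0 through 4 (copy so we can modify it safely)
s = df_demo["daily"].iloc[:5].copy()

# Everything from position 5 onward, summed into one number
other = df_demo["daily"].iloc[5:].sum()

# Add the combined value back under the label "other"
s.loc["other"] = other
print(s)
```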

### Let's fix the plot aesthetics ...

### Scatter plot
- Copy and paste the data from trees.txt
- When we have a series to plot:
    - s.plot.bar()
    - index  => x-axis
    - values => y-axis
- When we have a data frame:
    - df.plot.scatter(x = column_name, y = column_name)
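
A minimal sketch of the DataFrame case, using a few of the tree rows from the cell below. The choice of age for x and height for y is just one possible pairing.

```python
from pandas import DataFrame

# A few rows taken from the trees data
trees_demo = DataFrame([
    {"age": 1, "height": 1.5, "diameter": 0.8},
    {"age": 2, "height": 2.5, "diameter": 1.5},
    {"age": 3, "height": 3.2, "diameter": 2.1},
    {"age": 4, "height": 4.9, "diameter": 2.8},
])

# x and y are column names; each row becomes one point
ax = trees_demo.plot.scatter(x="age", y="height")
```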
    

In [None]:
trees = [
    {"age": 1, "height": 1.5, "diameter": 0.8},
    {"age": 1, "height": 1.9, "diameter": 1.2},
    {"age": 1, "height": 1.8, "diameter": 1.4},
    {"age": 2, "height": 1.8, "diameter": 0.9},
    {"age": 2, "height": 2.5, "diameter": 1.5},
    {"age": 2, "height": 3, "diameter": 1.8},
    {"age": 2, "height": 2.9, "diameter": 1.7},
    {"age": 3, "height": 3.2, "diameter": 2.1},
    {"age": 3, "height": 3, "diameter": 2},
    {"age": 3, "height": 2.4, "diameter": 2.2},
    {"age": 2, "height": 3.1, "diameter": 2.9},
    {"age": 4, "height": 2.5, "diameter": 3.1},
    {"age": 4, "height": 3.9, "diameter": 3.1},
    {"age": 4, "height": 4.9, "diameter": 2.8},
    {"age": 4, "height": 5.2, "diameter": 3.5},
    {"age": 4, "height": 4.8, "diameter": 4},
]
df = DataFrame(trees)
df

#### We will continue this example in the next lecture ... 