# Make a chart
Explore different ways to chart the county-level case and death data.

* Line chart or bar chart
* Cumulative cases, cumulative deaths
* New cases, new deaths (change from the prior day)
* Raw values or rolling averages (5-day, 7-day, etc.) to smooth out day-to-day fluctuations
* Raw values or population-adjusted values 
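
The transformations listed above can be sketched in pandas. This is a minimal sketch on a tiny made-up table (one hypothetical county, invented numbers), assuming the cleaned data has per-county cumulative `cases`, a datetime `date2`, and `county_pop`, as the `df.head()` output below suggests. The cleaned data already appears to include `new_cases` and `cases_avg7`; this just shows one way such columns can be derived. `add_chart_columns` is a hypothetical helper, not something in `utils.py`.

```python
import pandas as pd


def add_chart_columns(df, window=7):
    """Hypothetical helper: derive daily, smoothed, and per-100k columns
    from cumulative case counts."""
    df = df.sort_values(["county", "date2"]).copy()

    # New cases = change in the cumulative count from the prior day, per county.
    # The first day has no prior day, so fall back to its cumulative count.
    df["new_cases"] = df.groupby("county")["cases"].diff().fillna(df["cases"])

    # Rolling average over `window` days smooths out day-to-day fluctuations;
    # the first window-1 days of each county are NaN.
    df["cases_avg"] = df.groupby("county")["new_cases"].transform(
        lambda s: s.rolling(window).mean()
    )

    # Population-adjusted: rolling average of new cases per 100,000 residents
    df["cases_avg_per100k"] = df["cases_avg"] / df["county_pop"] * 100_000
    return df


# Made-up example: one county, eight days of cumulative cases
toy = pd.DataFrame({
    "county": ["Example"] * 8,
    "date2": pd.date_range("2020-06-01", periods=8),
    "cases": [10, 12, 15, 15, 20, 26, 30, 31],
    "county_pop": [50_000] * 8,
})

out = add_chart_columns(toy)
print(out[["date2", "new_cases", "cases_avg", "cases_avg_per100k"]])
```

Passing `window=5` or `window=14` is an easy way to compare how much smoothing each option gives.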

Keep in mind:
* What is important to convey to the viewer
* Convey what is accurate, but also how serious the situation is (did the local, state, or federal government define what it considers "severe" vs. "not severe"?)
* For a viewer within a county, if they see this chart every day, can they pick up on whether the situation is getting worse or getting better?
* For a viewer looking across counties, do they have a reasonable way to compare counties (which may have different definitions of what "severe" is)?
* How do you compare very large cities like NYC and LA with much smaller cities like Indianapolis and Albuquerque?
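
Raw counts mostly reflect population size, so a very large county will almost always show more cases than a small one. A common way to put them on the same footing is a rate per 100,000 residents. A minimal sketch with two hypothetical counties and invented numbers:

```python
import pandas as pd

# Two hypothetical counties with made-up 7-day average new cases
latest = pd.DataFrame({
    "county": ["Big County", "Small County"],
    "cases_avg7": [4000.0, 300.0],
    "county_pop": [10_000_000, 600_000],
})

# Normalize to new cases per 100,000 residents
latest["per100k"] = latest["cases_avg7"] / latest["county_pop"] * 100_000

# Ranked by raw counts, Big County is far ahead; ranked per capita, Small County is worse off
print(latest.sort_values("per100k", ascending=False))
```

Here the big county has roughly 13 times as many new cases, but the small county has the higher rate (about 50 vs. 40 per 100k).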

## Bring in cleaned data
I cleaned the data in a previous notebook. How can I bring it into this one?

Use a `utils.py` file to share functions across notebooks.
This is one way to keep your data cleaning and processing organized:
put all your commonly used functions in one file, then import them in each notebook.

Note: `utils.py` should be in the same directory (folder) as your notebooks, so that `import utils` can find it without any extra setup.
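
Python finds `utils.py` by searching the folders listed in `sys.path`, which in a notebook usually includes the notebook's own folder. A minimal sketch, standard library only, of two situations that come up in practice: importing a helper file from a different folder, and reloading it after you edit it (otherwise the notebook keeps using the version it imported first). The temporary folder and the `demo_utils` module name are made-up stand-ins.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Stand-in for a folder holding a helper module, e.g. a shared folder above your notebooks
helper_dir = Path(tempfile.mkdtemp())
module_file = helper_dir / "demo_utils.py"
module_file.write_text("def greet():\n    return 'v1'\n")

# Python searches the folders on sys.path; add this one so the import can find it
sys.path.insert(0, str(helper_dir))
importlib.invalidate_caches()
import demo_utils

print(demo_utils.greet())  # v1

# Simulate editing the helper file, then reload so the running session sees the change
module_file.write_text("def greet():\n    return 'v2 (edited)'\n")
importlib.reload(demo_utils)
print(demo_utils.greet())  # v2 (edited)
```

In the notebook itself, calling `importlib.reload(utils)` after changing `utils.py` picks up the edits without restarting the kernel.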

In [1]:
import pandas as pd

import utils

In [2]:
df = utils.clean_jhu()

In [3]:
df.head()

|   | county | state | state_abbrev | fips | date | Lat | Lon | cases | deaths | new_cases | new_deaths | county_pop | date2 | cases_avg7 | deaths_avg7 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Abbeville | South Carolina | SC | 45001 | 2020-03-19 | 34.223334 | -82.461707 | 1 | 0 | 1 | 0 | 24527.0 | 2020-03-19 | NaN | NaN |
| 1 | Abbeville | South Carolina | SC | 45001 | 2020-03-20 | 34.223334 | -82.461707 | 1 | 0 | 0 | 0 | 24527.0 | 2020-03-20 | NaN | NaN |
| 2 | Abbeville | South Carolina | SC | 45001 | 2020-03-21 | 34.223334 | -82.461707 | 1 | 0 | 0 | 0 | 24527.0 | 2020-03-21 | NaN | NaN |
| 3 | Abbeville | South Carolina | SC | 45001 | 2020-03-22 | 34.223334 | -82.461707 | 1 | 0 | 0 | 0 | 24527.0 | 2020-03-22 | NaN | NaN |
| 4 | Abbeville | South Carolina | SC | 45001 | 2020-03-23 | 34.223334 | -82.461707 | 1 | 0 | 0 | 0 | 24527.0 | 2020-03-23 | NaN | NaN |


## Use the `altair` package to make charts

In [4]:
import altair as alt
#alt.themes.enable('urbaninstitute')
alt.themes.enable('vox')

# Other themes: https://vega.github.io/vega-themes/

ThemeRegistry.enable('vox')

In [5]:
# Make a line chart
def make_chart(df, county_name, start_date):
    
    # Subset by county and start date
    df = (df[(df.date2 >= start_date) & 
            (df.county == county_name)]
          # date2 is a datetime column that altair can use,
          # but date would raise a "not JSON serializable" error
          .drop(columns = "date")
         )
        
    # Make cases charts    
    cases_line = (
        alt.Chart(df)
        .mark_line()
        .encode(
            x=alt.X("date2", title="date"),
            y=alt.Y("cases_avg7:Q", title="7-day avg"),
        ).properties(
            title="Daily New Cases", width=300, height=200
        )
    )
    
    display(cases_line)

In [6]:
make_chart(df, "Los Angeles", "6-1-20")

In [7]:
# Make a bar chart
alt.themes.enable('fivethirtyeight')

def make_chart(df, county_name, start_date):
    
    # Subset by county and start date
    df = (df[(df.date2 >= start_date) & 
            (df.county == county_name)]
          # date2 is a datetime column that altair can use,
          # but date would raise a "not JSON serializable" error
          .drop(columns = "date")
         )
        
    # Make cases charts    
    cases_bar = (
        alt.Chart(df)
        .mark_bar()
        .encode(
            x=alt.X("date2", title="date"),
            y=alt.Y("cases_avg7:Q", title="7-day avg"),
        ).properties(
            title="Daily New Cases", width=300, height=200
        )
    )
    
    display(cases_bar)

In [8]:
make_chart(df, "Los Angeles", "6-1-20")

## Multiple layers to a chart
Experiment with the different layers you'd like to add.

Ideas to experiment with:
* Multiple lines
* Bar chart with daily numbers with a line chart of rolling average that smooths out the daily fluctuations
* Shade the last 2 weeks or the last week 
* Add lines to show "severity" using California's 4-tier definitions

In [9]:
from datetime import date, timedelta

# Go back 15 days, not 14: the case data only runs through yesterday,
# because today's full case numbers won't be available until tomorrow
two_weeks_ago = date.today() - timedelta(days=15)

two_weeks_ago

datetime.date(2020, 12, 13)

In [10]:
# This datetime.date object compares with our `date` column, but not with `date2`
type(two_weeks_ago)

datetime.date

In [11]:
# Converted to a pandas Timestamp, it compares with `date2`, but not with `date`
type(pd.to_datetime(two_weeks_ago))

pandas._libs.tslibs.timestamps.Timestamp
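
The two comparisons above can be checked on a small made-up frame. This sketch assumes, based on the outputs above, that `date` holds plain `datetime.date` objects and `date2` holds pandas datetimes:

```python
from datetime import date

import pandas as pd

# Made-up frame mimicking the two date columns in the cleaned data
days = pd.DataFrame({"date2": pd.date_range("2020-12-01", periods=20)})
days["date"] = days["date2"].dt.date  # plain datetime.date objects (object dtype)

cutoff = date(2020, 12, 13)

# A datetime.date compares element by element with the object-dtype `date` column
recent_a = days[days["date"] >= cutoff]

# The datetime64 `date2` column needs the cutoff converted to a pandas Timestamp
recent_b = days[days["date2"] >= pd.to_datetime(cutoff)]

print(len(recent_a), len(recent_b))  # 8 8
```

Both filters keep the same rows (December 13 through 20); the only difference is which type the cutoff needs to be.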

In [12]:
alt.themes.enable('latimes')

def make_chart(df, county_name, start_date):
    
    # Subset by county and start date
    df = (df[(df.date2 >= start_date) & 
            (df.county == county_name)]
          # date2 is a datetime column that altair can use,
          # but date would raise a "not JSON serializable" error
          .drop(columns = "date")
         )
    
    
    # Last two weeks only, for the shaded area (uses the global `two_weeks_ago` from above)
    df_two_weeks = df[df.date2 >= pd.to_datetime(two_weeks_ago)]
    
    # Set up base charts
    # Base charts hold characteristics we want to keep across multiple charts.
    # Similar to functions, they let us "inherit" those settings and then add more customization.
    # This quickly becomes handy if we're adding many, many layers.
    base = (alt.Chart(df)
        .mark_line()
        .encode(
            x=alt.X("date2", title="date")
        )
    )
    
    base_2weeks = (
        alt.Chart(df_two_weeks)
        .mark_line()
        .encode(
            x=alt.X("date2", title="date", axis=alt.Axis(format="%-m/%-d"))
        )
    )
        
    # Make cases charts    
    cases_line = (
        base
        .encode(
            y=alt.Y("cases_avg7:Q", title="7-day avg"),
        )
    )
    
    # Area chart gets us the shading
    cases_shaded = (
        base_2weeks
        .mark_area()
        .encode(
            y=alt.Y("cases_avg7:Q", title="7-day avg"),
            color=alt.value("#EAEBEB")
        )
    )

    
    # We'll put the shaded area first, then the line
    # otherwise, the shaded area chart will cover up part of the line
    cases_chart = (
        (cases_shaded + cases_line)
        .properties(title="Daily New Cases", width=300, height=200)
    )
    
    display(cases_chart)

In [13]:
make_chart(df, "Los Angeles", "8/1/20")

If you're making lots of charts, you can also move all your charting functions into a `charts_utils.py` file, so that you can reuse those chart functions across notebooks!

Experiment with how the chart function is defined, especially which args it takes. They don't have to be `county_name` and `start_date`: the function can take additional args, or entirely different ones, as long as they suit the chart you need. If you plan to make the same chart for many counties, think about which args make it simple to repeat over and over.