In [None]:
from datascience import *
import numpy as np
%matplotlib inline

# Getting the data
---

This data was compiled from a number of queries to the NOAA (National Oceanic and Atmospheric Administration) website.

NOAA National Centers for Environmental information, Climate at a Glance: Regional Time Series, published October 2018, retrieved on October 18, 2018 from https://www.ncdc.noaa.gov/cag/

In [None]:
temps = Table.read_table("temperatures.csv")
temps

Each row contains a `Year` (2000 to 2018), a `Month` (1 to 12), a `Region` (a climate region), and an `Average Temperature` (the average temperature for that year, month, and region in degrees Fahrenheit).

# Let's go over some basics of `where`, `group`, and `column` on this dataset
---

By getting familiar with our dataset, it should be easier for us to answer questions we have a bit later.

Recall the syntax of the following functions:

`table.where ( column_label, are.predicate( value ) ) => copy of table with only rows satisfying condition`

`table.group ( column_label, method ) => new table with one row per unique value in column_label; the other columns are aggregated by the method, default is count`

`table.column ( column_label ) => an array of the column specified`

`table.with_column ( column_label, array_values ) => copy of table with a new column named column_label and values from the array_values`
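
To make the recalled syntax concrete, here is a plain-Python analogue of what `where`, `column`, and `group` do. It is only a sketch: the helper names (`where_equal`, `column`, `group_mean`) and the tiny table are made up for illustration, and this is not how the `datascience` library implements these methods.

```python
from statistics import mean

# Hypothetical rows; the values are invented for illustration only.
rows = [
    {"Year": 2000, "Region": "West", "Average Temperature": 50.0},
    {"Year": 2000, "Region": "Northeast", "Average Temperature": 40.0},
    {"Year": 2001, "Region": "West", "Average Temperature": 52.0},
    {"Year": 2001, "Region": "Northeast", "Average Temperature": 44.0},
]

def where_equal(rows, label, value):
    """Keep only the rows whose `label` entry equals `value` (like where)."""
    return [row for row in rows if row[label] == value]

def column(rows, label):
    """Return the values stored under `label`, one per row (like column)."""
    return [row[label] for row in rows]

def group_mean(rows, group_label, value_label):
    """Return {group value: mean of value_label over that group} (like group with np.mean)."""
    groups = {}
    for row in rows:
        groups.setdefault(row[group_label], []).append(row[value_label])
    return {key: mean(values) for key, values in groups.items()}

west_rows = where_equal(rows, "Region", "West")
west_temps = column(west_rows, "Average Temperature")
yearly_means = group_mean(rows, "Year", "Average Temperature")

print(west_temps)
print(yearly_means)
```

The same chaining idea carries over to tables: filter first, then pull out a column or group.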

In [5]:
# If we need any more information, we can always use help
help

Type help() for interactive help, or help(object) for help about object.

In [None]:
# As a helping hand to get started, let's find out what regions we have
np.unique(temps.column("Region"))

In [None]:
region_names = np.unique(temps.column("Region"))
for region in region_names:
    print(region)

# Play around with the dataset using `where`, `group`, etc
---

In [None]:
# Maybe find the data for just one region

In [None]:
# Maybe find the average temperature for each region

In [None]:
# Maybe find the average temperature for each year

In [None]:
# Maybe find the average temperature for each year for just one region

In [None]:
# Maybe find the average temperature for each month

In [None]:
# Maybe find the average temperature for any region containing "North" versus containing "South"

# Let's look at trends over time
---
What's a good way to represent this data?

How does temperature change over the course of a year?  What about two years?

How does temperature change over the course of 18 years?

In [None]:
























# Maybe a line plot :) 

In [None]:
# Plot against month for a given year

In [None]:
# Plot against month for a range of two years

In [None]:





















# Uh oh!
# We need some way to distinguish the months in one year from another year

In [None]:



















# Maybe give apply a shot

# Recall the syntax of `apply`
---

`table.apply ( method, column_name, column_name, ... ) => returns an array of the method applied to the values in the specified columns`
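
As a rough mental model, `apply` calls the function once per row, taking one value from each named column. The sketch below mimics that with plain lists; `apply_rowwise` and `label_year_month` are hypothetical helpers written for this illustration, not part of the `datascience` library, and the real method returns an array rather than a list.

```python
def apply_rowwise(function, *columns):
    """Call `function` once per row, passing one value from each column."""
    return [function(*values) for values in zip(*columns)]

def label_year_month(year, month):
    """Combine a year and a month into a single text label."""
    return f"{year}-{month:02d}"

# Hypothetical column values, one entry per row.
years = [2016, 2016, 2017]
months = [11, 12, 1]

labels = apply_rowwise(label_year_month, years, months)
print(labels)
```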

In [None]:
help(Table.apply)

We should look at the [datetime.date documentation](https://docs.python.org/3.6/library/datetime.html#date-objects).
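
A short sketch of what `datetime.date` objects give us, using arbitrary example dates: the constructor takes a year, a month, and a day, and the resulting dates can be compared and subtracted.

```python
import datetime

# Two arbitrary example dates: (year, month, day).
first = datetime.date(2016, 1, 1)
later = datetime.date(2017, 6, 1)

# Dates compare chronologically, so they sort correctly across years.
is_ordered = first < later

# Subtracting two dates gives a timedelta; .days is the gap in days.
gap_days = (later - first).days

print(is_ordered, gap_days, later.year, later.month)
```

Because dates order themselves chronologically, January 2017 lands after December 2016 instead of on top of January 2016.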

In [None]:
# Let's write a function to turn a year and month into a datetime that python can understand

import datetime

def make_date_from_year_month(year, month):
    return datetime.date(...)

In [None]:
# Apply our function, what do we get?
temps.apply(...)

In [None]:
# Okay, what can we do with that?
...

# Let's finally plot monthly temperatures over multiple years
---

In [None]:
start_date = datetime.date(2016, 1, 1)
end_date = datetime.date(2018, 1, 1)

temps_with_date.where(
    "Date", ...
).plot(...)

In [None]:
# Let's change the start date to look at the temperature from 2000 to 2018

# Alright, what about annual average temperature over time
---

In [None]:
# Plot the annual average temperature versus year
...

What if I just wanted to look at West Climate Region?

In [None]:
...

In [None]:
# Is there anything weird?

# Why?

# What does the distribution of ___ look like?
---

In [None]:
temps.group("Year", np.mean).hist(3, normed=False, bins=np.arange(50, 57, 0.5))

In [None]:
# We can change "Year" and np.mean to a bunch of different stuff

# Make sure that we allow default bins if we don't fully know what we're working with

# Let's use normed=False to get counts instead of density--a little easier to understand

How many years had an average temperature greater than 54°F, using counts?

In [7]:
# How many years had an avg temp of > 54


In [None]:
# Let's try the distribution of annual average temperatures
temps.group("Year", np.mean).hist(3, normed=True, bins=np.arange(50, 57, 0.5))

How many years had an average temperature greater than 54°F, using density?
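
One way to go from density back to counts, assuming the y-axis of the `normed=True` histogram is in percent per unit (the Data 8 convention): a bar's area, height times bin width, is the percent of values in that bin. The numbers below are hypothetical and are not read from our histogram, and `count_in_bin` is a helper made up for this sketch.

```python
def count_in_bin(height_percent_per_unit, bin_width, total_count):
    """Convert a density bar into a count.

    Assumes the bar height is in percent per unit, so that
    height * width is the percent of all values in the bin.
    """
    percent_in_bin = height_percent_per_unit * bin_width
    return percent_in_bin / 100 * total_count

# Hypothetical bar: 40 percent per degree tall, 0.5 degrees wide, 20 values in total.
example_count = count_in_bin(40, 0.5, 20)
print(example_count)
```

With counts (`normed=False`) no conversion is needed: the bar height is already the number of values in the bin.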

In [None]:
# We need to know how many years we have
temps.group("Year", np.mean).num_rows

In [8]:
# We have <> years
# How many years had an avg temp of > 54


# We want to compare annual average temperature across multiple regions
---

There's an easy way and a hard way to go about this.

The way that we're most comfortable starting with is actually a bit more challenging, but let's do it anyway.

In [None]:
# I want to compare Northeast to West

# What should my logic be to compare Annual Avg Temps for Northeast to those of West?





























# We've done this before for one region:

# Get data for just that region, then group by year

In [None]:
# Grab data for just northeast
northeast_data = ...
northeast_data

In [None]:
# Grab data for just west
west_data = ...

In [None]:
# Group both by year
northeast_years = ...
west_years = ...

But now I have two separate tables.  We've only learned how to plot from one table so far.

We can use matplotlib to get around that (don't worry, you don't need to know how to do this).

In [None]:
# Don't worry about this code too much.
import matplotlib.pyplot as plt

plt.hist(
    [northeast_years.column("Average Temperature mean"), west_years.column("Average Temperature mean")],
    histtype='stepfilled'
)
plt.show()

Or we can create a new table, add our columns to it, and then plot.

In [None]:
# Create a new table with a column for Northeast and a column for West, and rows for every year
northeast_vs_west = ...

In [None]:
# We can histogram this table normally
northeast_vs_west.hist()

# There's an easier way
---

Using `table.pivot` we can essentially perform `.where`, `.group`, and `.column` for every region at once.

Let's delve into [the documentation](http://data8.org/datascience/_autosummary/datascience.tables.Table.pivot.html) to find out how to use this method.
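
Before reading the documentation, it may help to see the idea of a pivot in plain Python: one row per year, one column per region, and each cell holds an aggregate (here the mean) of the matching values. This is a conceptual sketch on made-up rows, not the library's implementation and not the call we will write below.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (year, region, temperature) rows, invented for illustration.
rows = [
    (2000, "West", 50.0),
    (2000, "West", 54.0),
    (2000, "Northeast", 40.0),
    (2001, "West", 53.0),
    (2001, "Northeast", 44.0),
    (2001, "Northeast", 46.0),
]

# Collect every temperature that belongs to each (year, region) cell.
cells = defaultdict(list)
for year, region, temp in rows:
    cells[(year, region)].append(temp)

regions = sorted({region for _, region, _ in rows})
years = sorted({year for year, _, _ in rows})

# One entry per year, one key per region, each cell aggregated with mean.
# Every (year, region) pair has data here, so mean never sees an empty list.
pivoted = {
    year: {region: mean(cells[(year, region)]) for region in regions}
    for year in years
}

for year in years:
    print(year, pivoted[year])
```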

In [None]:
# Before we use the method, we should think clearly about what we want to achieve
all_year_avgs = temps.pivot(...)
all_year_avgs

In [None]:
# Now let's compare West Climate Region to Northeast Climate Region
...

In [None]:
# We can even compare every region this way
...

# Let's finish with some knowledge about histograms
---

We might have some questions about these histograms, and we should figure out the answers!

Q. What do histograms measure?

Q. Why are the histograms for some regions taller than others?

Q. What do we know is true for every region's histogram?

Q. If we change to density (`normed=True`) then does our previous answer change?

Q. How do we calculate the number of occurrences of a certain bin using counts (`normed=False`)?

Q. How do we calculate the number of occurrences of a certain bin using density?