In [None]:
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt

First, we're going to import data on the median income by county.

This data is contained in the file est18all.xls, an Excel file. If you're able to, open this file to see what the format looks like.

Luckily for us, _pandas_ has a _read_excel_ function we can use here. First, let's check the documentation.

In [None]:
pd.read_excel?

Out of all of the arguments, we are going to use four:
* `io` - This will be the filepath to our Excel file
* `sheet_name` - We'll specify the name of the sheet containing the data we need.
* `header` - The row containing the column names. Note that we start counting from 0.
* `usecols` - A string indicating the columns we want to include. We'll get the state, county, and median household income information.
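
To get a feel for what `header` does without opening the Excel file, here is a minimal sketch that uses `pd.read_csv` on a small in-memory table instead. The rows are made up for illustration. `header` is counted from 0 in both readers, but the Excel-letter form of `usecols` belongs to `read_excel`, so this sketch selects columns by name.

```python
import io
import pandas as pd

# Made-up contents: three rows of notes, then the real header in row 3
# (counting from 0), then two data rows.
raw = io.StringIO(
    "Table title,,,\n"
    "Source note,,,\n"
    "Another note,,,\n"
    "State FIPS,Postal Code,Name,Median Household Income\n"
    "0,XX,Alpha County,50000\n"
    "0,XX,Beta County,60000\n"
)

# header=3 tells pandas that the fourth row holds the column names.
toy = pd.read_csv(raw,
                  header=3,
                  usecols=["Postal Code", "Name", "Median Household Income"])
print(toy)
```

Everything above the header row is ignored, and only the three requested columns are kept.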

In [None]:
median_income = pd.read_excel('../data/est18all.xls',
                              sheet_name = 'est18ALL',
                              header = 3,
                              usecols = 'C,D,W')

In [None]:
median_income.head(2)

For this map, we only need the counties located in Tennessee.

In [None]:
median_income = median_income.loc[median_income['Postal Code'] == 'TN']
median_income.head(2)
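
If the filtering step feels opaque, this small sketch on made-up rows shows the two pieces separately: the comparison builds a True/False mask, and `.loc` keeps the rows where the mask is True. The `.copy()` is an optional extra that is not in the cell above; it makes the result an independent DataFrame, which avoids ambiguity when adding columns later.

```python
import pandas as pd

# Made-up rows for illustration only.
toy = pd.DataFrame({
    "Postal Code": ["KY", "TN", "TN", "TN"],
    "Name": ["Example County", "Tennessee", "Alpha County", "Beta County"],
})

# Step 1: one True/False value per row.
mask = toy["Postal Code"] == "TN"

# Step 2: keep only the rows where the mask is True.
tn = toy.loc[mask].copy()
print(tn)
```

Notice that the surviving rows keep their original index labels.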

The first row is the statewide total for Tennessee rather than a county, so we can remove it.

In [None]:
median_income = median_income.iloc[1:]
median_income.head(2)
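
Here is a quick sketch, on made-up rows, of what `.iloc[1:]` does. It slices by position, so it drops whatever row happens to be first, regardless of the index labels. The `reset_index` call is an optional extra, not something the notebook needs.

```python
import pandas as pd

# Made-up rows with index labels that do not start at 0.
toy = pd.DataFrame(
    {"Name": ["Tennessee", "Alpha County", "Beta County"]},
    index=[10, 11, 12],
)

# Keep everything from position 1 onward.
counties_only = toy.iloc[1:]

# Optionally renumber the rows from 0.
renumbered = counties_only.reset_index(drop=True)
print(counties_only)
print(renumbered)
```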

Now, let's read in our county shapefile. This one was obtained from http://www.tngis.org/administrative-boundaries.htm

This creates a GeoDataFrame, which is like a pandas DataFrame but has geometry associated with it.

In [None]:
counties = gpd.read_file('../data/county/tncounty.shp')

In [None]:
counties.head()

The `geometry` column contains shapely Polygon or MultiPolygon objects giving the boundaries of each county.

In [None]:
counties.loc[0, 'geometry']

In [None]:
print(counties.loc[0,'geometry'])
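
To see what these geometry objects are without the shapefile, here is a minimal sketch that builds a tiny GeoDataFrame by hand. The squares and names are made up; a real county boundary is the same kind of object with many more vertices.

```python
import geopandas as gpd
from shapely.geometry import MultiPolygon, Polygon

# Two made-up unit squares.
square = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])
far_square = Polygon([(2, 0), (3, 0), (3, 1), (2, 1)])

toy = gpd.GeoDataFrame(
    {"NAME": ["One piece", "Two pieces"]},
    # A MultiPolygon groups several polygons into one geometry.
    geometry=[square, MultiPolygon([square, far_square])],
)

print(toy.geom_type.tolist())       # the geometry type of each row
print(toy.loc[0, "geometry"].wkt)   # text form, like print() shows above
print(toy.area.tolist())            # areas in the units of the coordinates
```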

If we call `.plot()` on a GeoDataFrame, it will render a plot using the geometry column.

In [None]:
counties.plot();

If we want a larger plot, we can use `plt.subplots()` and set a figsize. When we create our plot, we need to specify that we want it to render on the axes we just created.

In [None]:
fig, ax = plt.subplots(figsize=(16,4))
counties.plot(ax = ax);

Since the axes are not conveying useful information, we can remove them.

In [None]:
fig, ax = plt.subplots(figsize=(16,4))
counties.plot(ax = ax)
ax.axis('off');

Now, we can merge the GeoDataFrame with our median income DataFrame.

In [None]:
counties.head(2)

In [None]:
median_income.head(2)

We need some string manipulation so that the county name formats match. Slicing off the last seven characters removes the trailing " County" from each name.

In [None]:
median_income['NAME'] = median_income['Name'].str[:-7]
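
The slice `.str[:-7]` works because " County" is exactly seven characters long. As a sketch of the tradeoff, the example below compares it with a pattern-based alternative, which is not what the notebook uses, that only strips the suffix when it is actually present.

```python
import pandas as pd

names = pd.Series(["Anderson County", "Van Buren County", "Tennessee"])

# Positional slice: always drops the last seven characters.
sliced = names.str[:-7]

# Pattern-based alternative: removes " County" only at the end of a name.
stripped = names.str.replace(r" County$", "", regex=True)

print(sliced.tolist())
print(stripped.tolist())
```

Both give the same result for names ending in " County". They differ only for a value like "Tennessee", which the slice truncates and the pattern leaves alone.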

In [None]:
counties = pd.merge(left = counties,
                    right = median_income[['NAME', 'Median Household Income']])
counties.head()
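
Since no `on` argument is given, `pd.merge` joins on the column the two tables share, `NAME`, and its default inner join silently drops any county whose name doesn't match on both sides. One way to check for that, sketched here on made-up plain DataFrames rather than the real GeoDataFrame, is an outer merge with `indicator = True`.

```python
import pandas as pd

# Made-up tables; "DeKalb" and "De Kalb" are spelled differently on purpose.
left = pd.DataFrame({"NAME": ["Alpha", "Beta", "DeKalb"]})
right = pd.DataFrame({"NAME": ["Alpha", "Beta", "De Kalb"],
                      "Median Household Income": [1, 2, 3]})

# The default (inner) merge keeps only names found in both tables.
inner = pd.merge(left = left, right = right)

# An outer merge with an indicator column shows where each row came from.
check = pd.merge(left = left, right = right, on = "NAME",
                 how = "outer", indicator = True)
unmatched = check.loc[check["_merge"] != "both", ["NAME", "_merge"]]
print(unmatched)
```

If `unmatched` is empty for the real data, every county found its income value.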

To color our map based on a column, we can use the `column` argument.

In [None]:
fig, ax = plt.subplots(figsize=(16,4))
counties.plot(column = 'Median Household Income',
              ax = ax)
ax.axis('off');

Why does our map look like this?

In [None]:
counties.info()

It turns out that _pandas_ is treating the median income column as an object. We need to convert it to a numeric column.

In [None]:
counties['Median Household Income'] = pd.to_numeric(counties['Median Household Income'])
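
A column usually ends up as an object when at least one of its cells holds text. Whether that is the cause in this particular file is an assumption, not something checked here. The sketch below uses made-up values to show how `pd.to_numeric` behaves by default and with the optional `errors = 'coerce'` argument.

```python
import pandas as pd

# Made-up values: numbers stored as text, plus a "." placeholder.
income = pd.Series(["50000", "61000", "."])

# By default, a value that can't be parsed raises an error.
try:
    pd.to_numeric(income)
    strict_failed = False
except ValueError:
    strict_failed = True

# errors='coerce' turns unparseable values into NaN instead.
converted = pd.to_numeric(income, errors = "coerce")
print(strict_failed)
print(converted)
```

The call in the cell above raised no error, so every value in our column could be parsed as a number.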

In [None]:
fig, ax = plt.subplots(figsize=(16,4))
counties.plot(column = 'Median Household Income',
              ax = ax)
ax.axis('off');

In [None]:
fig, ax = plt.subplots(figsize=(16,4))
counties.plot(column = 'Median Household Income', 
              edgecolor = 'black', 
              legend = True,
              ax = ax)
ax.axis('off');

By default, geopandas will use a continuous colorscale for the choropleth, which can lead to a less-than-optimal map when you have a disproportionately large value, as we do with Williamson County.
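
To see why one large value washes out the map, recall that a continuous colorscale places each value linearly between the minimum and the maximum. The sketch below does that rescaling by hand on made-up incomes, with one value far above the rest.

```python
import pandas as pd

# Made-up incomes; the last one is a deliberate outlier.
income = pd.Series([40_000, 45_000, 50_000, 55_000, 60_000, 115_000])

# Linear rescaling to the 0-1 range, the position of each value on the colorscale.
scaled = (income - income.min()) / (income.max() - income.min())
print(scaled.round(2).tolist())
```

Every value except the outlier lands in the bottom third of the scale, so those counties would all be drawn in nearly the same shade.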

We can specify a different scheme to use. For example, let's use the [Jenks natural breaks classification method](https://en.wikipedia.org/wiki/Jenks_natural_breaks_optimization).

We'll also specify a different colormap using the `cmap` argument. A list of named colormaps is available at https://matplotlib.org/3.1.1/gallery/color/colormap_reference.html.

In [None]:
fig, ax = plt.subplots(figsize=(16,4))

counties.plot(column = 'Median Household Income', 
              edgecolor = 'black', 
              legend = True,
              cmap = 'Blues',
              scheme="NaturalBreaks",
              ax = ax)

# Position the legend
leg = ax.get_legend()
leg.set_bbox_to_anchor((1, 0.5))

# Add a title
plt.title('Median Household Income by County, 2018', fontsize = 18)

ax.axis('off');

**Warning: More advanced code below**

The default legend formatting could be improved. The following cell shows how we can do some advanced formatting to change the legend labels.

In [None]:
from matplotlib.lines import Line2D

fig, ax = plt.subplots(figsize=(16,4))

counties.plot(column = 'Median Household Income', 
              edgecolor = 'black',
              legend = True,
              cmap = 'Blues',
              scheme="NaturalBreaks",
              ax = ax)

leg = ax.get_legend()

# Adjust the formatting of the legend
labels = []
n = len(leg.get_texts())
for i, lbl in enumerate(leg.get_texts()):
    label_text = lbl.get_text()
    lower = float(label_text.split()[0][:-1])
    upper = float(label_text.split()[1][:-1])
    # Raw strings keep the backslash that escapes $ for matplotlib's text rendering
    if i == 0:
        new_text = "Below " + r"\${:,.0f}".format(upper + 1)
    elif i == n - 1:
        new_text = "Above " + r"\${:,.0f}".format(lower)
    else:
        new_text = r"\${:,.0f}".format(lower + 1) + " - " + r"\${:,.0f}".format(upper)
        
    labels.append(new_text)

# Adjust the marker appearance
# Extract the old markers and then modify by setting the edgecolor and edgewidth
markers = []
for line in leg.get_lines():
    marker = Line2D([0],[0], marker = 'o', 
                    markersize = line.get_markersize(), 
                    color = line.get_markerfacecolor(),
                    linestyle = 'None',
                    markeredgecolor = 'black',
                    markeredgewidth = 1)
    markers.append(marker)

# Redraw the legend with the new labels and markers
plt.legend(markers, labels, fontsize = 12)
leg = ax.get_legend()
leg.set_bbox_to_anchor((1, 0.5))
    
plt.title('Median Household Income by County, 2018', fontsize = 18)

ax.axis('off');
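
The label logic is easier to follow, and to test, when it is pulled out of the plotting code. The sketch below does that with a hypothetical helper named `relabel`. It assumes the default legend labels look like `"30000.00, 40000.00"`, which is what the parsing in the cell above implies, and the sample labels are made up.

```python
def relabel(label_text, position, n_labels):
    """Turn a 'lower, upper' legend label into a dollar-formatted range."""
    lower, upper = (float(part) for part in label_text.split(","))
    # The backslash escapes $ so matplotlib does not treat it as math markup.
    dollars = r"\${:,.0f}".format
    if position == 0:
        return "Below " + dollars(upper + 1)
    if position == n_labels - 1:
        return "Above " + dollars(lower)
    return dollars(lower + 1) + " - " + dollars(upper)


# Made-up labels in the assumed format.
old_labels = ["30000.00, 40000.00", "40000.00, 50000.00", "50000.00, 90000.00"]
new_labels = [relabel(text, i, len(old_labels))
              for i, text in enumerate(old_labels)]
print(new_labels)
```

With a helper like this, the loop over `leg.get_texts()` reduces to one call per label. If your geopandas version formats its labels differently, only the parsing line needs to change.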