# Hackathon: Cheat sheet

Scenario:

<img src="images/justice.png">

You are a group of interns working for "Justice for All," a non-profit organization in Los Angeles. Ellen, the director, has called upon you, the resident spatial data scientists, to produce a report based on a .geojson file that your manager Kazu has compiled. Kazu downloaded the data from Social Explorer, cleaned it up, and merged it with a geojson file. Kazu's wife went into labor last night (they are expecting a baby girl!) so it is now up to you to continue where Kazu left off. Ellen has a board meeting in an hour and wants you to produce a report for her in a Google Doc with the following material:

Part 1:
- A series of preliminary stats/charts of the data
- Maps: 
  - Make sure to zoom in to Los Angeles (crop Catalina Island out)
  - A series of choropleth maps
  - A series of side-by-side choropleth maps that show meaningful differences (make sure the legends, i.e. the bin breaks, are the same on both maps)
  - Produce several maps that show two data layers: One basemap choropleth, and another overlay that shows the top values of another variable with red boundaries
  
Part 2:
- Import Council District boundaries from the LA Data Portal
- Create a demographic profile for each Council District
   - hint: Do a spatial join to get census tracts that intersect with each council district
   - show only the census tracts for each council district
   - use a loop and/or function to minimize your code


## Import the geojson file (code cell provided)

In [None]:
import geopandas as gpd
import pandas as pd
import matplotlib.pyplot as plt

In [None]:
gdf = gpd.read_file('data/acs2015_2019.geojson')

## Conduct a thorough exploration of the data, and answer the following questions:
   - what fields are included?
   - how many records and columns are there?
   - what does the data look like?
   - are the data types correct?

In [None]:
list(gdf)

In [None]:
gdf.shape

In [None]:
gdf.head()

In [None]:
gdf.info()
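
If `gdf.info()` reports a numeric field as text, one way to fix it is `pd.to_numeric`. The following is a minimal sketch on a made-up table: the column values and the `'N/A'` placeholder are assumptions for illustration, not taken from Kazu's file.

```python
import pandas as pd

# Hypothetical stand-in for a few columns of gdf; all values are made up.
df = pd.DataFrame({
    'FIPS': ['06037101110', '06037101122', '06037101210'],       # IDs should stay text
    '% Total Population: White Alone': ['61.5', '72.1', 'N/A'],  # numbers stored as text
    'Total Population': [4283, 3405, 6050],
})

print(df.dtypes)

# Convert the text column to numbers; values that cannot be parsed become NaN
col = '% Total Population: White Alone'
df[col] = pd.to_numeric(df[col], errors='coerce')

print(df.dtypes)
print(df[col].isna().sum(), 'value(s) could not be converted')
```

Identifier columns such as FIPS codes are usually best left as strings so that leading zeros survive.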

## Conduct preliminary statistical analysis on select fields of interest
   - what are the means/medians of your variables of interest?
   - what are the top 10 values of each?

In [None]:
indicators = ['% Population 25 Years and Over: Less than High School',
 '% Population 25 Years and Over: High School Graduate (Includes Equivalency)',
 '% Population 25 Years and Over: Some College',
 "% Population 25 Years and Over: Bachelor's Degree",
 "% Population 25 Years and Over: Master's Degree",
 '% Population 25 Years and Over: Professional School Degree',
 '% Population 25 Years and Over: Doctorate Degree',]

In [None]:
for indicator in indicators:
    print(f'{indicator}: mean = {gdf[indicator].mean():.2f}, median = {gdf[indicator].median():.2f}')

In [None]:
for indicator in indicators:
    print(indicator)
    print(gdf[indicator].nlargest(10))
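
For the report, a compact table can be easier to paste than a long run of printed lines. Here is a sketch of one option using `agg`; the small table and its column names (`pct_a`, `pct_b`) are hypothetical stand-ins for `gdf[indicators]`.

```python
import pandas as pd

# Hypothetical stand-in for gdf[indicators]; names and values are made up.
df = pd.DataFrame({
    'pct_a': [10.0, 20.0, 60.0],
    'pct_b': [5.0, 5.0, 20.0],
})
cols = ['pct_a', 'pct_b']

# One row per indicator, one column per statistic
summary = df[cols].agg(['mean', 'median']).T.round(2)
print(summary)

# Largest values of one indicator (use 10 instead of 2 on the real data)
top = df['pct_a'].nlargest(2)
print(top)
```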

## Create meaningful histograms

- use `plt.hist`
   
Challenge:
- add a vertical line for the mean and median
- change the size of the plot
- change the colors of the bins

In [None]:
def get_histogram(column = '% Population 25 Years and Over: Less than High School'):
    # values to plot, plus the two summary statistics drawn on the chart
    series_to_plot = gdf[column]
    mean = series_to_plot.mean()
    median = series_to_plot.median()

    plt.figure(figsize=(10,5))
    plt.hist(series_to_plot, bins=50, color='skyblue')

    # dashed vertical lines: mean in black, median in red
    plt.axvline(mean, color='k', linestyle='dashed', linewidth=1)
    plt.axvline(median, color='r', linestyle='dashed', linewidth=1)

    # label each line just to its right, near the top of the plot
    min_ylim, max_ylim = plt.ylim()
    plt.text(mean*1.1, max_ylim*0.9, 'Mean: {:.2f}'.format(mean))
    plt.text(median*1.1, max_ylim*0.8, 'Median: {:.2f}'.format(median), color='r')
    plt.title(column + ' in Los Angeles County')


In [None]:
indicators = [ '% Population 25 Years and Over: Less than High School',
 '% Population 25 Years and Over: High School Graduate (Includes Equivalency)',
 '% Population 25 Years and Over: Some College',
 "% Population 25 Years and Over: Bachelor's Degree",
 "% Population 25 Years and Over: Master's Degree",
 '% Population 25 Years and Over: Professional School Degree',
 '% Population 25 Years and Over: Doctorate Degree',]

In [None]:
for indicator in indicators:
    get_histogram(column=indicator)

# Maps

## Create a single choropleth map with a variable of your choice 

- Make it big
- Zoom in (don't show Catalina Island)

In [None]:
def get_map(column='% Population 25 Years and Over: Doctorate Degree'):
    ax = gdf.plot(figsize=(10,10),
                  column=column,
                  legend=True,
                  vmin=0,
                  vmax=100,
                  cmap='hot')
    ax.set_ylim(33.6,34.9)
    ax.set_title(column, fontsize=14)
    ax.axis('off');

In [None]:
for indicator in indicators:
    get_map(indicator)
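
`set_ylim` only changes what the plot shows; the island tracts are still in the data. If you would rather drop them before plotting, the `.cx` coordinate indexer is one option. This is a sketch on two made-up polygons (the coordinates are illustrative, not real tract boundaries), reusing the latitude band from `get_map`.

```python
import geopandas as gpd
from shapely.geometry import box

# Hypothetical stand-in for the tract layer: one mainland polygon and one
# island polygon further south. Coordinates are illustrative only.
demo = gpd.GeoDataFrame(
    {'name': ['mainland', 'island']},
    geometry=[box(-118.3, 34.0, -118.2, 34.1), box(-118.5, 33.3, -118.4, 33.4)],
    crs='EPSG:4326',
)

# Keep only the features that intersect the latitude band 33.6 to 34.9
cropped = demo.cx[:, 33.6:34.9]
print(cropped['name'].tolist())
```

Note that `.cx` keeps any feature that touches the box, so it filters whole features rather than clipping geometries.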

## Create a side-by-side map

- Zoom in (sorry Catalina)
- Make the breakdowns the same between both maps

Example arguments for custom bin breaks:
```python
gdf.plot(ax=ax[0],
         column='% Total Population: White Alone',
         legend=True,
         scheme='user_defined', 
         classification_kwds={'bins':[20,40,60,80,100]},
         cmap='Greens'
        )
```
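
To see why sharing the same `bins` list makes two legends comparable, it can help to work out which bin a value lands in. The sketch below is plain Python, not the library's own code, and it assumes each break is an inclusive upper bound (which is how mapclassify describes its classes). `bin_index` is a hypothetical helper name.

```python
from bisect import bisect_left

bins = [20, 40, 60, 80, 100]   # upper bounds, as in the example above

def bin_index(value, bins):
    """Return the index of the first bin whose upper bound is >= value."""
    return bisect_left(bins, value)

for value in [0, 20, 20.1, 55, 100]:
    print(value, '->', bin_index(value, bins))
```

Under that assumption, a tract at exactly 20% falls in the first bin and a tract at 20.1% falls in the second, on both maps.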

In [None]:
column1 = '% Total Population: White Alone'
column2 = '% Total Population: Black or African American Alone'
fig,ax = plt.subplots(1,2,figsize=(15,8))

gdf.plot(ax=ax[0],
         column=column1,
         legend=True,
         scheme='user_defined', 
         classification_kwds={'bins':[20,40,60,80,100]},
         cmap='plasma'
        )

ax[0].set_ylim(33.6,34.9)
ax[0].set_title(column1, fontsize=14)
ax[0].axis('off');

gdf.plot(ax=ax[1],
         column=column2,
         legend=True,
         scheme='user_defined', 
         classification_kwds={'bins':[20,40,60,80,100]},
         cmap='plasma'
        )

ax[1].set_ylim(33.6,34.9)
ax[1].set_title(column2, fontsize=14)
ax[1].axis('off');



## Create a single map with two layers

- Make one variable the "base" choropleth map
- Overlay another variable, only showing the boundary outlines that match a particular query
- Make sure the map tells a story

Sample arguments to make your second overlay:

```python
    alpha=1,
    linewidth=1,
    hatch="////",
    facecolor="none", 
    color='red'
```

In [None]:
gdf['% Population 15 Years and Over: Divorced'].describe()

In [None]:
column1 = '% Total Population: White Alone'
column2 = '% Population 15 Years and Over: Divorced'
fig,ax = plt.subplots(figsize=(15,15))

gdf.plot(ax=ax,
         column=column1,
         legend=True,
         scheme='user_defined', 
         classification_kwds={'bins':[20,40,60,80,100]},
         cmap='Greens'
        )

gdf[gdf[column2] >= 20].boundary.plot(ax=ax,
        alpha=0.5,
        linewidth=2,
        hatch="///",
        color='red'
        )

ax.set_ylim(33.6,34.9)
ax.set_title(column1 + ' (green)\n' + '20% or more of population divorced (red)', fontsize=14)
ax.axis('off');
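
The cell above picks its overlay tracts with a fixed cutoff (20%). Since the brief asks for the "top values" of a variable, selecting the n largest rows is another option. Here is a sketch of both approaches on a made-up table; `tract` and `pct_divorced` are hypothetical column names, and the same calls should carry over to `gdf` because a GeoDataFrame is a pandas DataFrame subclass.

```python
import pandas as pd

# Hypothetical stand-in for gdf; tract labels and values are made up.
df = pd.DataFrame({
    'tract': ['a', 'b', 'c', 'd', 'e'],
    'pct_divorced': [4.0, 22.5, 9.1, 18.0, 25.3],
})

# Option 1: the n tracts with the highest values
top_n = df.nlargest(2, 'pct_divorced')

# Option 2: every tract at or above a cutoff chosen from describe()
above_cutoff = df[df['pct_divorced'] >= 20]

print(top_n['tract'].tolist())
print(above_cutoff['tract'].tolist())
```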



## Part 2: Council District Maps

In [None]:
# Council Districts
gdf_cd = gpd.read_file('data/Council Districts.geojson')

## Do a spatial join to get the census tracts inside of CD1

In [None]:
gdf_cd.info()

In [None]:
gdf_cd

In [None]:
# function to create a council district map
def cd_map(district = '1', column = '% Total Population: Hispanic or Latino'):
    # select the polygon for this council district
    this_cd = gdf_cd[gdf_cd['district']==district]
    
    # spatial join: keep the census tracts that intersect the district
    tracts = gpd.sjoin(gdf,this_cd)

    # plot it
    fig,ax = plt.subplots()

    # choropleth on a fixed 0-100 scale so districts are comparable
    tracts.plot(ax=ax,
                column=column, 
                vmin=0,
                vmax=100,
                legend=True)

    ax.axis('off')
    ax.set_title(f'Council District {district}\n({column})', fontsize=14)

In [None]:
cd_map()

In [None]:
for district in gdf_cd['district']:
    cd_map(district = district)

In [None]:
list(gdf)

In [None]:
indicators = [ '% Total Population: Not Hispanic or Latino',
 '% Total Population: Not Hispanic or Latino: White Alone',
 '% Total Population: Not Hispanic or Latino: Black or African American Alone',
 '% Total Population: Not Hispanic or Latino: American Indian and Alaska Native Alone',
 '% Total Population: Not Hispanic or Latino: Asian Alone',
 '% Total Population: Not Hispanic or Latino: Native Hawaiian and Other Pacific Islander Alone',
 '% Total Population: Not Hispanic or Latino: Some Other Race Alone',
 '% Total Population: Not Hispanic or Latino: Two or More Races',
 '% Total Population: Hispanic or Latino']

In [None]:
for indicator in indicators:
    cd_map(district='1',column=indicator)
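
The maps show each district, but the brief also asks for a demographic profile per Council District. One way to get a table is to spatially join all tracts to all districts at once and then group by district. The following is a minimal sketch on made-up squares: the geometries, the `pct_hispanic` column, and the values are all hypothetical, and the mean is unweighted (a population-weighted mean would be more defensible for the real report).

```python
import geopandas as gpd
from shapely.geometry import box

# Hypothetical stand-ins for gdf_cd and gdf; geometries and values are made up.
districts = gpd.GeoDataFrame(
    {'district': ['1', '2']},
    geometry=[box(0, 0, 2, 2), box(3, 0, 5, 2)],
    crs='EPSG:4326',
)
tracts = gpd.GeoDataFrame(
    {'pct_hispanic': [10.0, 30.0, 50.0, 90.0]},
    geometry=[box(0.2, 0.2, 0.8, 0.8), box(1.2, 1.2, 1.8, 1.8),
              box(3.2, 0.2, 3.8, 0.8), box(6, 6, 7, 7)],
    crs='EPSG:4326',
)

# Both layers should share a CRS before joining
districts = districts.to_crs(tracts.crs)

# Inner join: keeps each tract once per district it intersects,
# and drops tracts that intersect no district
joined = gpd.sjoin(tracts, districts)

# Unweighted mean of each indicator across the tracts in each district
profile = joined.groupby('district')[['pct_hispanic']].mean()
print(profile)
```

On the real data, tracts that straddle a district boundary will appear under both districts, so it is worth checking a few edge tracts by eye.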