# Hackathon

Scenario:

<img src="images/justice.png">

You are a group of interns working for "Justice for All," a non-profit organization in Los Angeles. Ellen, the director, has called upon you, the resident spatial data scientists, to produce a report based on a .geojson file that your manager Kazu has compiled. Kazu downloaded the data from Social Explorer, cleaned it up, and merged it with a geojson file. Kazu's wife went into labor last night (they are expecting a baby girl!) so it is now up to you to continue where Kazu left off. Ellen has a board meeting in an hour and wants you to produce a report for her in a Google Doc with the following material:

Part 1:
- A series of preliminary stats/charts of the data
- Maps: 
  - Make sure to zoom in to Los Angeles (crop Catalina Island out)
  - A series of choropleth maps
  - A series of side-by-side choropleth maps that show meaningful differences (use the same legend, i.e. the same bin breaks, on both maps)
  - Produce several maps that show two data layers: One basemap choropleth, and another overlay that shows the top values of another variable with red boundaries
  
Part 2:
- Import Council District boundaries from the LA Data Portal
- Create a demographic profile for each Council District
   - hint: Do a spatial join to get census tracts that intersect with each council district
   - show only the census tracts for each council district
   - use a loop and/or function to minimize your code


## Import the geojson file (code cell provided)

```python
import geopandas as gpd
import pandas as pd
import matplotlib.pyplot as plt
```

```python
gdf = gpd.read_file('data/acs2015_2019.geojson')
```

## Conduct a thorough exploration of the data, and answer the following questions:
   - what fields are included?
   - how many records and columns are there?
   - what does the data look like?
   - are the data types correct?
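
One way to answer these questions is with the standard pandas inspection methods, which a GeoDataFrame inherits. The sketch below runs on a tiny made-up table so it works on its own; the column names (`tract`, `Total Population`, `% Renter`) are placeholders, not the real fields in the file. In the notebook you would call the same methods on `gdf`.

```python
import pandas as pd

# Toy stand-in for `gdf`; the column names and values are made up.
df = pd.DataFrame({
    "tract": ["tract A", "tract B", "tract C"],
    "Total Population": [4283, 3405, 6088],
    "% Renter": ["61.2", "23.9", "88.0"],  # numbers stored as text on purpose
})

# What fields are included?
fields = df.columns.tolist()
print(fields)

# How many records and columns are there?
n_rows, n_cols = df.shape
print(n_rows, "records,", n_cols, "columns")

# What does the data look like?
print(df.head())

# Are the data types correct? '% Renter' is reported as a text dtype here...
print(df.dtypes)

# ...so convert it to a number before doing any statistics.
# errors="coerce" turns values that cannot be parsed into NaN.
df["% Renter"] = pd.to_numeric(df["% Renter"], errors="coerce")
```

`df.info()` and `df.sample(5)` are other quick ways to get the same overview.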

## Conduct preliminary statistical analysis on select fields of interest
   - what are the means/medians of your variables of interest?
   - what are the top 10 values of each?
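
A minimal sketch of these two questions, again on made-up data (the `% Renter` column is a placeholder for whichever variable you pick). The same calls work directly on `gdf`.

```python
import pandas as pd

# Toy stand-in for `gdf`; the column name and values are made up.
df = pd.DataFrame({
    "tract": [f"tract {i}" for i in range(1, 13)],
    "% Renter": [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 25, 75],
})
column = "% Renter"

# Mean and median of the variable of interest
mean_value = df[column].mean()
median_value = df[column].median()
print(f"mean = {mean_value:.1f}, median = {median_value:.1f}")

# describe() gives count, mean, std, min, quartiles and max in one call
print(df[column].describe())

# Top 10 records by this variable, largest first
top10 = df.nlargest(10, column)[["tract", column]]
print(top10)
```

To cover several variables, put the column names in a list and loop over it.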

## Create meaningful histograms

- use `plt.hist`
   
Challenge:
- add a vertical line for the mean and median
- change the size of the plot
- change the colors of the bins
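
The challenge items could be approached roughly as follows. The values are randomly generated stand-ins for one column of the data (in the notebook you would pass something like a column of `gdf` with missing values dropped), and the colors, bin count, and labels are arbitrary choices.

```python
import matplotlib.pyplot as plt
import numpy as np

# Made-up values standing in for one numeric column of the data.
rng = np.random.default_rng(0)
values = rng.gamma(shape=2.0, scale=15.0, size=500)

mean_value = values.mean()
median_value = np.median(values)

# Change the size of the plot
fig = plt.figure(figsize=(12, 6))

# Change the colors of the bins
counts, edges, patches = plt.hist(values, bins=20,
                                  color="steelblue", edgecolor="white")

# Add a vertical line for the mean and for the median
plt.axvline(mean_value, color="red", linestyle="--",
            label=f"mean = {mean_value:.1f}")
plt.axvline(median_value, color="black", linestyle=":",
            label=f"median = {median_value:.1f}")

plt.xlabel("value")
plt.ylabel("number of tracts")
plt.legend()
```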

# Maps

## Create a single choropleth map with a variable of your choice 

- Make it big
- Zoom in (don't show Catalina Island)
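
One simple way to zoom is to set the axis limits after plotting. The sketch below uses made-up square "tracts" with a far-away "island" so it runs on its own; the `value` column and the limits are placeholders. For the real data, `gdf.total_bounds` shows the full extent, and you would pick limits that leave Catalina Island out.

```python
import geopandas as gpd
import matplotlib.pyplot as plt
from shapely.geometry import box

# Made-up tracts: a 3x3 "mainland" grid plus one far-away "island" square.
mainland = [box(x, y, x + 1, y + 1) for x in range(3) for y in range(3)]
island = [box(1, -6, 2, -5)]
gdf_demo = gpd.GeoDataFrame(
    {"value": list(range(10, 100, 10)) + [5]},
    geometry=mainland + island,
)

# Make it big
fig, ax = plt.subplots(figsize=(15, 15))

# Choropleth of one variable
gdf_demo.plot(ax=ax, column="value", cmap="Greens", legend=True)

# Zoom in: restrict the axes to the mainland, which crops the island out
ax.set_xlim(0, 3)
ax.set_ylim(0, 3)

ax.axis("off")
ax.set_title("Made-up variable by tract")
```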

## Create a side-by-side map

- Zoom in (sorry Catalina)
- Use the same bin breaks on both maps

Example arguments for custom bin breaks:
```python
gdf.plot(ax=ax[0],
         column='% Total Population: White Alone',
         legend=True,
         scheme='user_defined', 
         classification_kwds={'bins':[20,40,60,80,100]},
         cmap='Greens'
        )
```

## Create a single map with two layers

- Make one variable the "base" choropleth map
- Overlay another variable, only showing the boundary outlines that match a particular query
- Make sure the map tells a story

Sample arguments to make your second overlay:

```python
    alpha=1,
    linewidth=1,
    hatch="////",
    facecolor="none", 
    color='red'
```
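
A sketch of the two-layer idea on made-up data: the base layer is a choropleth of one variable, and the overlay draws only the outlines of the tracts that pass a query on a second variable. The column names and the threshold are placeholders. This version uses `edgecolor='red'` with `facecolor='none'` for the overlay, which is one of several ways to get red boundaries.

```python
import geopandas as gpd
import matplotlib.pyplot as plt
from shapely.geometry import box

# Made-up tracts with two made-up variables.
gdf_demo = gpd.GeoDataFrame(
    {"base_var": [10, 20, 30, 40, 50, 60],
     "overlay_var": [1, 9, 3, 8, 2, 7]},
    geometry=[box(x, 0, x + 1, 1) for x in range(6)],
)

# Query for the overlay: tracts at or above an arbitrary threshold
threshold = 7
top_tracts = gdf_demo[gdf_demo["overlay_var"] >= threshold]

fig, ax = plt.subplots(figsize=(12, 12))

# Layer 1: base choropleth
gdf_demo.plot(ax=ax, column="base_var", cmap="Greens", legend=True)

# Layer 2: outlines only, drawn on the same axes
top_tracts.plot(ax=ax, facecolor="none", edgecolor="red", linewidth=2)

ax.axis("off")
ax.set_title("base_var, with overlay_var >= 7 outlined in red")
```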

## Part 2: Council District Maps

- Download the [council districts geojson file](https://data.lacity.org/City-Infrastructure-Service-Requests/Council-Districts/5v3h-vptv) from the LA Open Data Portal
- Choose a few indicators and create choropleth maps for each council district
- Conduct a spatial join to select only the census tracts that intersect with the council district. Example:

```python
# Keep only the census tracts that intersect Council District 1
cd1_tracts = gpd.sjoin(gdf, gdf_cd[gdf_cd['name'] == '1'])
```
- Create a function and a loop to minimize your code
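
The function-plus-loop idea might look something like this. Everything here is made up so the sketch runs on its own: `tracts` and `districts` stand in for `gdf` and `gdf_cd`, the `Total Population` column is a placeholder indicator, and the `name` column is assumed to identify each district as in the example above.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import box

# Made-up stand-ins for the census tracts and the council districts.
tracts = gpd.GeoDataFrame(
    {"tract_id": ["a", "b", "c", "d"],
     "Total Population": [100, 200, 300, 400]},
    geometry=[box(x, 0, x + 1, 1) for x in range(4)],
)
districts = gpd.GeoDataFrame(
    {"name": ["1", "2"]},
    geometry=[box(0, 0, 1.5, 1), box(2.5, 0, 4, 1)],
)
# With real data, put both layers in the same CRS first,
# e.g. districts = districts.to_crs(tracts.crs)


def tracts_in_district(tracts, districts, district_name):
    """Return the tracts that intersect one council district."""
    one_district = districts[districts["name"] == district_name]
    # sjoin defaults to an inner join on intersecting geometries
    return gpd.sjoin(tracts, one_district)


# Loop over every district and collect one profile row per district
profile_rows = []
for district_name in districts["name"]:
    selected = tracts_in_district(tracts, districts, district_name)
    profile_rows.append({
        "district": district_name,
        "n_tracts": len(selected),
        "Total Population": selected["Total Population"].sum(),
    })

profile = pd.DataFrame(profile_rows).set_index("district")
print(profile)
```

Inside the loop you could also call `selected.plot(...)` to draw one choropleth per district. Note that a tract straddling a district boundary intersects both districts, so it is counted in each one.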

```python
# Council Districts
gdf_cd = gpd.read_file('data/Council Districts.geojson')
```