# Understanding SB 50 in LA, Part 2
Let's pick up where we left off last time. We should have prepared all the data regarding the transportation-based criteria. Now we are going to move on to the two newer criteria for SB50: Job-Rich areas (an addition) and Sensitive Communities (a subtraction). After the first exercise, you should have the following files in your `data/processed/` directory:
```
data/                             
├── processed/  
│   ├── rail_stop_buff_wgs84.geojson   <- 1/2 mi buff around rail stations
│   └── hqbus_stop_buff_wgs84.geojson  <- 1/4 mi buff around HQ bus stops            
└── raw/         
```

In [None]:
# Import libraries
import pandas as pd
import geopandas as gpd
from ipyleaflet import Map, GeoData, basemaps, LayersControl

### Adding the Job-Rich Areas

A second part of the bill allowed for additional development in "job rich" areas. What the legislation meant by "job rich" was not exactly clear; however, insiders understood that Wiener's staff were considering a specific map within the _Mapping Opportunities in California_ project: the view defined as "high-opportunity + jobs-rich, long in-commutes, and/or jobs-housing mismatch." You can find an interactive version of the map [here](http://mappingopportunityca.org/).

However, if you take a look at the interactive map, there is not a readily-available link to download the underlying data. Might there be a way to find out where the data driving the map lives? Go ahead and examine the underlying code by right-clicking the map and selecting "Inspect". Dig through the HTML and look for the scripts labeled "js/getData.js" and "js/map.js", which _sound_ like they may have some clues to our data _(Hint: keep an eye out for 2 files: finalData.csv and a companion json file that will contain the spatial data)_.

Once you find where the data files are located, go ahead and add them to your project within the `data/raw` directory so we can use them. Before loading the spatial file, take a peek at its contents. You might notice that the format doesn't look quite the same as the GeoJSON we've been using (there was also a hint in the .js code). These data are spatial, but they are stored as TopoJSON, which you can read more about [here](https://bost.ocks.org/mike/topology/).
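One quick way to tell the two formats apart is to look at the top-level `type` key: TopoJSON documents declare `"Topology"` and store shared `arcs`, while the GeoJSON we used in Part 1 declares `"FeatureCollection"`. The sketch below is only an illustration; the helper name `describe_json_geodata` and the two tiny sample documents are made up for this example and are not the real Mapping Opportunities files.

```python
import json


def describe_json_geodata(text):
    """Summarize a GeoJSON or TopoJSON document passed in as a string."""
    doc = json.loads(text)
    kind = doc.get("type")
    if kind == "Topology":
        # TopoJSON: geometries live in named objects and share a pool of arcs
        return {
            "format": "TopoJSON",
            "objects": sorted(doc.get("objects", {})),
            "n_arcs": len(doc.get("arcs", [])),
        }
    if kind == "FeatureCollection":
        # GeoJSON: every feature carries its own full coordinate list
        return {"format": "GeoJSON", "n_features": len(doc.get("features", []))}
    return {"format": "unknown", "type": kind}


# Hand-written samples (hypothetical, for illustration only)
topo_sample = json.dumps({
    "type": "Topology",
    "objects": {"tracts": {"type": "GeometryCollection", "geometries": []}},
    "arcs": [[[0, 0], [1, 0]], [[1, 0], [1, 1]]],
})
geo_sample = json.dumps({"type": "FeatureCollection", "features": []})

print(describe_json_geodata(topo_sample))
print(describe_json_geodata(geo_sample))
```

To try it on the downloaded file, read the file's text with `open(...).read()` and pass it to the helper.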

Depending on your version of `fiona`, you might already have the driver installed and be able to read it directly into a GeoPandas dataframe using the following command: `jobs_shapes = gpd.read_file('data/raw/nodata.json', driver='TopoJSON')`. If not, make sure to update `fiona` to the latest version (1.8.5 or greater).

In [None]:
# Load data & set CRS (if not already set) to 4326
jobs_shapes = gpd.read_file('data/raw/nodata.json', driver='TopoJSON')
jobs_shapes.crs = 'EPSG:4326'  # the {'init': 'epsg:4326'} dict form is deprecated in recent pyproj/GeoPandas

Now that we've loaded our spatial layer, let's take a look at what we have. Using the example from Part 1 of this exercise, go ahead and display the contents of `jobs_shapes` in an `ipyleaflet` map. You should see a map that somewhat resembles this:

![TopoMap](img/topo_mapping_opportunities.png)

In [None]:
# TODO: Create basemap, zoomed out a bit to CA
m = 

# TODO: Create the GeoData Object and add to the Map
jobs_gd = 

# Add the GeoData Object as Map Layer
m.add_layer(jobs_gd)

# Optional: Add layer control
m.add_control(LayersControl())

# Display the map
m

Now that we have the geographic boundaries, we need to perform additional manipulations:

1. **Filter for Los Angeles County**: Currently the data is for all of California. For our project, we only need LA County.
2. **Join with data**: We are currently mapping only the geography, not the data. We will need to join to `finalData.csv`.
3. **Reconstruct the appropriate view**: We are interested in the specific view called "high-opportunity + jobs-rich, long in-commutes, and/or jobs-housing mismatch" that was identified in the _Mapping Opportunities in California_ map. We will need to figure out how to reconstruct that subset of the data (there are other views in the map as well, all of which are derived from the same `finalData.csv`).

##### Step 1: Filter for LA County
Let's start by keeping only those areas in LA County. To apply this first filter, we are going to need a geographic boundary of LA County, which you can find [here](http://geohub.lacity.org/datasets/10f1e37c065347e693cf4e8ee753c09b_15). Write a command to query the API and save the result to our `data/raw` folder.

In [None]:
# TODO: Import requests & json packages

# TODO: Call the API to get the GeoJSON data, and save to 'data/raw'
url = 
resp = requests.get(url)

# Only move forward if there is a successful status code
if resp.status_code == requests.codes.ok:

    # Write out JSON to data/ or data/raw 
    with open('data/raw/lacounty_wgs84.geojson', 'w') as outfile:
        json.dump(resp.json(), outfile)

Now that we've saved our data, load it back into our current workspace. 
  
The [GeoPandas Documentation](http://geopandas.org/mergingdata.html) discusses two types of joins: attribute joins and spatial joins. Once we have our LA County boundary, we want to perform a GeoPandas _spatial join_ operation on the data, keeping all the geographies from `jobs_shapes` that are within the LA County boundary. Review the GeoPandas documentation on spatial joins [here](http://geopandas.org/mergingdata.html#spatial-joins). Confirm that the filter worked correctly by printing out the number of rows in the dataframe before and after the join.
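As a rough illustration of what a spatial join does, here is a minimal sketch on toy rectangles rather than the real tract and county files. The names `tracts` and `county`, the coordinates, and the choice of the `within` predicate are all assumptions for this example; note that recent GeoPandas versions take `predicate=`, while older ones used `op=`.

```python
import geopandas as gpd
from shapely.geometry import box

# Three toy "tracts": two inside the toy county, one far outside it
tracts = gpd.GeoDataFrame(
    {"fips": ["06037000001", "06037000002", "06075000001"]},
    geometry=[box(0, 0, 1, 1), box(1, 1, 2, 2), box(10, 10, 11, 11)],
    crs="EPSG:4326",
)

# A single toy "county" polygon that covers the first two tracts
county = gpd.GeoDataFrame(
    {"name": ["Toy County"]},
    geometry=[box(-1, -1, 3, 3)],
    crs="EPSG:4326",
)

# Inner spatial join: keep only tracts that fall within the county polygon
joined = gpd.sjoin(tracts, county, how="inner", predicate="within")

# The join appends the county's columns (plus index_right); keep only our own
filtered = joined[["fips", "geometry"]]

print(f"Rows before join: {len(tracts)}")
print(f"Rows after join:  {len(filtered)}")
```

With real boundaries from two different sources, edges rarely line up perfectly, so it can be worth comparing `within` against `intersects` and checking which tracts along the county line are kept or dropped.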

In [None]:
# TODO: Load in the LA County Boundary data as GDF
lacounty_gdf = 

# TODO: Apply Spatial join to data
la_geo = 

# Print the lengths of the GDF before & after the join, confirming that rows have been dropped
print(f'There are {len(jobs_shapes)} rows in the pre-join GDF.')
print(f'There are {len(la_geo)} rows in the post-join GDF.')

Once you've confirmed that the post-join dataframe is smaller than the previous one, remove all the columns that were added from the LA County boundary file during the join. Once that is done, go ahead and save it to disk at `data/processed/la_geo_wgs84.geojson`.

In [None]:
# TODO: Only keep columns: 'id', 'fips', 'geometry'
la_geo = 

# TODO: Write filtered geometry to disk


##### Step 2: Join the Geography to Data file
Let's join our geography file to our data to create one GeoDataFrame containing both. In this case, we will be performing an _attribute join_ (we previously did a _spatial join_ on the GDF), based on a unique identifier for each geometry object.

Let's begin by loading our `finalData.csv` into a Pandas DataFrame and inspecting the head. Let's also inspect the head of our `la_geo` GeoDataFrame and look for an ID value that we could use to join the two.

In [None]:
# TODO: Load finalData.csv from data/raw 
jobs_data = 

# Inspect the head of the DataFrame
jobs_data.head()

In [None]:
# TODO: Inspect the head of our la_geo GDF


You should be able to see a field that we can use for joining. However, the especially astute will notice a slight difference between the two fields: one has a leading `0` while the other does not. We can fix this with Python's [zfill](https://python-reference.readthedocs.io/en/latest/docs/str/zfill.html) string method (in this case, let's keep both columns as strings, though we could also have converted both to numeric types). Go ahead and pad the column that is missing the leading `0`, preserving the column name. Then join both dataframes to create one unified GeoDataFrame.
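Here is a small sketch of the padding-then-joining idea on plain DataFrames with made-up values. The 11-character width assumes tract-level FIPS codes, and the `jobs_flag` column is hypothetical; check the real files before relying on either.

```python
import pandas as pd

# Toy stand-ins for the geography and data tables (values are invented)
geo = pd.DataFrame({"fips": ["06037101110", "06037101122"]})
data = pd.DataFrame({"fips": [6037101110, 6037101122], "jobs_flag": [1, 0]})

# Cast to string first, then left-pad with zeros to a fixed width
data["fips"] = data["fips"].astype(str).str.zfill(11)

# Attribute join on the shared key; how="left" keeps every geography row
merged = geo.merge(data, on="fips", how="left")
print(merged)
```

The same `merge` call works when the left side is a GeoDataFrame, in which case the result keeps its geometry column.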

In [None]:
# TODO: Zfill correct column (convert to str type if needed) & replace with original
jobs_data.fips = 

# TODO: Join data to GeoDataFrame
la_jobs_gdf = 

# Print the length of (1) la_geo, (2) jobs_data, and (3) la_jobs_gdf to confirm no LA rows were dropped
print(f'Length of Shapes: {len(la_geo)}')
print(f'Length of Data: {len(jobs_data)}')
print(f'Length of Merged DF: {len(la_jobs_gdf)}')

# TODO: Inspect the head of the new merged GDF


##### Step 3: Reconstruct the Scenario
As you can see from the [web map](https://mappingopportunityca.org/), the _Mapping Opportunities in California_ map has several different scenarios:
* High-Opportunity
* High-Opportunity + Jobs-Rich
* High-Opportunity + Jobs-Housing Mismatch
* High-Opportunity + Long In-Commutes
* High-Opportunity + Jobs-Rich, Long In-Commutes, and/or Jobs-Housing Mismatch

We are interested only in the last scenario, which we believe would have been the basis for the "Jobs Rich" definition for SB50. Since our new merged GeoDataFrame contains all the data needed to construct any of those scenarios, we want to apply a *filter* that keeps only the areas matching this scenario's conditions. Take a look at all the flag fields at the end of `la_jobs_gdf`, experiment with turning them on and off, and test the output on the map until you match the [web map](https://mappingopportunityca.org/).
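Read literally, the scenario name combines one required condition (high-opportunity) with at least one of three others (jobs-rich, long in-commutes, jobs-housing mismatch). A sketch of that boolean logic on a toy DataFrame follows; the column names `high_opp`, `jobs_rich`, `long_commute`, and `mismatch` are placeholders, not the actual field names in `finalData.csv`, and this reading of the scenario still needs to be checked against the web map.

```python
import pandas as pd

# Toy flags: 1 means the area meets that criterion (column names are hypothetical)
df = pd.DataFrame({
    "fips": ["a", "b", "c", "d"],
    "high_opp": [1, 1, 0, 1],
    "jobs_rich": [1, 0, 1, 0],
    "long_commute": [0, 0, 1, 0],
    "mismatch": [0, 1, 0, 0],
})

# "and/or" across the three jobs criteria: at least one must be set
any_jobs_criterion = (
    (df["jobs_rich"] == 1) | (df["long_commute"] == 1) | (df["mismatch"] == 1)
)

# High-opportunity is required in addition to one of the jobs criteria
mask = (df["high_opp"] == 1) & any_jobs_criterion
selected = df[mask]

print(selected["fips"].tolist())
```

Wrapping each comparison in parentheses matters here, because `&` and `|` bind more tightly than `==` in Python.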

In [None]:
# TODO: Apply filters to data
la_jobs_filtered_gdf = 

# TODO: Create a new map object and add your filtered GDF (or reuse the one before)
#       and keep testing until you get the right set of filters on your data.
final_gd = 

# Add the GeoData Object as Map Layer
m.add_layer(final_gd)

m

Great. Now we've filtered to areas in Los Angeles County defined as "Jobs Rich". Now that we've applied all our filters and confirmed that everything looks good, go ahead and save our final data as a geojson file `lacounty_opp_wgs84.geojson` within the `data/processed` folder.

In [None]:
# TODO: Save to data/processed


## Exempt Areas
### Sensitive Communities
During the first iteration of the proposal (SB 827), California Senator Scott Wiener received quite a bit of blowback from those who worried that it would lead to rapid gentrification and the displacement of renters who could no longer afford their rents. To address these concerns, SB 50 included a provision exempting certain areas that might be adversely affected by the bill. The bill defined Sensitive Communities as:
* 'High Segregation & Poverty' or 'Low Resource' in the [TCAC Opportunity Maps](https://www.treasurer.ca.gov/ctcac/opportunity.asp)
* Areas with [CalEnviroScreen](https://oehha.ca.gov/calenviroscreen/report/calenviroscreen-30) scores in the top 25 percent statewide

##### Step 1: Load & Filter TCAC Data
The TCAC data is provided as an Excel file that needs to be joined to census tract boundaries. Fortunately, we've already done the work of getting the census boundary file for LA County, saved as `data/processed/la_geo_wgs84.geojson`. Download the _2019 Statewide Summary Table_ from the TCAC webpage and save the "LosAngeles" sheet as a CSV file to `data/raw/la_tcac.csv`. Then, load it back into our notebook and join it to our data.

In [None]:
# TODO: Load the LA TCAC data (after downloading and converting to CSV)
la_tcac = 

# TODO: Perform attribute join of TCAC data to Census Tract boundaries
la_tcac_gdf = 

# Print rowcounts before & after merge
print(f'There are {len(la_geo)} rows in the pre-join LA County Census Tract File.')
print(f'There are {len(la_tcac_gdf)} rows in the post-join LA TCAC GDF.')

# TODO: Check the head of the merged GDF for final confirmation
la_tcac_gdf.head()

Sensitive communities are defined as those designated either 'High Segregation & Poverty' or 'Low Resource'. Let's filter for those two labels in the 'Final Category' column of our GeoDataFrame and then save the result to disk as `data/processed/la_tcac_filtered_wgs84.geojson`.
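Filtering a column against a short list of labels is a natural fit for `Series.isin`. The sketch below uses a toy table; the label strings are copied from the definition of Sensitive Communities above and may not match the spreadsheet character for character, so printing the unique values first is a cheap safeguard.

```python
import pandas as pd

# Toy TCAC-style table (rows are invented; only the column name comes from the text)
tcac = pd.DataFrame({
    "fips": ["1", "2", "3", "4"],
    "Final Category": [
        "High Resource",
        "Low Resource",
        "High Segregation & Poverty ",  # trailing space, as spreadsheets often have
        "Moderate Resource",
    ],
})

# Inspect the labels that are actually present before filtering
print(tcac["Final Category"].unique())

# Strip stray whitespace so the comparison is not thrown off by spreadsheet quirks
categories = tcac["Final Category"].str.strip()

sensitive_labels = ["High Segregation & Poverty", "Low Resource"]
sensitive = tcac[categories.isin(sensitive_labels)]

print(f"{len(tcac)} rows before filter, {len(sensitive)} rows after")
```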

In [None]:
# TODO: Filter GDF
la_tcac_filtered_gdf = 

# TODO: Print rowcounts before/after filter for confirmation

# TODO: Save to disk


##### Step 2: Load CalEnviroScreen Data
All of the data used in the calculation of the CalEnviroScreen scores can be found [here](https://oehha.ca.gov/calenviroscreen/report/calenviroscreen-30). There are a few different formats provided: Shapefile, ArcGIS GeoDatabase, Google Earth KML file, and Spreadsheet. Since we already have census tracts, we can join the values from the spreadsheet to our geojson census tracts. Download the `ces3results.xlsx` file, save the first sheet ('CES 3.0 2018 Update') as `data/raw/ces3results_2018update.csv`, then load it into our notebook and join with our LA Census Tract GeoJSON file.

In [None]:
# TODO: Load data into notebook
ces3results = 

# TODO: Join to la tracts
ces3_gdf = 

# TODO: Confirm the join by printing the rowcount and examining the head of the merged GDF


Now that we have our GeoDataFrame with CalEnviroScreen data, let's filter for tracts whose CES 3.0 scores fall in the top 25 percent statewide, which is the criterion for excluding areas from the impacts of SB50. This is especially easy because there is already a Yes/No field in the 'SB 535 Disadvantaged Community' column. Let's filter for those areas that fall within this category.
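A quick sanity check is to filter both ways, once on the Yes/No flag and once on a percentile cut-off, and see whether the two selections agree. In the sketch below the rows are invented and the `CES 3.0 Percentile` column name is an assumption about the spreadsheet; on the real data the flag and the cut-off may not coincide exactly, which is worth knowing before choosing one.

```python
import pandas as pd

# Toy CES-style table (values invented; percentile column name is assumed)
ces = pd.DataFrame({
    "tract": ["t1", "t2", "t3", "t4"],
    "CES 3.0 Percentile": [92.5, 75.0, 60.1, 10.0],
    "SB 535 Disadvantaged Community": ["Yes", "Yes", "No", "No"],
})

# Option 1: use the ready-made Yes/No flag
by_flag = ces[ces["SB 535 Disadvantaged Community"] == "Yes"]

# Option 2: top 25 percent means at or above the 75th percentile
by_percentile = ces[ces["CES 3.0 Percentile"] >= 75]

# Compare the two selections tract by tract
same = set(by_flag["tract"]) == set(by_percentile["tract"])
print(f"Flag rows: {len(by_flag)}, percentile rows: {len(by_percentile)}, identical: {same}")
```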

In [None]:
# TODO: Filter for disadvantaged communities
ces3_filtered_gdf = 

# Print the rowcount before and after to confirm the filter
print(f'There are {len(ces3_gdf)} rows in the pre-filtered ces3 data.')
print(f'There are {len(ces3_filtered_gdf)} rows in the post-filtered ces3 data.')

Finally, let's write out our filtered dataset to `data/processed/la_ces3_filtered_wgs84.geojson`.

In [None]:
# TODO: Save to disk


### Very High Fire Hazard Severity Zones
SB50 also excludes those areas deemed by CALFIRE as being within a Very High Fire Hazard Severity Zone. CALFIRE includes maps and GIS information regarding these zones on [their website](https://osfm.fire.ca.gov/divisions/wildfire-prevention-planning-engineering/wildland-hazards-building-codes/fire-hazard-severity-zones-maps/). Scroll down to find the data specific to LA County, and download the GIS (Shapefile) files for both the State Responsibility Area and Local Responsibility Area. Save them both to your `data/raw` folder and then read them back in as GDF objects. _Hint: Make sure you select the right driver!_ 

In [None]:
# TODO: Read in both shapefiles
local_firehazard_gdf = 
state_firehazard_gdf = 

# TODO: Examine the head of one of the GDFs


Check the CRS of each of the files and re-project if needed. 
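Reprojection in GeoPandas goes through `to_crs`, which returns a new GeoDataFrame rather than modifying the original. The sketch below uses a single toy point in California Albers (EPSG:3310) purely as an example of a projected CRS; the CRS of the actual CALFIRE shapefiles is whatever the `print` calls report, not necessarily this one.

```python
import geopandas as gpd
from shapely.geometry import Point

# Toy layer in a projected CRS with coordinates in meters (assumed for illustration)
gdf = gpd.GeoDataFrame(
    {"name": ["sample"]},
    geometry=[Point(0, 0)],
    crs="EPSG:3310",
)
print(gdf.crs)

# to_crs returns a reprojected copy; reassign it to keep the result
gdf_wgs84 = gdf.to_crs(epsg=4326)
print(gdf_wgs84.crs)

# Coordinates are now longitude/latitude in degrees
lon = gdf_wgs84.geometry.iloc[0].x
lat = gdf_wgs84.geometry.iloc[0].y
print(lon, lat)
```

If a layer's `crs` prints as `None`, `to_crs` has nothing to convert from; in that case the CRS has to be declared first (for example from the accompanying `.prj` file or metadata) before reprojecting.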

In [None]:
# Check CRS and make necessary conversions
print(local_firehazard_gdf.crs)
print(state_firehazard_gdf.crs)

# TODO: Reproject to 4326
local_firehazard_gdf = 
state_firehazard_gdf = 

We want to filter both for "Very High Fire Hazard". Start by printing the unique values for "HAZ_CLASS", and then filter by the appropriate one. Also, since for our purposes we do not care about responsibility, let's union both of the geometries into one GeoDataFrame and write it out to `data/processed/highfirehazard_wgs84.geojson`.
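Combining the two layers amounts to stacking their rows, which `pd.concat` can do as long as both share a CRS and compatible columns. The sketch below uses toy rectangles; the `'Very High'` value is a placeholder for whatever `HAZ_CLASS.unique()` shows in the real files.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import box

# Toy state layer with several hazard classes (class labels are assumed)
state = gpd.GeoDataFrame(
    {"HAZ_CLASS": ["Very High", "High", "Moderate"]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1), box(2, 0, 3, 1)],
    crs="EPSG:4326",
)

# Toy local layer that already contains only the class we want
local = gpd.GeoDataFrame(
    {"HAZ_CLASS": ["Very High"]},
    geometry=[box(5, 5, 6, 6)],
    crs="EPSG:4326",
)

# Look at the available classes, then keep only the one of interest
print(state["HAZ_CLASS"].unique())
state_vh = state[state["HAZ_CLASS"] == "Very High"]

# Stack the rows; ignore_index gives the result a clean 0..n-1 index
combined = gpd.GeoDataFrame(
    pd.concat([state_vh, local], ignore_index=True),
    crs=state_vh.crs,
)

print(f"{len(state_vh)} + {len(local)} = {len(combined)} records")
```

This keeps each polygon as its own row. Dissolving them into a single geometry is a different operation and is not needed just to write the layer out.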

In [None]:
# Print unique values for HAZ_CLASS for each
print(state_firehazard_gdf.HAZ_CLASS.unique())
print(local_firehazard_gdf.HAZ_CLASS.unique())

# TODO: Filter for the appropriate HAZ Class
state_highfirehazard_gdf = 

# Print Record Counts
print(f'There were {len(state_firehazard_gdf)} records in the pre-filtered state firehazard gdf.')
print(f'There are {len(state_highfirehazard_gdf)} records in the post-filtered state firehazard gdf.')
print(f'There are {len(local_firehazard_gdf)} records in the local firehazard gdf.')

# TODO: Concatenate both GDFs
gdf_list = 
lacounty_highfirehazard_gdf = 

# Confirm that record count total = record count gdf1 + record count gdf2
print(f'The record count of the concat of both GDFs is {len(lacounty_highfirehazard_gdf)}.')

# TODO: Write out to data/processed


Let's go ahead and make a map to view the geometries we just created using the `ipyleaflet` library as we did earlier with the jobs-rich areas.

In [None]:
# TODO: Create a new map object and add your filtered GDF (or reuse the one before)
#       and keep testing until you get the right set of filters on your data.
m3 = 

hazard_gd = 

# Add the GeoData Object as Map Layer
m3.add_layer(hazard_gd)

m3

### To Be Continued...
We've now processed the transportation and non-transportation criteria for consideration of SB50. In the next part, we will begin putting these pieces together. Stay tuned.