
Scrape & Parse Placer County Files #17

Closed
jamesshannon opened this issue Nov 2, 2020 · 8 comments
@jamesshannon

Initial investigation for Placer:

GIS data overview page

Parcel information is in AddressPoints.csv. You'd think you'd want Parcels.csv but AddressPoints has a lat/lng centroid point while Parcels has some fields (like Shape__Area) which don't appear directly useful.

Tax info can be found at the URL https://common3.mptsweb.com/MBC/placer/tax/main/__APN__/2020/0000, where __APN__ is the APN with the dashes removed. E.g., https://common3.mptsweb.com/MBC/placer/tax/main/466120044000/2020/0000. The tax amount is found in the "Totals - 1st and 2nd Installments" section.
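As a quick sketch, building that URL from a dashed APN could look like this (the trailing 2020/0000 year/roll segments are taken verbatim from the URLs above and assumed fixed for this scrape):

```python
# Sketch: construct a Placer tax page URL from a dashed APN.
# Assumption: the "/2020/0000" (year/roll) suffix is constant for this scrape.
def placer_tax_url(apn: str, year: int = 2020) -> str:
    apn_clean = apn.replace('-', '')  # "466-120-044-000" -> "466120044000"
    return (f'https://common3.mptsweb.com/MBC/placer/tax/main/'
            f'{apn_clean}/{year}/0000')

print(placer_tax_url('466-120-044-000'))
# -> https://common3.mptsweb.com/MBC/placer/tax/main/466120044000/2020/0000
```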

Originally posted by @jamesshannon in #1 (comment)

@jamesshannon
Author

@typpo wrote:

@jamesshannon The Placer system looks identical to Yolo County's, which means much of the code can be reused! https://github.com/typpo/ca-property-tax/tree/master/scrapers/yolo

@jamesshannon
Author

@typpo Thanks. That's helpful.

Where'd the file Yolo_County_Tax_Parcels_Open_Data.csv come from? I see in the README that there's a way to convert the gdb to geojson (which is used in the parser), but what about the CSV used in the scraper?

I ask because I've investigated the Placer AddressPoints and Parcels files a bit more. AddressPoints has APN and centroid, but I've found that:

  • There are about 80k rows with missing APNs, even though those same addresses do have APNs in the Parcels file
  • There are lots of rows with duplicate APNs. This seems to be due to multiple street addresses for the same parcel (e.g., addresses from cross streets)

So it seems better to use the Parcels file, but the CSV version doesn't have any useful-looking geodata. Both Parcels and AddressPoints have an object_id, but the IDs don't seem to match across the two files. So I've started looking at ways to get geodata from the non-CSV versions of the Parcels file. It appears I could download the shapefile, read the shapes with a Python package, and then use shapely.geometry to find the centroid?

@typpo
Owner

typpo commented Nov 2, 2020

I should have clarified - I think the input CSV for Yolo is different from Placer. It's just the tax system that appears to be the same, meaning I think we should be able to copy parts of the web scrape and parse steps (but not the same input file format).

I think that the Parcels file is the way to go. Although the spreadsheet doesn't have latlng info, if you download it as a shapefile and then convert it using ogr2ogr, it will include latlng info.

After downloading and unzipping the shapefiles, this command:

ogr2ogr -f GeoJSON placer.geojson Parcels.shp

Yields placer.geojson. Here's an example record from the file:

{ "type": "Feature",
  "properties": { "OBJECTID": 5, "APN": "471-340-027-000", "TAX_DESC": "NORMAL OWNERSHIP", "USE_CD_N": "APARTMENTS, 4 UNITS OR MORE", "STR_SQFT": 1076, "ADR1": "5043 MILLSTONE WAY", "ADR2": "GRANITE BAY CA 95746", "CITY": "GRANITE BAY", "STATE": "CA", "ZIP": "95746", "STREETNUM": "720", "STREETNAME": "SUNRISE", "STREETTYPE": "AV", "LANDVALUE": 9695, "STRUCTURE": 123898, "Shape__Are": 1051.560546875, "Shape__Len": 155.441730291059 },
  "geometry": { "type": "Polygon", "coordinates": [ [ [ -121.272353678201995, 38.735262075994498 ], [ -121.272330313062994, 38.735261915944498 ], [ -121.272330259867999, 38.735267119240397 ], [ -121.272258341183004, 38.7352666546535 ], [ -121.272258460228002, 38.735253323802198 ], [ -121.272278634988993, 38.735253450342597 ], [ -121.272278881337996, 38.735232842775901 ], [ -121.272271623899996, 38.735232797256103 ], [ -121.272271802228005, 38.735194767554198 ], [ -121.272395469203005, 38.735195130536603 ], [ -121.272402235800001, 38.735195145943401 ], [ -121.272402182394998, 38.7352328890397 ], [ -121.272425318672006, 38.735233157550503 ], [ -121.27242491394, 38.735267575904402 ], [ -121.272353624586003, 38.735267320729399 ], [ -121.272353678201995, 38.735262075994498 ] ] ] } }

The list of latlngs defines the boundary polygon for the property, and we take its centroid. Many of the scrapers/parsers load an ogr-generated geojson file. Here's an example of loading the geojson file and here's an example of finding the centroid.
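The repo's own helpers aren't reproduced here, but as a self-contained sketch (using the shoelace centroid formula rather than shapely, so it needs no extra dependencies), iterating an ogr2ogr-generated geojson and emitting an APN plus centroid per parcel could look like:

```python
import json

def polygon_centroid(ring):
    """Area-weighted centroid of a simple polygon ring [(lng, lat), ...].

    Works whether or not the ring repeats its first point at the end, as
    GeoJSON rings do (the duplicate edge contributes zero to every sum).
    """
    a = cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:] + ring[:1]):
        cross = x0 * y1 - x1 * y0  # shoelace term
        a += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    a *= 0.5
    return cx / (6 * a), cy / (6 * a)

def apn_centroids(path):
    """Yield (apn, lat, lng) for each feature in an ogr2ogr geojson file."""
    with open(path) as f:
        fc = json.load(f)
    for feat in fc['features']:
        # Assumption: every feature is a simple Polygon; only the outer
        # ring (index 0) is used. GeoJSON coordinates are [lng, lat].
        ring = [tuple(pt) for pt in feat['geometry']['coordinates'][0]]
        lng, lat = polygon_centroid(ring)
        yield feat['properties']['APN'], lat, lng
```

Note that averaging the raw vertices would bias the centroid toward densely sampled edges; the shoelace formula gives the true area-weighted center.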

If you'd like to take this on, I'm happy to answer any other questions and support you! I've uploaded the converted Placer Parcels geojson here so you don't have to go through the trouble of installing ogr yourself: https://drive.google.com/file/d/1t7DpysdWdtJAry1lE4gesjzkuZs4t9n0/view?usp=sharing

@jamesshannon
Author

jamesshannon commented Nov 4, 2020

Placer CSV file: xxxxx

I'm ready to upload the Placer script, but I'm not sure how to isolate it from the sharedlib changes which I've merged into my development branch. It'll probably work itself out after the sharedlib branch is merged.

@jamesshannon
Author

Hold off on that file... I'm validating it and seeing some issues.

@jamesshannon
Author

Ok. File is correct now: https://drive.google.com/file/d/1QU5k5Il6GbzVT4r1NaGPPldgkqJDU495/view?usp=sharing

I created a quick script to validate the files. It does two things to check for programming errors and GIGO errors:

  • Checks that CSV has appropriate header columns
  • Checks that each lat/lng falls within a CA-wide bounding box. Turns out I had mixed up x and y when converting the centroid to lat/lng.
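A minimal sketch of those two checks (the column names and the exact California bounds here are my assumptions, not the actual script):

```python
import csv

# Rough California bounding box (approximate; any point outside it,
# including points produced by a lat/lng swap, gets flagged).
def in_ca_bbox(lat, lng):
    return 32.5 <= lat <= 42.1 and -124.5 <= lng <= -114.1

# Assumption: these are the scraper's output columns, in this order.
EXPECTED_HEADER = ['address', 'apn', 'latitude', 'longitude', 'tax', 'county']

def validate(path):
    """Return a list of error strings for a scraped CSV."""
    errors = []
    with open(path, newline='') as f:
        reader = csv.reader(f)
        header = next(reader)
        if header != EXPECTED_HEADER:
            errors.append(f'unexpected header: {header}')
        for lineno, row in enumerate(reader, start=2):
            lat, lng = float(row[2]), float(row[3])
            if not in_ca_bbox(lat, lng):
                errors.append(f'line {lineno}: ({lat}, {lng}) outside CA '
                              '(possible lat/lng swap)')
    return errors
```

A flipped pair like (-121.27, 38.73) lands far outside the box, so the swap bug is caught immediately.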

@typpo
Owner

typpo commented Nov 8, 2020

Added! Sorry for the delay, the past week has been...distracting.

The validation script sounds very useful. I often mess things up the first time by flipping lng/lat.

@typpo typpo closed this as completed Nov 8, 2020
@typpo
Owner

typpo commented Nov 8, 2020

@jamesshannon How would you like to be credited on the site? Name + link to twitter or personal website?
