Expand the map - add tax data for more counties #1

Open
typpo opened this issue Oct 14, 2020 · 20 comments
Labels
help wanted Extra attention is needed

Comments

@typpo
Owner

typpo commented Oct 14, 2020

See How to add a county in the README.

Up-to-date spreadsheet with status of each county: https://docs.google.com/spreadsheets/d/1Po5WNrADfJhO87xdHXWqRZDPydDAOH7vbppzsICLVXg/edit#gid=0

@typpo typpo added the help wanted Extra attention is needed label Oct 14, 2020
@edre

edre commented Oct 14, 2020

I found some data for San Diego county.

Parcel data at: https://hub.arcgis.com/datasets/SANDAG::parcels-4
The zip file contains a .dbf file (convert it to CSV with dbfdump). Locations are given as x and y coordinates, which need a conversion to lat/lon that isn't documented.
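If dbfdump isn't handy, here's a minimal pure-Python sketch of the same .dbf to CSV step. It assumes a plain dBase III-style file with fixed-width character/numeric fields and latin-1 text; it doesn't handle memo fields or other dBase variants, so treat it as a starting point rather than a full reader.

```python
import csv
import io
import struct


def dbf_to_csv(dbf_bytes: bytes) -> str:
    """Convert a dBase III-style .dbf file (read into memory) to CSV text.

    Records flagged as deleted ('*') are skipped.
    """
    # Bytes 4-11 of the header: record count, header length, record length.
    num_records, header_len, record_len = struct.unpack("<IHH", dbf_bytes[4:12])

    # Field descriptors are 32 bytes each, starting at offset 32 and
    # terminated by a 0x0D byte.
    fields = []
    offset = 32
    while dbf_bytes[offset] != 0x0D:
        desc = dbf_bytes[offset:offset + 32]
        name = desc[:11].split(b"\x00", 1)[0].decode("ascii")
        length = desc[16]  # field width in bytes
        fields.append((name, length))
        offset += 32

    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow([name for name, _ in fields])

    for i in range(num_records):
        start = header_len + i * record_len
        record = dbf_bytes[start:start + record_len]
        if record[:1] == b"*":  # deletion flag
            continue
        pos = 1  # skip the deletion-flag byte
        row = []
        for _, length in fields:
            # Assumption: text is latin-1; values are space-padded.
            row.append(record[pos:pos + length].decode("latin-1").strip())
            pos += length
        writer.writerow(row)
    return out.getvalue()
```

Usage would be something like `csv_text = dbf_to_csv(open("parcels.dbf", "rb").read())`, where the file name is just a placeholder for whatever is in the SANDAG zip.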

Tax data query: https://www.sdttc.com/content/ttc/en/tax-collection/prior-year-tax-records.html?fiscal_year=2019-07-01%7C2020-06-30&q={APN}&x=0&y=0
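A minimal sketch of filling in the {APN} placeholder in that query URL with the standard library. The fiscal year default is just the one from the URL above, and the APN in the test is a made-up placeholder.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.sdttc.com/content/ttc/en/tax-collection/prior-year-tax-records.html"


def sd_tax_query_url(apn: str, fiscal_year: str = "2019-07-01|2020-06-30") -> str:
    """Build the prior-year tax record query URL for one APN."""
    params = {"fiscal_year": fiscal_year, "q": apn, "x": 0, "y": 0}
    # urlencode percent-encodes "|" as %7C, matching the URL above.
    return f"{BASE_URL}?{urlencode(params)}"
```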

@typpo
Owner Author

typpo commented Oct 14, 2020

Hey @edre, thanks for the pointer. I believe the coordinate system is most likely California State Plane Zone 6 (reference) but I haven't tested it on the San Diego data.

I should have the coordinate conversion done soon because Contra Costa and several other counties also need it. Will let you know once I figure that out. (At least the lat/lng conversion can wait until parsing; it's not a blocker for scraping.)
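If the San Diego data really is California State Plane Zone 6, the conversion is an inverse Lambert Conformal Conic projection. Below is a minimal pure-Python sketch, assuming the NAD83 / California zone 6 (US survey feet) parameters, i.e. EPSG:2230; if the SANDAG file uses a different datum or unit, the output will be off. It follows the standard ellipsoidal LCC formulas (forward and inverse) so a round trip can be checked.

```python
import math

# Assumed CRS: NAD83 / California zone 6, US survey feet (EPSG:2230).
A = 6378137.0                               # GRS80 semi-major axis (m)
F_INV = 298.257222101                       # GRS80 inverse flattening
E = math.sqrt(2 / F_INV - 1 / F_INV ** 2)   # first eccentricity
FT_US = 1200 / 3937                         # metres per US survey foot

PHI1 = math.radians(33 + 53 / 60)  # standard parallel 33°53'N
PHI2 = math.radians(32 + 47 / 60)  # standard parallel 32°47'N
PHI0 = math.radians(32 + 10 / 60)  # latitude of origin 32°10'N
LAM0 = math.radians(-116.25)       # central meridian 116°15'W
FE = 6561666.667 * FT_US           # false easting (m)
FN = 1640416.667 * FT_US           # false northing (m)


def _m(phi):
    return math.cos(phi) / math.sqrt(1 - (E * math.sin(phi)) ** 2)


def _t(phi):
    es = E * math.sin(phi)
    return math.tan(math.pi / 4 - phi / 2) / ((1 - es) / (1 + es)) ** (E / 2)


# Cone constant, scale factor and radius at the origin latitude.
N = (math.log(_m(PHI1)) - math.log(_m(PHI2))) / (math.log(_t(PHI1)) - math.log(_t(PHI2)))
BIG_F = _m(PHI1) / (N * _t(PHI1) ** N)
RHO0 = A * BIG_F * _t(PHI0) ** N


def to_state_plane(lat, lon):
    """(lat, lon) in degrees -> (x, y) in US survey feet."""
    rho = A * BIG_F * _t(math.radians(lat)) ** N
    theta = N * (math.radians(lon) - LAM0)
    x = rho * math.sin(theta) + FE
    y = RHO0 - rho * math.cos(theta) + FN
    return x / FT_US, y / FT_US


def to_latlon(x_ft, y_ft):
    """(x, y) in US survey feet -> (lat, lon) in degrees."""
    dx = x_ft * FT_US - FE
    dy = RHO0 - (y_ft * FT_US - FN)
    rho = math.hypot(dx, dy)  # N > 0 for this zone, so rho is positive
    t = (rho / (A * BIG_F)) ** (1 / N)
    lam = math.atan2(dx, dy) / N + LAM0
    # Latitude has no closed form; iterate from the spherical estimate.
    phi = math.pi / 2 - 2 * math.atan(t)
    for _ in range(20):
        es = E * math.sin(phi)
        phi = math.pi / 2 - 2 * math.atan(t * ((1 - es) / (1 + es)) ** (E / 2))
    return math.degrees(phi), math.degrees(lam)
```

If installing pyproj is an option, `Transformer.from_crs("EPSG:2230", "EPSG:4326", always_xy=True)` should do the same job and handles other zones too; the hand-rolled version just avoids the dependency.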

@typpo
Owner Author

typpo commented Oct 17, 2020

An update on new integrations

Added:

  • Contra Costa added
  • Marin added
  • Sonoma added
  • Yolo added (thanks to @highleyman)

In progress:

  • Sacramento - I think @highleyman is running a scrape
  • Solano - if @miloconway's pull request is any indication
  • LA scrape 78% done. This caused me a bit of heartbreak: the scrape completed last week, but it turned out that the way the site handles sessions + concurrency meant some results were returned for the wrong parcel
  • San Bernardino scrape 36% done
  • San Diego - I've written a scraper, but their anti-scraping measures are very restrictive and make the process very slow. I don't think it's practical to scrape the data, so I've contacted the relevant office with the help of an SD local.
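One way to catch the wrong-parcel problem described for LA is to check each response against the APN that was requested before saving it. A minimal sketch, assuming the tax page echoes the APN somewhere in its HTML (the LA site's actual markup isn't shown in this thread):

```python
import re


def response_matches_apn(html: str, apn: str) -> bool:
    """Return True if the page mentions the APN we asked for.

    Compares digits only, so "466-120-044-000" and "466120044000" match.
    """
    wanted = re.sub(r"\D", "", apn)
    # Collect every run of digits/dashes on the page, normalised the same way.
    candidates = {re.sub(r"\D", "", m) for m in re.findall(r"\d[\d-]*\d", html)}
    return wanted in candidates
```

When the check fails, the page can be discarded and the APN re-queued instead of trusting a result that may belong to another parcel.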

@typpo
Owner Author

typpo commented Oct 22, 2020

Update on county integrations

Added:

  • Solano added (thanks @miloconway)
  • Sacramento added (thanks @highleyman) - data is live but waiting on pull request
  • San Bernardino added
  • LA is finally done

In progress:

@typpo
Owner Author

typpo commented Oct 29, 2020

Good news @jakebayless - Napa was added by @miloconway in #11!

Other progress: SLO was added by @kevbuchanan, San Diego by @swingley

In progress: North County San Diego, Orange County (almost done)

@jakebayless

Excellent. Ok. Let's dive into some more rural and ag counties... For anyone interested in coding this:

Butte County was also recently impacted by big fires, so it may be useful and timely to have its tax info exposed.
Butte tax records use the same web app as Sonoma County (reference that for code). The APN and year are conveniently in the URL:

https://common2.mptsweb.com/MBC/butte/tax/main/053022019000/2020/0000
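A minimal sketch of building these mptsweb tax URLs from their parts. The host differs by county (common2 for Butte here, common3 for Placer elsewhere in this thread), so it's a parameter; the trailing "0000" segment isn't explained in the thread and is just copied as-is.

```python
def mptsweb_tax_url(host: str, county: str, apn: str, year: int) -> str:
    """Build an mptsweb tax page URL.

    host is e.g. "common2" (Butte) or "common3" (Placer); apn must be digits only.
    """
    if not apn.isdigit():
        raise ValueError(f"APN should be digits only, got {apn!r}")
    # The final "0000" segment is copied from the example URLs; its meaning is unknown.
    return f"https://{host}.mptsweb.com/MBC/{county}/tax/main/{apn}/{year}/0000"
```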

...and MAN! it took some digging, but here is the AGOL Butte County Parcels layer:
https://services.arcgis.com/3t3QfTXFRFX44zo8/arcgis/rest/services/Butte_County_Parcels/FeatureServer
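For pulling parcels out of a FeatureServer like this one, the ArcGIS REST `query` operation can page through features. A minimal sketch that only builds the page URLs; it assumes the parcels are layer 0 and that the layer supports pagination, and the page size of 1000 is a guess (servers cap it at their own maximum record count).

```python
from urllib.parse import urlencode

# Assumption: the parcels are layer 0 of this FeatureServer.
LAYER_URL = ("https://services.arcgis.com/3t3QfTXFRFX44zo8/arcgis/rest/services/"
             "Butte_County_Parcels/FeatureServer/0")


def query_page_url(offset: int, page_size: int = 1000) -> str:
    """URL for one page of parcel features as GeoJSON."""
    params = {
        "where": "1=1",             # all features
        "outFields": "*",           # all attribute columns
        "outSR": 4326,              # ask the server to return lat/lon
        "f": "geojson",
        "resultOffset": offset,
        "resultRecordCount": page_size,
    }
    return f"{LAYER_URL}/query?{urlencode(params)}"
```

A scraper would fetch `query_page_url(0)`, `query_page_url(1000)`, and so on, stopping once a page comes back with fewer features than the page size.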

@miloconway

Currently looking at Fresno (I'm picking by population). I can take a look at that one next unless someone else would like to.

@jakebayless

Is there a master list of the counties added/needed so I/we can be methodical about researching the relevant endpoints? Maybe a sheet in drive or something we can share edits?

@jamesshannon

Initial investigation for Placer:

GIS data overview page

Parcel information is in AddressPoints.csv. You'd think you'd want Parcels.csv but AddressPoints has a lat/lng centroid point while Parcels has some fields (like Shape__Area) which don't appear directly useful.
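A minimal sketch of reading centroids out of an AddressPoints-style CSV. The actual column names in Placer's AddressPoints.csv aren't given here, so they're passed in as parameters; "APN", "LAT" and "LNG" in the test are hypothetical.

```python
import csv
from typing import Dict, Iterable, Tuple


def centroids_by_apn(lines: Iterable[str], apn_col: str, lat_col: str,
                     lng_col: str) -> Dict[str, Tuple[float, float]]:
    """Map each APN to its (lat, lng) point from AddressPoints-style CSV rows."""
    out: Dict[str, Tuple[float, float]] = {}
    for row in csv.DictReader(lines):
        try:
            point = (float(row[lat_col]), float(row[lng_col]))
        except (TypeError, ValueError):
            continue  # skip rows with a missing or non-numeric coordinate
        # If an APN has several address points, keep the first one seen.
        out.setdefault(row[apn_col], point)
    return out
```

Passing an open file object (`open("AddressPoints.csv", newline="")`) works as the `lines` argument.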

Tax info can be found at the URL: https://common3.mptsweb.com/MBC/placer/tax/main/__APN__/2020/0000 where __APN__ is the APN without dashes, e.g. https://common3.mptsweb.com/MBC/placer/tax/main/466120044000/2020/0000. The tax amount is in the "Totals - 1st and 2nd Installments" section.
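A minimal sketch of turning a dashed Placer APN into the form the URL expects. The validation pattern is the one used in the Placer scraper config later in this thread; rejecting anything else is a design choice here, so malformed APNs fail loudly instead of producing bad URLs.

```python
import re

# Pattern from the Placer scraper config in this thread: 466-120-044-000 style.
PLACER_APN = re.compile(r"^\d{3}-\d{3}-\d{3}-\d{3}$")


def clean_apn(apn: str) -> str:
    """'466-120-044-000' -> '466120044000' (raises on unexpected formats)."""
    apn = apn.strip()
    if not PLACER_APN.match(apn):
        raise ValueError(f"unexpected APN format: {apn!r}")
    return apn.replace("-", "")


def placer_tax_url(apn: str, year: int = 2020) -> str:
    return f"https://common3.mptsweb.com/MBC/placer/tax/main/{clean_apn(apn)}/{year}/0000"
```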

@typpo
Owner Author

typpo commented Nov 2, 2020

@jamesshannon The Placer system looks identical to Yolo County's, which means much of the code can be reused! https://github.com/typpo/ca-property-tax/tree/master/scrapers/yolo

@jamesshannon

I'm splitting discussion of Placer County off to #17

@typpo
Owner Author

typpo commented Nov 4, 2020

Data recently added:

In progress:

I've created a spreadsheet that summarizes the status of all California counties: link. The good news is that with 19 out of 58 counties, we are now covering 75% of the state's population.

@jamesshannon

I'd recommend that the sheet have a column describing where each scraper gets its data. According to https://www.mptsweb.net/ they support 35 counties, and there are probably some other shared systems. If you hadn't pointed me to Yolo, I wouldn't have known that I could reuse their code; if the spreadsheet had mentioned that Yolo is scraped from mptsweb.com, that probably would have helped.

I'm running the parser on Placer right now and should have a CSV in a few minutes. I notice that, despite them being < 10 MB, you don't have the CSVs checked into git. How should I deliver mine to you?

Also, I submitted a draft PR for my shared library. I have one or two more changes to submit, but it's basically done. It handles CSVs and shapefiles with only a few pieces of configuration. My Placer scraper script is basically this:

import os

# parcels, scrapers and parsers are modules from the shared library in the draft PR;
# their import lines aren't shown here.

DATA_DIR = os.path.join(os.path.dirname(os.path.realpath(__file__)), 'data')

# Read parcels (and compute their centroids) from the county shapefile.
PARCELS_GEN = parcels.ParcelsShapefile('PL', 'APN', 'ADR1',
        parcels.centroidfn_from_shape(),
        os.path.join(DATA_DIR, 'Parcels.shp'))
# Placer APNs look like 466-120-044-000.
PARCELS_GEN.valid_apn_pattern = r'^\d{3}-\d{3}-\d{3}-\d{3}$'

def scrape():
    # {apn_clean} is the APN without dashes.
    scraper = scrapers.Scraper(PARCELS_GEN, DATA_DIR,
            'https://common3.mptsweb.com/MBC/placer/tax/main/{apn_clean}/2020/0000')
    # Pages containing this string are treated as failed requests.
    scraper.request_unsuccessful_string = '<title>ERROR</title>'

    scraper.scrape()

def parse():
    parser = parsers.ParserMegabyte(PARCELS_GEN, DATA_DIR)
    parser.parse()

@typpo
Owner Author

typpo commented Nov 4, 2020

@jamesshannon That's awesome. Thank you for your work on the generic scraper/parser, I've skimmed it but will take a more in-depth look as soon as I can. I've added a Notes column to the sheet.

The mptsweb site includes this handy graphic, which makes me a bit less worried about adding all those tiny rural counties:

We've been sending the processed CSVs by Drive/Dropbox. The data is small for individual counties, but in aggregate it's a couple hundred MB gzipped, which is too large for git/GitHub.

@jamesshannon

FYI:

Kern County
Parcels File: https://geodat-kernco.opendata.arcgis.com/datasets/abe562bb259144a0a95e6b9899fd00b8_0
APN-based tax search: http://recorder.co.kern.ca.us/propertydetails.php?srctext=001020015&srctype=apn

The parcels file description says that:

Tax Roll Data is available in separate database tables, which can be joined to the feature class using the APN9 field as the SQL join key.

But I can't find this file.

Also, according to this page, the 2020 parcels file is the most recent and they sell their GIS data. The shapefile I linked to is from 2019, so maybe it's an older public-domain file?
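Once the Kern tax roll tables turn up, the join described in the parcels metadata is a plain key lookup on APN9. A minimal sketch, assuming both sources have been loaded as lists of dicts (e.g. from CSV); only the APN9 field name comes from the metadata, and "ACRES"/"TAX" in the test are hypothetical columns.

```python
from typing import Iterable, List


def join_on_apn9(parcels: Iterable[dict], tax_rows: Iterable[dict],
                 key: str = "APN9") -> List[dict]:
    """Left-join parcel attributes with tax-roll rows on the APN9 field."""
    tax_by_apn = {row[key]: row for row in tax_rows}
    joined = []
    for parcel in parcels:
        # Parcels without a tax-roll row keep only their own fields.
        tax = tax_by_apn.get(parcel.get(key), {})
        joined.append({**parcel, **tax})
    return joined
```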

@miloconway

miloconway commented Nov 4, 2020 via email

@typpo
Owner Author

typpo commented Nov 8, 2020

Just added Placer & Kern

@jakebayless

I'm trying to fill in some basic sleuth gaps in my spare moments for others to work with.
Here are the URLs for Mendocino Parcels as well as Tax records:
Parcel layer: https://gis.mendocinocounty.org/server/rest/services/Parcels_sde_pub/MapServer/6
Tax records (APN in the URL): https://www.co.mendocino.ca.us/tax/cgi-bin/pTaxFR2.pl?apn=02710109&street=&situsAddr2=
