Caching and compression for Socrata open data portals

Socrata's open data portals don't support gzip compression of bulk file downloads. They also tend to be very slow to serve these large downloads, as if a large, slow SELECT * FROM ... were sitting between you and your download.
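One way to check this for a given download is to request it with Accept-Encoding: gzip and inspect the response headers. The helper below is a hypothetical sketch, not part of this repo. The name is_gzip_encoded is made up for illustration, and the function only inspects the headers you feed it.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from this repo): read raw HTTP response headers
# on stdin and succeed only if the server declared Content-Encoding: gzip.
# Header names are case-insensitive, so match with grep -i.
is_gzip_encoded() {
  grep -qi '^content-encoding:[[:space:]]*gzip'
}
```

Example: curl -sSI -H 'Accept-Encoding: gzip' "<export URL>" | is_gzip_encoded && echo compressed || echo uncompressed. Note that -I sends a HEAD request, and some servers answer HEAD differently from GET.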

The solution

   ___                   ____        _         ____           _
  / _ \ _ __   ___ _ __ |  _ \  __ _| |_ __ _ / ___|__ _  ___| |__   ___
 | | | | '_ \ / _ \ '_ \| | | |/ _` | __/ _` | |   / _` |/ __| '_ \ / _ \
 | |_| | |_) |  __/ | | | |_| | (_| | || (_| | |__| (_| | (__| | | |  __/
  \___/| .__/ \___|_| |_|____/ \__,_|\__\__,_|\____\__,_|\___|_| |_|\___|
       |_|

Basically, we do the compression and caching that Socrata's open data portals don't.

(Socrata) --> (nginx gzip) --> (AWS S3) --> (you)
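The diagram above can be sketched for a single dataset as follows. This is a hypothetical illustration, not the repo's scripts. It uses the gzip command-line tool in place of the nginx gzip step and assumes Socrata's /api/views/<id>/rows.csv?accessType=DOWNLOAD export endpoint. The function names and the S3_BUCKET variable are made up. APP_TOKEN is the same variable passed to run.sh.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of one dataset's trip through
# (Socrata) --> (gzip) --> (S3). Not the repo's actual implementation.

# Build the bulk CSV export URL for a dataset on a Socrata portal
# (assumes the /api/views/<id>/rows.csv export endpoint).
socrata_csv_url() {
  local domain="$1" dataset_id="$2"
  printf 'https://%s/api/views/%s/rows.csv?accessType=DOWNLOAD\n' \
    "$domain" "$dataset_id"
}

# Compress once at maximum level, so every later download reuses the result.
compress_file() {
  local src="$1" dest="$2"
  gzip -9 -c "$src" > "$dest"
}

# Fetch, compress, and optionally upload one dataset.
cache_dataset() {
  local domain="$1" dataset_id="$2" out_dir="$3"
  local raw="$out_dir/$dataset_id.csv" gz="$out_dir/$dataset_id.csv.gz"

  if [ -z "${APP_TOKEN:-}" ]; then
    echo "cache_dataset: APP_TOKEN is not set" >&2
    return 1
  fi

  mkdir -p "$out_dir" || return 1
  # The slow part: pull the uncompressed export from Socrata once.
  curl -sSfL -H "X-App-Token: $APP_TOKEN" -o "$raw" \
    "$(socrata_csv_url "$domain" "$dataset_id")" || return 1
  compress_file "$raw" "$gz" || return 1
  rm -f "$raw"

  # S3_BUCKET is a hypothetical setting; without it, keep the local copy only.
  if [ -n "${S3_BUCKET:-}" ]; then
    # Content-Encoding: gzip lets browsers decompress transparently.
    aws s3 cp "$gz" "s3://$S3_BUCKET/$domain/$dataset_id.csv.gz" \
      --content-type text/csv --content-encoding gzip
  fi
}
```

The point of the design is that the slow Socrata export and the compression each happen once per dataset, while every later reader is served the stored gzipped copy. Storing the object with Content-Encoding: gzip is one option. Serving a plain .gz file is another.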

Take a look

An OpenDataCache instance is already running at http://www.opendatacache.com. Some example URLs:

Deploying

Using Docker

Build the image locally, then run it:

$ ./build.sh
$ WARM=1 APP_TOKEN=[Your Socrata App Token] ./run.sh

OpenDataCache makes a lot of API requests, so you'll need to sign up for a Socrata application token and pass it as APP_TOKEN.
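If you wrap run.sh in your own tooling, a check like the one below can catch a missing token before a long warming run starts. This is a hypothetical sketch and is not taken from run.sh. check_warm_env is an illustrative name, and the sketch assumes that only WARM=1 mode needs the token.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of run.sh): warming issues many
# Socrata API requests, so refuse to start it without an application token.
check_warm_env() {
  if [ "${WARM:-0}" = "1" ] && [ -z "${APP_TOKEN:-}" ]; then
    echo "WARM=1 needs APP_TOKEN (a Socrata application token)" >&2
    return 1
  fi
  return 0
}
```

For example: export WARM=1 APP_TOKEN=...; check_warm_env && ./run.sh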

A note on cache warming

If you don't set WARM=1 as above, the container will start and serve, but will not cache any new datasets. Its listings will reflect prior caching progress recorded in the log/ folder.
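The difference between the two modes can be sketched as a filter over a list of dataset IDs. This is hypothetical. The real project records progress in the log/ folder, whose format isn't described here, so the sketch instead treats an existing <id>.csv.gz in a cache directory as "already cached". warm_queue is an illustrative name.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: read dataset IDs (one per line) on stdin and print
# the ones that still need caching. Assumes a dataset counts as cached when
# <cache_dir>/<id>.csv.gz exists; the real project tracks progress in log/.
warm_queue() {
  local cache_dir="$1" id
  if [ "${WARM:-0}" != "1" ]; then
    # Serve-only mode: consume the input but schedule nothing new.
    cat > /dev/null
    return 0
  fi
  while IFS= read -r id; do
    [ -n "$id" ] || continue                     # skip blank lines
    [ -f "$cache_dir/$id.csv.gz" ] || printf '%s\n' "$id"
  done
}
```

With WARM=1, only datasets missing from the cache are queued. Without it, nothing new is queued and whatever is already cached is what gets served.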

Thanks

Thank you to OpenPrism for its list of Socrata portals.

TODO

  • Licensing