VOEventDB

and

Sustainable Software

Tim Staley / 4pisky.org

Hotwiring the Transient Universe V Villanova, PA, Oct 2016

Intro

This talk covers our recently published archiving and query tool VOEventDB, and also uses examples from the development of VOEventDB to touch on the much broader topic of sustainable software.

Sustainable software essentially means making software reusable. It's a set of goals to bear in mind. No project is perfect, and there's a cost/reward trade-off: there's little point writing extensive documentation for your 10-line shell script.

VOEventDB, in brief

Context

  • VOEvent is a standardised format for astronomical transient alerts.
  • NASA-GCN have been transmitting alerts in this format for over 2 years.
  • Previously, there was no public archive for alerts in this format.
  • The VOEvent standard has always referred to a 'registry' of 'repositories' - a clear gap to fill.

VOEventDB: Spec

  • Store raw VOEvent XML, provide XML content at a persistent URL
  • Store a common subset of VOEvent metadata in regular database schema
  • Make queries based on this common subset
  • Including spatial (cone-search) and citation-based queries
  • 'RESTful' web-API
  • Python client-library for remote-queries
  • Store raw XML - same data as you'd receive in real-time
  • The VOEvent Schema has some flexibility, but there's a core subset we expect to see in all packets - use that for filtering and searching.
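As a rough illustration of the 'common subset' idea, here is a minimal sketch that pulls a few core fields out of a packet using only the Python standard library. It is not VOEventDB's actual ingest code: the sample packet is hand-written and heavily trimmed, the function name and the output keys are hypothetical, and it assumes C1/C2 hold RA/Dec in degrees, which depends on the packet's coordinate system.

```python
import xml.etree.ElementTree as ET

# Hand-written, heavily trimmed packet, for illustration only.
SAMPLE_PACKET = """\
<voe:VOEvent xmlns:voe="http://www.ivoa.net/xml/VOEvent/v2.0"
             ivorn="ivo://example.org/demo#event-001"
             role="test" version="2.0">
  <Who>
    <AuthorIVORN>ivo://example.org/demo</AuthorIVORN>
    <Date>2016-10-01T12:00:00</Date>
  </Who>
  <WhereWhen>
    <ObsDataLocation>
      <ObservationLocation>
        <AstroCoords coord_system_id="UTC-FK5-GEO">
          <Position2D unit="deg">
            <Value2><C1>123.4</C1><C2>-45.6</C2></Value2>
          </Position2D>
        </AstroCoords>
      </ObservationLocation>
    </ObsDataLocation>
  </WhereWhen>
  <Citations>
    <EventIVORN cite="followup">ivo://example.org/demo#event-000</EventIVORN>
  </Citations>
</voe:VOEvent>
"""


def extract_common_subset(xml_text):
    """Return a flat dict of core fields (hypothetical key names)."""
    root = ET.fromstring(xml_text)
    # Assumption: C1/C2 are RA/Dec in degrees for this packet.
    c1 = root.findtext(".//Position2D/Value2/C1")
    c2 = root.findtext(".//Position2D/Value2/C2")
    return {
        "ivorn": root.get("ivorn"),
        "role": root.get("role"),
        "author_ivorn": root.findtext("Who/AuthorIVORN"),
        "author_datetime": root.findtext("Who/Date"),
        "ra_deg": float(c1) if c1 is not None else None,
        "dec_deg": float(c2) if c2 is not None else None,
        # IVORNs of any earlier events this packet cites.
        "cites": [el.text for el in root.iter("EventIVORN")],
    }


record = extract_common_subset(SAMPLE_PACKET)
```

A flat record like this is what would go into regular database columns for filtering, while the raw XML is stored alongside it unchanged.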

Additional goals for VOEventDB: reusable, decentralized

  • Agnostic about inputs and outputs
  • Easy for any team to set up their own local repository

Implementation

  • Postgres + SQLAlchemy
  • Spatial queries powered by the q3c Postgres extension.
  • Flask-powered RESTful interface
  • Partially-autogenerated documentation.
  • Extensive test-suite using pytest fixtures.
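To make the cone-search idea concrete, here is a minimal pure-Python sketch of what such a query computes. It is a stand-in for illustration only: in VOEventDB the spatial filtering happens inside Postgres via the extension, using an index, whereas this version is a plain linear scan. The function names and the row format (dicts with `ra` and `dec` keys, in degrees) are assumptions.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees, via the haversine formula."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = math.sin((dec2 - dec1) / 2.0)
    sin_dra = math.sin((ra2 - ra1) / 2.0)
    a = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    # Clamp to 1.0 to guard against floating-point overshoot.
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(a))))


def cone_search(rows, ra, dec, radius_deg):
    """Return the rows lying within radius_deg of the point (ra, dec)."""
    return [
        row for row in rows
        if angular_separation_deg(row["ra"], row["dec"], ra, dec) <= radius_deg
    ]


# Hypothetical event positions; note that the second sits across the RA=0 wrap.
events = [
    {"name": "a", "ra": 0.5, "dec": 0.0},
    {"name": "b", "ra": 359.5, "dec": 0.0},
    {"name": "c", "ra": 180.0, "dec": 45.0},
]
nearby = cone_search(events, ra=0.0, dec=0.0, radius_deg=1.0)
```

Working with the great-circle distance (rather than simple differences in RA and Dec) is what keeps the search correct near the RA wrap-around and at high declination.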

NB, the VOEventDB web-interface is very bare-bones - designed to be just usable enough for developers to test it manually, but it's not user friendly. I'm expecting that casual users will use the Python client-library. Extensive examples can be found at http://voeventdbremote.readthedocs.io/.

Notes on sustainability

Packaging

  • Encourages re-use as a component
  • Removes 'install friction': just add a package to your requirements list
  • Adoption has historically been slowed by a fragmented ecosystem and a lack of good docs.
  • Good, short, up-to-date tutorial on packaging your code: http://python-packaging.readthedocs.io/

There's one snag to writing a setup.py to turn your code into a package: the version number has to be manually edited every time you make a new release, which is easy to forget. We already have a version control system - Git! Let's use it...

Package Versioning

Versioneer:

  • Adds a standalone Python module to your codebase
  • Automatically sets version number according to most recent git-tag
  • Git commit-id also available as a string in your library.
  • Super convenient, keeps everything in sync
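The underlying idea can be sketched in a few lines. This is not Versioneer's code, just a minimal illustration of deriving a version string from the most recent git tag. It assumes tags look like `v1.2.0` and that the input comes from `git describe --tags --long --dirty`; both function names are hypothetical.

```python
import subprocess


def describe_working_tree():
    """Ask git for '<tag>-<commits since tag>-g<short hash>[-dirty]'."""
    return subprocess.check_output(
        ["git", "describe", "--tags", "--long", "--dirty"], text=True
    ).strip()


def version_from_describe(described):
    """Turn `git describe --long` output into a package version string."""
    dirty = described.endswith("-dirty")
    if dirty:
        described = described[: -len("-dirty")]
    # Split from the right, since the tag itself may contain hyphens.
    tag, distance, sha = described.rsplit("-", 2)
    version = tag[1:] if tag.startswith("v") else tag
    if int(distance) or dirty:
        # Not exactly on a tag: record how far past it we are, and where.
        version += "+{}.{}".format(distance, sha)
    if dirty:
        version += ".dirty"
    return version


print(version_from_describe("v1.2.0-0-gabc1234"))  # prints 1.2.0
print(version_from_describe("v1.2.0-3-gabc1234"))  # prints 1.2.0+3.gabc1234
```

A tagged release gets a clean version number, while anything built from later commits or uncommitted changes is visibly marked as such, with no manual editing of setup.py.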

Documentation

Minimal docs:

  • Description of what your package does (+ links for context!)
  • One or two brief usage examples
  • One big README is typically fine

Extended docs:

Read The Docs:

  • Free hosting for Sphinx-generated documentation
  • Links to a Github repository
  • Every git-push results in a new documentation build
  • API documentation is semi-automated - write docs next to the code
  • See also sphinx-napoleon (nicer formatting).
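As an example of 'docs next to the code', here is a hypothetical function with a Google-style docstring, which is one of the styles that napoleon can parse. The function itself is made up for illustration and is not part of VOEventDB.

```python
def count_by_role(packets, role=None):
    """Count alert packets, optionally restricted to one role.

    Args:
        packets (list of dict): Packet metadata; each dict needs a
            ``role`` key.
        role (str, optional): Only count packets with this role, e.g.
            ``"observation"``. Everything is counted if omitted.

    Returns:
        int: Number of matching packets.
    """
    if role is None:
        return len(packets)
    return sum(1 for packet in packets if packet["role"] == role)


packets = [{"role": "observation"}, {"role": "test"}, {"role": "observation"}]
n_obs = count_by_role(packets, role="observation")
```

Sphinx's autodoc extension can then pull this docstring into the API pages, so the description lives in the same commit as the code it describes.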

Documenting examples

  • Examples are very useful... until the code changes and they go stale
  • Python notebooks are a great format for writing examples - but tricky to publish.

Documenting examples with nbsphinx & RTD

  • nbsphinx lets you generate docs from notebooks.
  • The notebooks are re-run with every docs-build - so if the examples are broken, you'll notice.
  • This is how the voeventdb client-docs are generated.
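A minimal Sphinx configuration along these lines might look like the fragment below. It is a sketch rather than the actual voeventdb configuration; the file location and the particular choice of extensions are assumptions.

```python
# docs/conf.py (fragment; the path is an assumption)
extensions = [
    "sphinx.ext.autodoc",   # pull API docs from docstrings
    "sphinx.ext.napoleon",  # parse Google / NumPy style docstrings
    "nbsphinx",             # render Jupyter notebooks as doc pages
]

# Re-run every notebook on each build, even if it has stored outputs.
nbsphinx_execute = "always"

# Fail the build if any notebook cell raises an exception.
nbsphinx_allow_errors = False

# Don't try to build Jupyter's autosave copies as documents.
exclude_patterns = ["_build", "**.ipynb_checkpoints"]
```

With the notebooks executed on every build, a broken example shows up as a failed docs build rather than as silently stale output.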

Deployment & Hosting

For multi-component systems, deployment details are crucial. If you're writing a basic Python library, you don't need to think about this - document it, package it, test it, you're done. However... if you're releasing a package which works with a database, or is served through a web-interface, or talks to other custom code... you are a cruel, cruel person if you don't at least document your install process.

The best way to document your install process is to automate it! VOEventDB deployments are scripted with Ansible, and the scripts are all open-source! There's no getting around the fact that there is a setup overhead - the first time you automate your install it will take days. But then new server setups take minutes, rather than hours.

VOEventDB

  • Provides a 'turn-key' queryable repository for transient alerts.
  • Can be used as a remote service
  • Or run your own
  • Overview paper: arXiv:1606.03735

Packaging

  • Make use of your packaging ecosystem
  • Think about use of your code as a component
  • Keep versioning information in your version control system! - automate package versioning

Documentation

  • Minimum: description + example usage + install requirements
  • Documentation goes stale - test your examples
  • In Python, notebooks are a great format for this - try nbsphinx!

Deployment

  • Docs are a start, but easily go out of date
  • Automate!

Links: