
Data schema and sourcing

mmmatthew edited this page May 13, 2018 · 6 revisions

Overview of data flow

  • Fountain data is harvested from OpenStreetMap (OSM), WikiData (WD), and WikiMedia Commons (WM) on a regular basis.
  • The data is consolidated to remove duplicates and packaged into an application-ready file format.
  • Fountain data is stored and served from the server on a city-by-city and language-by-language basis, e.g. fountains_ch-zh-en.json.
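As a small illustration of the per-city, per-language file naming, the helper below builds a name like the example above. Reading `ch-zh-en` as country, city, and language codes is an assumption based on the single example file name, and `dataset_filename` is a hypothetical helper, not part of the project.

```python
def dataset_filename(country: str, city: str, language: str) -> str:
    """Build a data set file name such as 'fountains_ch-zh-en.json'.

    Assumption: the three parts of the example name are country, city,
    and language codes, in that order.
    """
    parts = [country, city, language]
    if not all(part.isalpha() for part in parts):
        raise ValueError("codes must be non-empty and alphabetic")
    return "fountains_{}-{}-{}.json".format(*(part.lower() for part in parts))


example_name = dataset_filename("ch", "zh", "en")
```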

Fountain data Schema (proposal)

Fountain objects in the data sets should have the following properties. The probable sources of each property are given in parentheses, in order of priority. Asterisks mark properties that may not yet be fully covered by the data source. Square brackets indicate array values, curly brackets indicate object values, italics are comments.

  • [coordinates] (OSM, WD) in WGS84
  • fountain name (WD, OSM) name to be displayed as a header for the fountain's description
  • [aka] (WD) other given names of the fountain
  • short description (WD, WM) short description of fountain. WikiMedia Categories often have good descriptions.
  • [construction year(s)] (WD)
  • [{artist(s)}] (WD, OSM) incl. link to artists' WD and Wikipedia page
  • artwork title (WD) some artists give their fountain a title
  • water source (WD) indicates whether the fountain delivers spring water or normal water. The info is sometimes available in OSM, but OSM is the wrong place for it, since ideally OSM should contain only visible information
  • {operator} (WD*) operator (e.g. WVZ), including link to website and contact info. Same remark as for 'water source'
  • operator id (WD*) object identifier used by operator
  • WikiData id (WD, OSM)
  • OpenStreetMap id (OSM, WD)
  • {main image} (WM) incl. img url, copyright info, and legend
  • [{gallery images}] (WM) incl. img url, copyright info, and legend
  • potable (OSM) either stated explicitly with the keyword drinking_water, or implied by certain tags, e.g. man_made:drinking_fountain. Additionally, the legal tag indicates whether the operator guarantees that the water is potable
  • dog (OSM) whether the fountain is canine-accessible. OSM has the dog tag for this
  • tap (OSM) whether you can refill your bottle at the fountain. It is an implied property of certain OSM tags, e.g. man_made:water_tap
  • active season (OSM/WD*) indicates times of year in which the fountain is running
  • streetViewUrl (WD)
  • [{wikipedia links}] (WD) incl. all available languages
  • directions (WD) address for finding the object
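The proposed schema could be written down as typed records, which makes the array and object properties explicit. The sketch below is one possible reading: all field names, the optional/required split, and the (latitude, longitude) order of the coordinates are assumptions, since the proposal only lists the properties informally.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Image:
    url: str
    copyright: str = ""
    legend: str = ""


@dataclass
class Artist:
    name: str
    wikidata_url: Optional[str] = None
    wikipedia_url: Optional[str] = None


@dataclass
class Operator:
    name: str
    website: Optional[str] = None
    contact: Optional[str] = None


@dataclass
class WikipediaLink:
    language: str
    url: str


@dataclass
class Fountain:
    # WGS84; the (latitude, longitude) order is an assumption.
    coordinates: Tuple[float, float]
    name: Optional[str] = None
    aka: List[str] = field(default_factory=list)
    short_description: Optional[str] = None
    construction_years: List[int] = field(default_factory=list)
    artists: List[Artist] = field(default_factory=list)
    artwork_title: Optional[str] = None
    water_source: Optional[str] = None
    operator: Optional[Operator] = None
    operator_id: Optional[str] = None
    wikidata_id: Optional[str] = None
    osm_id: Optional[str] = None
    main_image: Optional[Image] = None
    gallery_images: List[Image] = field(default_factory=list)
    potable: Optional[bool] = None
    dog: Optional[bool] = None
    tap: Optional[bool] = None
    active_season: Optional[str] = None
    street_view_url: Optional[str] = None
    wikipedia_links: List[WikipediaLink] = field(default_factory=list)
    directions: Optional[str] = None


# Usage: only the coordinates are required; asdict() gives a JSON-ready dict.
example = Fountain(coordinates=(47.3769, 8.5417), name="Example fountain")
example_dict = asdict(example)
```

Leaving every property except the coordinates optional reflects that many properties may be missing in the sources; a stricter schema is equally possible.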

Data source tags

OSM

  • man_made:drinking_fountain
  • amenity:drinking_water
  • amenity:water_point tap-type source
  • man_made:water_tap tap-type source
  • natural:spring*
  • amenity:watering_place* trough for animals

Asterisk-marked tags should only be used in combination with the tag 'drinking_water'.
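The tag rules above can be sketched as a filter over the tags of an OSM element. This is a minimal sketch, not the project's implementation: the function names are hypothetical, and treating the mere presence of a `drinking_water` key as sufficient for the asterisk-marked tags is an assumption.

```python
from typing import Dict

# (key, value) pairs listed as sources. The boolean says whether the tag
# is asterisk-marked, i.e. must be combined with 'drinking_water'.
SOURCE_TAGS = {
    ("man_made", "drinking_fountain"): False,
    ("amenity", "drinking_water"): False,
    ("amenity", "water_point"): False,
    ("man_made", "water_tap"): False,
    ("natural", "spring"): True,
    ("amenity", "watering_place"): True,
}

# Tags described as tap-type sources in the list above.
TAP_TAGS = {("amenity", "water_point"), ("man_made", "water_tap")}


def is_fountain_candidate(tags: Dict[str, str]) -> bool:
    """Return True if the element's tags match one of the source tags."""
    for (key, value), needs_drinking_water in SOURCE_TAGS.items():
        if tags.get(key) != value:
            continue
        # Assumption: presence of the key is enough, whatever its value.
        if not needs_drinking_water or "drinking_water" in tags:
            return True
    return False


def is_tap(tags: Dict[str, str]) -> bool:
    """Return True if the element carries a tap-type source tag."""
    return any(tags.get(key) == value for key, value in TAP_TAGS)
```

For example, `is_fountain_candidate({"natural": "spring"})` is false under these rules, while the same element with an added `drinking_water` key is accepted.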

Data Query

OSM data can be queried using the Overpass API, for example: https://www.overpass-api.de/api/interpreter?data=[out:json];(node[man_made=drinking_fountain](47.317,8.42,47.432,8.66);node[amenity=drinking_water](47.317,8.42,47.432,8.66););out%20meta;
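To avoid hand-editing such URLs for every city, the query can be assembled from a bounding box and a list of tags. The sketch below reproduces the example query above; the function and constant names are hypothetical. The bounding box follows the Overpass order (south, west, north, east), and no request is sent.

```python
from typing import Iterable, Tuple
from urllib.parse import quote, urlencode

OVERPASS_URL = "https://www.overpass-api.de/api/interpreter"

# Bounding box and tags taken from the example URL above.
ZURICH_BBOX = (47.317, 8.42, 47.432, 8.66)
FOUNTAIN_TAGS = [("man_made", "drinking_fountain"), ("amenity", "drinking_water")]

BBox = Tuple[float, float, float, float]


def build_overpass_query(bbox: BBox, tags: Iterable[Tuple[str, str]]) -> str:
    """Build an Overpass QL query for nodes matching any of the tags."""
    south, west, north, east = bbox
    box = f"({south},{west},{north},{east})"
    statements = "".join(f"node[{key}={value}]{box};" for key, value in tags)
    # The outer parentheses form a union of the per-tag statements.
    return f"[out:json];({statements});out meta;"


def build_overpass_url(bbox: BBox, tags: Iterable[Tuple[str, str]]) -> str:
    """Percent-encode the query and append it to the interpreter URL."""
    query = build_overpass_query(bbox, tags)
    return OVERPASS_URL + "?" + urlencode({"data": query}, quote_via=quote)


url = build_overpass_url(ZURICH_BBOX, FOUNTAIN_TAGS)
```

The resulting URL can be fetched with any HTTP client. The asterisk-marked tags would need an additional `drinking_water` filter, which this sketch leaves out.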

not used

  • waterway:water_point for boat refilling
  • man_made:water_well refers to the excavation itself, but is perhaps sometimes misused for fountains

WikiData

  • Q483453: fountain
  • Q1630622: drinking fountain subclass of fountain
  • Q43483: water well often misused tag
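A matching WikiData lookup could be expressed as a SPARQL query for the WikiData Query Service. The sketch below only builds the query text. It assumes the standard WikiData properties P31 (instance of), P279 (subclass of), and P625 (coordinate location) and the query service's `wikibase:box` search; the function name and the choice of default class are hypothetical. Because Q1630622 is a subclass of Q483453, following P279 from the fountain class should also cover drinking fountains.

```python
from typing import Sequence, Tuple

BBox = Tuple[float, float, float, float]  # (south, west, north, east)


def build_wikidata_query(bbox: BBox, class_ids: Sequence[str] = ("Q483453",)) -> str:
    """Build a SPARQL query for items of the given classes inside a bbox."""
    south, west, north, east = bbox
    values = " ".join(f"wd:{qid}" for qid in class_ids)
    # WKT points are written as "Point(longitude latitude)".
    return f"""
SELECT ?fountain ?location WHERE {{
  VALUES ?class {{ {values} }}
  ?fountain wdt:P31/wdt:P279* ?class .
  SERVICE wikibase:box {{
    ?fountain wdt:P625 ?location .
    bd:serviceParam wikibase:cornerSouthWest "Point({west} {south})"^^geo:wktLiteral .
    bd:serviceParam wikibase:cornerNorthEast "Point({east} {north})"^^geo:wktLiteral .
  }}
}}
""".strip()


query = build_wikidata_query((47.317, 8.42, 47.432, 8.66))
```

Passing `class_ids=("Q483453", "Q43483")` would also include items tagged as water wells, at the cost of more false positives given that this tag is often misused.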

Data collection and conflation diagram

Data conflation scheme https://www.lucidchart.com/documents/view/2f1ad22a-fa35-4ac0-8856-092db6a4d286
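One way to remove duplicates between the OSM and WikiData records is sketched below. It is not a transcription of the linked diagram: the matching rules (first by a shared WikiData id, then by nearest neighbour within a distance threshold), the 10 m default, and the record keys `wikidata_id` and `coordinates` are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Optional, Tuple

EARTH_RADIUS_M = 6371000.0


def haversine_m(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Great-circle distance in metres between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def conflate(osm_items: List[dict], wd_items: List[dict],
             max_distance_m: float = 10.0) -> List[Dict[str, Optional[dict]]]:
    """Pair OSM and WikiData records that describe the same fountain."""
    unmatched_wd = {item["wikidata_id"]: item for item in wd_items}
    pairs = []
    for osm in osm_items:
        # 1. Explicit link: the OSM record carries a WikiData id.
        match = unmatched_wd.pop(osm.get("wikidata_id"), None)
        if match is None:
            # 2. Fallback: nearest unmatched WikiData record within range.
            nearby = [
                (haversine_m(osm["coordinates"], wd["coordinates"]), qid)
                for qid, wd in unmatched_wd.items()
            ]
            nearby = [entry for entry in nearby if entry[0] <= max_distance_m]
            if nearby:
                match = unmatched_wd.pop(min(nearby)[1])
        pairs.append({"osm": osm, "wikidata": match})
    # WikiData records without an OSM counterpart are kept as well.
    pairs.extend({"osm": None, "wikidata": wd} for wd in unmatched_wd.values())
    return pairs
```

Each resulting pair can then be merged property by property, taking the value from the first source in the priority order given in the schema. The linear scan per OSM record is fine for a single city; a spatial index would be worth considering for larger areas.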

fountain config file

fountain data source config

https://github.com/mmmatthew/datablue/blob/master/src/config/fountains.sources.json

Questions

Questions for S Keller

  • Tips and contact for complex OSM queries
  • tool for combining OSM and WD data
  • best way to show users how to edit OSM data
  • how to store water_source (current solution kind of a hack)
  • active date range keyword
  • best data structure for wrangling (e.g. pandas)