# Scatterplots in pydeck: A case study using Beijing subway stops

*(About 5-10 minutes to read)*

Since 1978, China's GDP has nearly doubled every seven years [[1](https://en.wikipedia.org/wiki/Historical_GDP_of_China)]. That sort of exponential growth has transformed the country, as demonstrated in part by how quickly Beijing's urban infrastructure has changed.

Below we'll plot the location of Beijing subway stops over time. Locations for subway stops come from [Wikipedia](https://en.wikipedia.org/wiki/List_of_Beijing_Subway_stations) and [OpenStreetMap](https://wiki.openstreetmap.org/wiki/Beijing_Subway). Do note that this is not intended to be a rigorous study, so some stops may be missing.

## Contents

- [Getting the data](#Getting-the-data)
- [Data cleaning](#Data-cleaning)
- [Using pydeck to automatically create a viewport](#Automatically-generate-a-viewport)
- [Plotting the data](#Plotting-the-data)
- [Plotting the data over time](#Playing-the-data-forward-in-time)

## Getting the data

First, we can use the [Pandas library](https://pandas.pydata.org/) to download our data. You're probably already familiar with it: Pandas is likely the most popular Python library for filtering, aggregating, and joining data.

In [1]:
from ast import literal_eval

import pandas as pd
from pydeck import (
    data_utils,
    Deck,
    Layer
)

# First, let's use Pandas to download our data
URL = 'https://raw.githubusercontent.com/ajduberstein/data_sets/master/beijing_subway_station.csv'
df = pd.read_csv(URL)
df.head()

| Unnamed: 0 | lat | lng | osm_id | station_name | chinese_name | opening_date | color | line_name |
|---|---|---|---|---|---|---|---|---|
| 0 | 39.940249 | 116.456359 | 1351272524 | Agricultural Exhibition Center | 农业展览馆 | 2008-07-19 | [0, 146, 188, 255] | Line 10 |
| 1 | 39.95557 | 116.388507 | 5057476994 | Andelibeijie | 安德里北街 | 2015-12-26 | [0, 155, 119, 255] | Line 8 (North section) |
| 2 | 39.947729 | 116.402067 | 339088654 | Andingmen | 安定门 | 1984-09-20 | [0, 75, 135, 255] | Line 2 |
| 3 | 40.011026 | 116.263981 | 1362259113 | Anheqiao North | 安河桥北 | 2009-09-28 | [0, 140, 149, 255] | Line 4 |
| 4 | 39.967112 | 116.388398 | 5305505996 | Anhuaqiao | 安华桥 | 2012-12-30 | [0, 155, 119, 255] | Line 8 (North section) |


## Data cleaning

Next, we'll have to engage in some necessary data housekeeping:

1. We have to re-code the position as a single field holding a `[lng, lat]` list.
2. The CSV encodes the `[R, G, B, A]` color values as a `str`, and `literal_eval` lets us convert that string into a list.

In [2]:
# Combine lng and lat into a single [lng, lat] position field
df['position'] = df.apply(lambda x: [x['lng'], x['lat']], axis=1)
# The CSV stores each [R, G, B, A] color as a string, so parse it into a list
df['color'] = df['color'].apply(literal_eval)
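
To make the two cleaning steps concrete, here is a minimal sketch on a single row, using the values shown for the first station in the table above. The variable names are only for illustration.

```python
from ast import literal_eval

# Values copied from the first row of the data set
raw_color = "[0, 146, 188, 255]"   # the CSV stores the color as a string
lng, lat = 116.456359, 39.940249

# literal_eval parses a Python literal without executing arbitrary code
color = literal_eval(raw_color)

# Positions are given in [longitude, latitude] order
position = [lng, lat]

print(type(raw_color).__name__, '->', type(color).__name__)
print(color, position)
```

Using `literal_eval` rather than `eval` is the safer choice here, since the string comes from a downloaded file.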

## Automatically generate a viewport

pydeck features some nifty utilities for visualizing data, like an **automatic zoom using `data_utils.autocompute_viewport`** for 2D data sets.

We'll render the viewport, as well, just to verify that the visualization looks sensible.

In [3]:
# Use pydeck's data_utils module to fit a viewport to the data
viewport = data_utils.autocompute_viewport(points=df[['lng', 'lat']], view_proportion=0.9)
auto_zoom_map = Deck(layers=None, initial_view_state=viewport)
auto_zoom_map.show()

DeckGLWidget(json_input='{"initialViewState": {"bearing": 0, "latitude": 39.92295563963415, "longitude": 116.3…

Sure enough, we're centered on Beijing.
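
For intuition about what fitting a viewport involves, one simple approach is to center the view on the midpoint of the data's bounding box. The sketch below does that by hand for the five stations shown earlier. It is not pydeck's implementation, which may compute the center and zoom differently, and `bounding_box_center` is a hypothetical helper.

```python
def bounding_box_center(points):
    """Return the (lng, lat) midpoint of the bounding box around points."""
    lngs = [lng for lng, _ in points]
    lats = [lat for _, lat in points]
    return ((min(lngs) + max(lngs)) / 2, (min(lats) + max(lats)) / 2)


# (lng, lat) pairs for the five stations in the preview of the data set
stations = [
    (116.456359, 39.940249),
    (116.388507, 39.95557),
    (116.402067, 39.947729),
    (116.263981, 40.011026),
    (116.388398, 39.967112),
]

center_lng, center_lat = bounding_box_center(stations)
print(round(center_lng, 4), round(center_lat, 4))
```

A real viewport also needs a zoom level that fits the bounding box on screen, which this sketch leaves out.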

## Plotting the data

We'll render the data and use some Jupyter notebook functionality to provide a header with a year.

It's worth spending some time on each line, if you haven't seen the Layer object yet:

```python
scatterplot = Layer(
    'ScatterplotLayer',
    df,
    radius=500,
    get_fill_color='color',
    get_position='position')
```



**We specify the layer type as the first argument, the data as the second, and the layer's properties as keyword arguments.** **[`ScatterplotLayer`](https://github.com/uber/deck.gl/blob/master/docs/layers/scatterplot-layer.md)** is one of the layers available in the deck.gl core library.

As an aside, for a list of other layers, see the [deck.gl documentation](https://github.com/uber/deck.gl/tree/master/docs/layers#deckgl-layer-catalog-overview). Remember that deck.gl is a JavaScript library and not a Python one, so its documentation may differ from pydeck in some terminology and functionality (e.g., pydeck doesn't support passing functions as arguments, though this is common in deck.gl).
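
Since pydeck, as noted above, doesn't accept functions as arguments, one workaround is to precompute the value as a column and refer to it by name, the same way `get_fill_color='color'` refers to the `color` column. The sketch below assumes a hypothetical `era_color` column that highlights stations opened before 2000; the cutoff and the colors are arbitrary choices for illustration.

```python
import pandas as pd

stations = pd.DataFrame({
    'station_name': ['Andingmen', 'Anhuaqiao'],
    'opening_date': ['1984-09-20', '2012-12-30'],
})

OLD_COLOR = [255, 140, 0, 255]   # arbitrary orange for stations opened before 2000
NEW_COLOR = [0, 155, 119, 255]   # arbitrary green for everything else

# In deck.gl (JavaScript) this logic could live in an accessor function;
# here it is computed up front and stored in a column instead
opened_year = pd.to_datetime(stations['opening_date']).dt.year
stations['era_color'] = [OLD_COLOR if y < 2000 else NEW_COLOR for y in opened_year]

# A layer could then reference the column by name, e.g. get_fill_color='era_color'
print(stations[['station_name', 'era_color']])
```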

In [4]:
# display lives in IPython.display; importing it from IPython.core.display is deprecated
from IPython.display import display
import ipywidgets

year = 2019

scatterplot = Layer(
    'ScatterplotLayer',
    df,
    id='scatterplot-layer',
    radius=500,
    get_fill_color='color',
    get_position='position')
r = Deck(layers=[scatterplot], initial_view_state=viewport)
display_el = ipywidgets.HTML('<h1>' + str(year) + '</h1>')
display(display_el)
r.show()

HTML(value='<h1>2019</h1>')

DeckGLWidget(json_input='{"initialViewState": {"bearing": 0, "latitude": 39.92295563963415, "longitude": 116.3…

## Playing the data forward in time

Finally, we can loop through the data and see the dramatic development in Beijing since 1971, as demonstrated by subway stop opening dates.

In [5]:
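import time

for y in range(1971, 2020):
    # Opening dates are 'YYYY-MM-DD' strings, so comparing against the next
    # year keeps every station opened up to the end of year y
    scatterplot.data = df[df['opening_date'] < str(y + 1)]
    display_el.value = '<h1>' + str(y) + '</h1>'
    r.update()
    time.sleep(0.2)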
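
One detail to keep in mind when filtering by year: `opening_date` holds `YYYY-MM-DD` strings, and strings compare character by character, so a date such as `'2008-07-19'` sorts after the bare year `'2008'`. Here is a small sketch of that behavior, along with one alternative that parses the year explicitly. The `stations_open_by` helper is hypothetical and not part of pydeck.

```python
import pandas as pd

stations = pd.DataFrame({
    'station_name': ['Andingmen', 'Agricultural Exhibition Center', 'Anhuaqiao'],
    'opening_date': ['1984-09-20', '2008-07-19', '2012-12-30'],
})

# A full date string is "greater than" its own year prefix
print('2008-07-19' <= '2008')   # False
print('2008-07-19' < '2009')    # True


def stations_open_by(frame, year):
    """Return the rows whose opening year is less than or equal to `year`."""
    opening_year = pd.to_datetime(frame['opening_date']).dt.year
    return frame[opening_year <= year]


print(stations_open_by(stations, 2008)['station_name'].tolist())
```

Parsing the dates costs a little more than comparing strings, but it makes the intended boundary explicit.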