<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#A-real-time-bus-tracker" data-toc-modified-id="A-real-time-bus-tracker-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>A real-time bus tracker</a></span><ul class="toc-item"><li><span><a href="#Import-libraries" data-toc-modified-id="Import-libraries-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Import libraries</a></span></li><li><span><a href="#Playing-with-time" data-toc-modified-id="Playing-with-time-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>Playing with time</a></span></li><li><span><a href="#Import-data-from-metro-API" data-toc-modified-id="Import-data-from-metro-API-1.3"><span class="toc-item-num">1.3&nbsp;&nbsp;</span>Import data from metro API</a></span><ul class="toc-item"><li><span><a href="#Get-real-time-bus-locations-via-url-call" data-toc-modified-id="Get-real-time-bus-locations-via-url-call-1.3.1"><span class="toc-item-num">1.3.1&nbsp;&nbsp;</span>Get real-time bus locations via url call</a></span></li><li><span><a href="#Convert-JSON-to-dataframe-and-add-a-timestamp" data-toc-modified-id="Convert-JSON-to-dataframe-and-add-a-timestamp-1.3.2"><span class="toc-item-num">1.3.2&nbsp;&nbsp;</span>Convert JSON to dataframe and add a timestamp</a></span></li><li><span><a href="#Adjust-the-time" data-toc-modified-id="Adjust-the-time-1.3.3"><span class="toc-item-num">1.3.3&nbsp;&nbsp;</span>Adjust the time</a></span></li><li><span><a href="#Create-an-empty-csv-file" data-toc-modified-id="Create-an-empty-csv-file-1.3.4"><span class="toc-item-num">1.3.4&nbsp;&nbsp;</span>Create an empty csv file</a></span></li><li><span><a href="#Append-data-to-csv" data-toc-modified-id="Append-data-to-csv-1.3.5"><span class="toc-item-num">1.3.5&nbsp;&nbsp;</span>Append data to csv</a></span></li></ul></li><li><span><a href="#Making-it-into-a-function" data-toc-modified-id="Making-it-into-a-function-1.4"><span class="toc-item-num">1.4&nbsp;&nbsp;</span>Making it into a 
function</a></span></li><li><span><a href="#Automate-it" data-toc-modified-id="Automate-it-1.5"><span class="toc-item-num">1.5&nbsp;&nbsp;</span>Automate it</a></span></li><li><span><a href="#Map-it" data-toc-modified-id="Map-it-1.6"><span class="toc-item-num">1.6&nbsp;&nbsp;</span>Map it</a></span><ul class="toc-item"><li><span><a href="#Plotly" data-toc-modified-id="Plotly-1.6.1"><span class="toc-item-num">1.6.1&nbsp;&nbsp;</span>Plotly</a></span></li><li><span><a href="#Kepler" data-toc-modified-id="Kepler-1.6.2"><span class="toc-item-num">1.6.2&nbsp;&nbsp;</span>Kepler</a></span></li></ul></li></ul></li></ul></div>

<div class="alert alert-danger">
This lab will be recorded.
</div>

# A real-time bus tracker

<img src='images/buses.gif' width=400>

The LA Metro API reports the location of buses that are currently in operation. Using this real-time bus data, we can generate maps that display the location of buses for any given route. We can also use the feed to build an archive of bus locations with their timestamps. By creating a script that continuously collects this information, we can simulate bus movement in the city for further analysis.

## Import libraries

In [None]:
# libraries
import pandas as pd
import matplotlib.pyplot as plt
import plotly.express as px

# to import and manipulate api/json data
import urllib.request 
import json

# to manipulate csv files
import csv
from csv import DictWriter

# to work with time
import datetime as dt
from datetime import datetime
import pytz
from pytz import timezone

## Playing with time

<img src='https://files.realpython.com/media/How-to-Use-the-Python-Time-Module_Watermarked.9de84ed793f2.jpg' width=400>

[[Source](https://realpython.com/python-time-module/)]

Time is not the easiest thing to work with in Python. Date/time datatypes, display formats, and timezones all add complications. Below are some of the basic constructs of the `datetime` library, along with the [pytz](https://pythonhosted.org/pytz/) library, which lets you define timezones.

In [None]:
# what time is it now?
now = datetime.now()
print(now)

In [None]:
# Now, but formatted nicely (no milliseconds)
now.strftime('%Y-%m-%d %H:%M:%S')

In [None]:
# Current time in UTC
now_utc = datetime.now(timezone('UTC'))
now_utc.strftime('%Y-%m-%d %H:%M:%S')

In [None]:
# what are the valid timezones?
for tz in pytz.all_timezones:
    print(tz)

In [None]:
# Current time in LA
now = datetime.now(timezone('US/Pacific'))
print(now.strftime('%Y-%m-%d %H:%M:%S'))
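
The cells above format the current time, which changes on every run. As a small, reproducible sketch using only the standard library, the example below parses a fixed timestamp string, marks it as UTC, and shifts it to a fixed UTC-8 offset. The fixed offset is an assumption that stands in for Pacific Standard Time and ignores daylight saving; `pytz` handles that properly. The timestamp itself is made up for illustration.

```python
import datetime as dt

# parse a fixed timestamp string into a naive datetime (no timezone attached)
naive = dt.datetime.strptime('2021-02-09 15:57:56', '%Y-%m-%d %H:%M:%S')

# declare that the timestamp is in UTC, making it timezone-aware
aware_utc = naive.replace(tzinfo=dt.timezone.utc)

# assumption: a fixed UTC-8 offset stands in for Pacific Standard Time (no daylight saving)
pst = dt.timezone(dt.timedelta(hours=-8), 'PST')
aware_pst = aware_utc.astimezone(pst)

print(naive.tzinfo)                                # None
print(aware_pst.strftime('%Y-%m-%d %H:%M:%S %Z'))  # 2021-02-09 07:57:56 PST
```

Both aware values describe the same instant; only the displayed clock time differs.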

## Import data from metro API

- https://developer.metro.net/api/

### Get real-time bus locations via url call

- https://api.metro.net/agencies/lametro/routes/2/vehicles/

In [None]:
# api url for vehicles on metro route 2
metro_url = 'https://api.metro.net/agencies/lametro/routes/2/vehicles/'

# call the api and bring the data in
with urllib.request.urlopen(metro_url) as url:
    data = json.loads(url.read().decode())

In [None]:
# What does the data look like?
print(data)
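
A live call can fail or hang, and the response is only useful if it has the shape the rest of the notebook expects. The sketch below separates fetching from parsing so that the parsing can be checked without a network connection. The helper names `fetch_vehicles` and `parse_vehicles`, the 10-second timeout, and the sample values are illustrative assumptions; the `items` key and the field names follow those used elsewhere in this notebook.

```python
import json
import urllib.request

def parse_vehicles(raw):
    """Decode a JSON response body and return its list of vehicle records."""
    data = json.loads(raw)
    # the feed wraps the vehicle records in an 'items' list
    return data.get('items', [])

def fetch_vehicles(route='2', timeout=10):
    """Call the vehicles endpoint for one route (requires network access)."""
    url = 'https://api.metro.net/agencies/lametro/routes/' + route + '/vehicles/'
    # the timeout keeps a stalled connection from blocking the notebook
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_vehicles(response.read().decode())

# hypothetical response body, for illustration only
sample = ('{"items": [{"id": "1234", "route_id": "2", "latitude": 34.05, '
          '"longitude": -118.25, "heading": 90.0, '
          '"seconds_since_report": 12, "predictable": true}]}')

vehicles = parse_vehicles(sample)
print(len(vehicles), vehicles[0]['seconds_since_report'])
```

Calling `fetch_vehicles('2')` would perform the live request; it is left uncalled here so the example runs offline.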

### Convert JSON to dataframe and add a timestamp

Oddly, the data feed from Metro does not include a timestamp. Instead, it includes a field `seconds_since_report`, which indicates how many seconds ago each bus position was reported. This *forces* us to compute the actual time ourselves (more on this later).

In [None]:
# convert json data to dataframe
df = pd.json_normalize(data, 'items')
df

Because the feed has no timestamp field, let's create one from the current time and tack it onto our dataframe.

In [None]:
# get the current time in our timezone
now = datetime.now(timezone('US/Pacific')).strftime('%Y-%m-%d %H:%M:%S')

# add the current timestamp
df['time'] = now

# convert the field to a datetime datatype
df['time'] = pd.to_datetime(df['time'])

df.head()

### Adjust the time
Metro returns a field `seconds_since_report` that gives the lag, in seconds, between when a bus reported its position and when we retrieved the data. We can adjust our time column to reflect this delay by taking the timestamp field we created and subtracting that lag.

In [None]:
# check to see if the time math works
# (pd.to_timedelta converts the seconds into timedeltas)
df['time'] - pd.to_timedelta(df['seconds_since_report'], unit='s')

In [None]:
# do the same thing but create a new field for the adjusted time
df['time_adjusted'] = df['time'] - pd.to_timedelta(df['seconds_since_report'], unit='s')
df.sample(5)
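
To see the arithmetic in isolation, here is a minimal standard-library sketch of the same adjustment on two hand-made records. The record values and the fixed retrieval time are made up for illustration; only the field name `seconds_since_report` comes from the feed.

```python
import datetime as dt

# assumption: a fixed retrieval time, so the result is reproducible
retrieved = dt.datetime(2021, 2, 9, 8, 0, 0)

# hypothetical records
records = [
    {'id': 'A', 'seconds_since_report': 12},
    {'id': 'B', 'seconds_since_report': 75},
]

for record in records:
    # report time = retrieval time minus the reported lag
    lag = dt.timedelta(seconds=record['seconds_since_report'])
    record['time_adjusted'] = retrieved - lag
    print(record['id'], record['time_adjusted'])

# A 2021-02-09 07:59:48
# B 2021-02-09 07:58:45
```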

### Create an empty csv file
While we can keep the metro data in a dataframe, let us also save it to a .csv file at the same time. The .csv file lets the data live outside of this Jupyter notebook.

- https://docs.python.org/3/library/csv.html

We first create an empty csv file with the headers from our metro data feed.

In [None]:
# name of csv file  
filename = "bus.csv"
    
# field names  
fields = list(df)

# writing to csv file ('w'); newline='' lets the csv module handle line endings
with open(filename, 'w', newline='') as csvfile:
    # creating a csv writer object
    csvwriter = csv.writer(csvfile)

    # writing the fields
    csvwriter.writerow(fields)

### Append data to csv

Check your JupyterHub file space and confirm that the file `bus.csv` has been created. Now, we will append the dataframe to the csv file.

In [None]:
# open your csv file in append mode ('a')
with open('bus.csv', 'a', newline='') as busfile:

    # create a dictionary writer
    dictwriter = DictWriter(busfile, fieldnames=fields)

    # loop through each row and add it to the csv file
    for index, row in df.iterrows():
        # pass the row as a dictionary to writerow()
        dictwriter.writerow(row.to_dict())

# the 'with' block closes the file automatically
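
The two steps above (write the header once, then append rows) can also be folded into one helper. This is a sketch rather than the notebook's method: `append_rows` is a hypothetical name, the rows are made up, and the helper relies on the file position being 0 when a new or empty file is opened in append mode.

```python
import csv
import os
import tempfile

def append_rows(path, rows, fieldnames):
    """Append dict rows to a CSV file, writing the header only if the file is empty."""
    # newline='' lets the csv module control line endings
    with open(path, 'a', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        if f.tell() == 0:
            writer.writeheader()
        writer.writerows(rows)

# usage with a temporary file and made-up rows
path = os.path.join(tempfile.mkdtemp(), 'bus_demo.csv')
fields_demo = ['id', 'latitude', 'longitude']
append_rows(path, [{'id': 'A', 'latitude': 34.05, 'longitude': -118.25}], fields_demo)
append_rows(path, [{'id': 'B', 'latitude': 34.06, 'longitude': -118.31}], fields_demo)

with open(path, newline='') as f:
    lines = f.read().splitlines()
print(lines)
```

After both calls the file holds one header line followed by the two rows, however many times the helper is called.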

## Making it into a function

We have now reviewed the entire workflow to do the following:

- call the metro API and retrieve the location of "live" buses on a given route
- convert the json data to a dataframe
- create a timestamp for the data
- adjust the time to accommodate the time lag reported by metro
- create a csv file of our data

Next, let's create a function that does all of the above, which will allow us to automate the retrieval of bus data at given time intervals.

In [None]:
def get_live_bus_data(df, route='2'):

    # api url for vehicles on the given metro route
    metro_url = 'https://api.metro.net/agencies/lametro/routes/' + route + '/vehicles/'

    try:

        # call the api and bring the data in
        with urllib.request.urlopen(metro_url) as url:
            data = json.loads(url.read().decode())

        # convert json data to a temporary dataframe
        df_new = pd.json_normalize(data, 'items')

        # add the timestamp
        now = datetime.now(timezone('US/Pacific'))
        df_new['time'] = now.strftime('%Y-%m-%d %H:%M:%S')

        # convert the field to a datetime datatype
        df_new['time'] = pd.to_datetime(df_new['time'])

        # adjust the time
        df_new['time_adjusted'] = df_new['time'] - pd.to_timedelta(df_new['seconds_since_report'], unit='s')

        # append the new data to the original dataframe
        # (pd.concat is used because DataFrame.append was removed in pandas 2.0)
        df = pd.concat([df, df_new], ignore_index=True)

        # append only the new rows to the csv file (add the bus route to the file name)
        with open('bus_' + route + '.csv', 'a', newline='') as busfile:

            dictwriter = DictWriter(busfile, fieldnames=fields)

            # write the header if the file is new or empty
            if busfile.tell() == 0:
                dictwriter.writeheader()

            for index, row in df_new.iterrows():
                dictwriter.writerow(row.to_dict())

    except Exception as error:

        print('metro api call failed')
        print(error)

    return df

In [None]:
# call the function, and redefine the df with its results
df = get_live_bus_data(df)

In [None]:
# check the data
df.shape

In [None]:
# output the data
df

## Automate it

The beauty of coding is that we can automate our code. The following code cell uses the `time` library, along with a `while` loop to run our function every 10 seconds, 10 times. Of course, these numbers can be modified as you wish.

In [None]:
import time

# set a counter
i = 1

# how many times to run
runtimes = 10

# run every x seconds
rest = 10

# start the loop
while i <= runtimes:
    # wait x seconds before each run
    time.sleep(rest)

    # timestamp to print out
    now = datetime.now(timezone('US/Pacific'))
    now = now.strftime('%Y-%m-%d %H:%M:%S')

    print(str(i) + ': ' + now)
    df = get_live_bus_data(df)
    i += 1
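
Note that `time.sleep(rest)` pauses for `rest` seconds on top of however long each API call takes, so the real interval is slightly longer than `rest`. If steadier spacing matters, one option is to schedule each run against a monotonic clock. The sketch below is only an illustration: `poll` is a hypothetical helper, and the task here records a counter instead of calling the API.

```python
import time

def poll(task, runs, interval, sleep=time.sleep, clock=time.monotonic):
    """Call task(i) `runs` times, aiming for starts `interval` seconds apart."""
    start = clock()
    for i in range(1, runs + 1):
        task(i)
        # sleep only for the time left until the next scheduled start
        next_start = start + i * interval
        remaining = next_start - clock()
        if i < runs and remaining > 0:
            sleep(remaining)

# usage: three quick runs, 0.01 seconds apart, with a stand-in task
calls = []
poll(calls.append, runs=3, interval=0.01)
print(calls)   # [1, 2, 3]
```

The `sleep` and `clock` parameters exist so the timing can be replaced with stand-ins when checking the logic.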


In [None]:
# check the data
df.shape

In [None]:
df.info()

In [None]:
df

## Map it

### Plotly

In [None]:
fig = px.scatter_mapbox(df,
                        lat='latitude',
                        lon='longitude',
                        mapbox_style="carto-positron",
                        color= 'id',
                        zoom=10
                       )
fig.show()

### Kepler

In [None]:
from keplergl import KeplerGl

In [None]:
# create the kepler map (feeding it a config file so that it visualizes it based on prior settings)
map = KeplerGl(height=600,width=800,config={
  "version": "v1",
  "config": {
    "visState": {
      "filters": [
        {
          "dataId": [
            "bus"
          ],
          "id": "yzpt6frz3",
          "name": [
            "time"
          ],
          "type": "timeRange",
          "value": [
            1612886276000,
            1612886287999.9995
          ],
          "enlarged": True,
          "plotType": "histogram",
          "yAxis": None
        }
      ],
      "layers": [
        {
          "id": "x9vz60j",
          "type": "point",
          "config": {
            "dataId": "bus",
            "label": "Point",
            "color": [
              18,
              147,
              154
            ],
            "columns": {
              "lat": "latitude",
              "lng": "longitude",
              "altitude": None
            },
            "isVisible": True,
            "visConfig": {
              "radius": 20.1,
              "fixedRadius": False,
              "opacity": 0.8,
              "outline": False,
              "thickness": 2,
              "strokeColor": None,
              "colorRange": {
                "name": "ColorBrewer Set1-6",
                "type": "qualitative",
                "category": "ColorBrewer",
                "colors": [
                  "#e41a1c",
                  "#377eb8",
                  "#4daf4a",
                  "#984ea3",
                  "#ff7f00",
                  "#ffff33"
                ]
              },
              "strokeColorRange": {
                "name": "Global Warming",
                "type": "sequential",
                "category": "Uber",
                "colors": [
                  "#5A1846",
                  "#900C3F",
                  "#C70039",
                  "#E3611C",
                  "#F1920E",
                  "#FFC300"
                ]
              },
              "radiusRange": [
                0,
                50
              ],
              "filled": True
            },
            "hidden": False,
            "textLabel": [
              {
                "field": None,
                "color": [
                  255,
                  255,
                  255
                ],
                "size": 18,
                "offset": [
                  0,
                  0
                ],
                "anchor": "start",
                "alignment": "center"
              }
            ]
          },
          "visualChannels": {
            "colorField": {
              "name": "id",
              "type": "integer"
            },
            "colorScale": "quantile",
            "strokeColorField": None,
            "strokeColorScale": "quantile",
            "sizeField": None,
            "sizeScale": "linear"
          }
        }
      ],
      "interactionConfig": {
        "tooltip": {
          "fieldsToShow": {
            "bus": [
              {
                "name": "route_id",
                "format": None
              },
              {
                "name": "heading",
                "format": None
              },
              {
                "name": "id",
                "format": None
              },
              {
                "name": "predictable",
                "format": None
              },
              {
                "name": "seconds_since_report",
                "format": None
              }
            ]
          },
          "compareMode": False,
          "compareType": "absolute",
          "enabled": True
        },
        "brush": {
          "size": 0.5,
          "enabled": False
        },
        "geocoder": {
          "enabled": False
        },
        "coordinate": {
          "enabled": False
        }
      },
      "layerBlending": "normal",
      "splitMaps": [],
      "animationConfig": {
        "currentTime": None,
        "speed": 1
      }
    },
    "mapState": {
      "bearing": 17.204081632653057,
      "dragRotate": True,
      "latitude": 34.03641872791707,
      "longitude": -118.3557812310633,
      "pitch": 51.96682037600022,
      "zoom": 11.451941083083048,
      "isSplit": False
    },
    "mapStyle": {
      "styleType": "dark",
      "topLayerGroups": {},
      "visibleLayerGroups": {
        "label": True,
        "road": True,
        "border": False,
        "building": True,
        "water": True,
        "land": True,
        "3d building": False
      },
      "threeDBuildingColor": [
        9.665468314072013,
        17.18305478057247,
        31.1442867897876
      ],
      "mapStyles": {}
    }
  }
})

In [None]:
# show the map
map.add_data(data=df,name='bus')
map
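
The two numbers under `value` in the `timeRange` filter of the config appear to be Unix timestamps in milliseconds. As a quick check, assuming that interpretation, the standard-library sketch below converts the first one to a readable UTC time.

```python
import datetime as dt

# first 'value' entry of the timeRange filter in the config above
start_ms = 1612886276000

# assumption: the value is milliseconds since the Unix epoch
start = dt.datetime.fromtimestamp(start_ms / 1000, tz=dt.timezone.utc)
print(start.strftime('%Y-%m-%d %H:%M:%S'))   # 2021-02-09 15:57:56
```

If the map shows no points, the data may fall outside this window, in which case adjusting the time filter in the Kepler interface may help.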