<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#A-real-time-bus-tracker" data-toc-modified-id="A-real-time-bus-tracker-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>A real-time bus tracker</a></span><ul class="toc-item"><li><span><a href="#Import-libraries" data-toc-modified-id="Import-libraries-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Import libraries</a></span></li><li><span><a href="#Playing-with-time" data-toc-modified-id="Playing-with-time-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>Playing with time</a></span></li><li><span><a href="#Import-data-from-metro-API" data-toc-modified-id="Import-data-from-metro-API-1.3"><span class="toc-item-num">1.3&nbsp;&nbsp;</span>Import data from metro API</a></span><ul class="toc-item"><li><span><a href="#Get-real-time-bus-locations-via-url-call" data-toc-modified-id="Get-real-time-bus-locations-via-url-call-1.3.1"><span class="toc-item-num">1.3.1&nbsp;&nbsp;</span>Get real-time bus locations via url call</a></span></li><li><span><a href="#Convert-JSON-to-dataframe-and-add-a-timestamp" data-toc-modified-id="Convert-JSON-to-dataframe-and-add-a-timestamp-1.3.2"><span class="toc-item-num">1.3.2&nbsp;&nbsp;</span>Convert JSON to dataframe and add a timestamp</a></span></li><li><span><a href="#Adjust-the-time" data-toc-modified-id="Adjust-the-time-1.3.3"><span class="toc-item-num">1.3.3&nbsp;&nbsp;</span>Adjust the time</a></span></li><li><span><a href="#Create-an-empty-csv-file" data-toc-modified-id="Create-an-empty-csv-file-1.3.4"><span class="toc-item-num">1.3.4&nbsp;&nbsp;</span>Create an empty csv file</a></span></li><li><span><a href="#Append-data-to-csv" data-toc-modified-id="Append-data-to-csv-1.3.5"><span class="toc-item-num">1.3.5&nbsp;&nbsp;</span>Append data to csv</a></span></li></ul></li><li><span><a href="#Making-it-into-a-function" data-toc-modified-id="Making-it-into-a-function-1.4"><span class="toc-item-num">1.4&nbsp;&nbsp;</span>Making it into a 
function</a></span></li><li><span><a href="#In-class-exercise" data-toc-modified-id="In-class-exercise-1.5"><span class="toc-item-num">1.5&nbsp;&nbsp;</span>In class exercise</a></span></li><li><span><a href="#Automate-it" data-toc-modified-id="Automate-it-1.6"><span class="toc-item-num">1.6&nbsp;&nbsp;</span>Automate it</a></span></li><li><span><a href="#Map-it" data-toc-modified-id="Map-it-1.7"><span class="toc-item-num">1.7&nbsp;&nbsp;</span>Map it</a></span><ul class="toc-item"><li><span><a href="#Plotly" data-toc-modified-id="Plotly-1.7.1"><span class="toc-item-num">1.7.1&nbsp;&nbsp;</span>Plotly</a></span></li><li><span><a href="#Kepler" data-toc-modified-id="Kepler-1.7.2"><span class="toc-item-num">1.7.2&nbsp;&nbsp;</span>Kepler</a></span></li></ul></li></ul></li></ul></div>

<div class="alert alert-danger">
This lab will be recorded.
</div>

# A real-time bus tracker

<img src='images/buses.gif' width=400>

The LA Metro API provides the exact location of buses that are currently in operation. Using this real-time bus data, we can generate maps that display the location of buses for any given route. Furthermore, we can use this feed to build an archive of bus locations and their associated timestamps. By creating a script that continuously collects this information, we can simulate bus movement in the city for further analysis.

## Import libraries

In [None]:
# libraries
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
import plotly.express as px

# to import and manipulate api/json data
import urllib.request 
import json

# to manipulate csv files
import csv
from csv import DictWriter

# to work with time
import datetime as dt
from datetime import datetime
import pytz
from pytz import timezone

## Playing with time

<img src='images/World_Time_Zones_Map.png' width=800>

[[Source: Wikimedia](https://upload.wikimedia.org/wikipedia/commons/8/88/World_Time_Zones_Map.png)]

Time is not the easiest thing to work with in Python, or for that matter, any coding language. There are surprisingly many complications (e.g. date/time datatypes, date/time display formats, timezones, etc) that make it difficult to work with. Below are some of the very basic constructs of the datetime library, along with the [pytz](https://pythonhosted.org/pytz/) library that allows you to define timezones.

Time formats as strings:

- https://docs.python.org/3/library/time.html#time.strftime
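
As a small illustration of these format strings, the sketch below formats one fixed moment in several ways. It uses only the standard library `datetime` module (not `pytz`), and the date itself is an arbitrary example chosen for illustration.

```python
import datetime as dt

# a fixed moment in time (an arbitrary example), expressed in UTC
moment = dt.datetime(2021, 2, 9, 15, 57, 56, tzinfo=dt.timezone.utc)

# a few format strings; each code starting with % stands for one part of the date
formats = {
    "date and time": "%Y-%m-%d %H:%M:%S",
    "date only": "%Y/%m/%d",
    "with timezone": "%Y-%m-%d %H:%M:%S %Z%z",
}

# print what each format string produces for the same moment
for label, fmt in formats.items():
    print(label + ':', moment.strftime(fmt))
```

The same moment prints differently under each format string, which is why it helps to settle on one format before saving timestamps to a file.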
    

In [None]:
# giving the format of datetime
# (named fmt so that it does not shadow Python's built-in format() function)
fmt = "%Y-%m-%d %H:%M:%S %Z%z"

# getting the current time in UTC timezone
now_utc = datetime.now(timezone('UTC'))

# Format the above datetime using strftime()
print('Current Time in UTC TimeZone:', now_utc.strftime(fmt))

# Converting to Asia/Tokyo time zone
now_tokyo = now_utc.astimezone(timezone('Asia/Tokyo'))

# Format the above datetime using strftime()
print('Current Time in Asia/Tokyo TimeZone:', now_tokyo.strftime(fmt))

In [None]:
# what are the valid timezones?
for tz in pytz.all_timezones:
    print(tz)

In [None]:
# Using the logic above, output the time in LA right now



In [None]:
# How about other time formats, and/or timezones from around the world?



## Import data from metro API

![metro](https://i0.wp.com/brand.metro.net/wp-content/uploads/2018/08/metro-logo-preferred-e1543619521338.png?h=75)

- https://api.metro.net/

### Get real-time bus locations via url call

Note that the metro data is provided by our hardworking alumni friend Albert!

- https://api.metro.net/LACMTA/vehicle_positions/route_code/20?geojson=true

![Alt text](images/albert%20and%20yoh%20texts.png)

Need help choosing a metro line?

- https://www.metro.net/riding/schedules/

In [None]:
# metro line to scrape
line = 4

# get the data in geojson format
gdf = gpd.read_file('https://api.metro.net/LACMTA/vehicle_positions/route_code/'+str(line)+'?geojson=true')


In [None]:
gdf.plot()

## Data cleanup

### Converting JSON to columns and converting unix timestamps
Some of the data fields are in ```json``` format, or ```dictionaries``` if we were to use python lingo. We need to "explode" these into separate columns. For example, to explode the "trip" column:

In [None]:
gdf.head()

In [None]:
# "explode" trip json data in trip column to separate columns
gdf = pd.concat([gdf, gdf["trip"].apply(pd.Series)], axis=1)
gdf

Now, repeat for ```vehicle``` and ```position``` columns:

In [None]:
gdf = pd.concat([gdf, gdf["vehicle"].apply(pd.Series)], axis=1)
gdf = pd.concat([gdf, gdf["position"].apply(pd.Series)], axis=1)

In [None]:
# drop unnecessary columns
gdf = gdf.drop(columns=['trip','vehicle','position','trip_id','trip_start_date','vehicle_label'])
gdf
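
The "explode" pattern used above can be tried on a toy table to see exactly what it does. This is a minimal sketch: the column names and values are made up for illustration and are not taken from the metro feed.

```python
import pandas as pd

# toy records (made up for illustration) with a dictionary stored in the "trip" column
toy = pd.DataFrame({
    "route_code": ["4", "4"],
    "trip": [
        {"trip_id": "t1", "route_id": "route-A"},
        {"trip_id": "t2", "route_id": "route-A"},
    ],
})

# turn each dictionary into a row of its own, with one column per key
trip_columns = toy["trip"].apply(pd.Series)

# attach the new columns and drop the original nested column
flat = pd.concat([toy.drop(columns=["trip"]), trip_columns], axis=1)
print(flat)
```

Each key of the dictionary becomes its own column, so the nested `trip` column is no longer needed afterwards.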

### Converting timestamps

Whoa, what's up with the timestamp column? Does anybody recognize this time format?

In [None]:
gdf['timestamp']

In [None]:
# convert unix timestamp to human readable time
gdf['time'] = pd.to_datetime(gdf['timestamp'],unit='s')

In [None]:
# convert utc time to Los Angeles time
gdf['time'] = gdf['time'].dt.tz_localize('UTC').dt.tz_convert('US/Pacific')
gdf.head()
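
For readers who have not met unix timestamps before: the number counts the seconds elapsed since 1970-01-01 00:00:00 UTC. The sketch below shows the same conversion with only the standard library, using an arbitrary example value rather than a live reading from the feed. It stops at UTC; the shift to Los Angeles time is what `tz_convert` does in the cell above.

```python
import datetime as dt

# an example unix timestamp: seconds elapsed since 1970-01-01 00:00:00 UTC
unix_seconds = 1612886276

# interpret the number as a moment in UTC
utc_time = dt.datetime.fromtimestamp(unix_seconds, tz=dt.timezone.utc)
print(utc_time.isoformat())

# going the other way recovers the original number
round_trip = int(utc_time.timestamp())
print(round_trip)
```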

### Create an empty csv file
While we can save the metro data to a dataframe, let us also simultaneously save the data in a .csv file. The .csv file will allow the file to live outside of this jupyter notebook.

Python allows you to "write," or create a new csv file to your directory.

- https://docs.python.org/3/library/csv.html

We first create an empty csv file with the headers from our metro data feed.

In [None]:
# name of csv file  
filename = "bus.csv"
    
# field names to use as headers
fields = list(gdf)

fields

In [None]:
# write/create a new csv file ('w')
# newline='' lets the csv module handle line endings itself
with open(filename, 'w', newline='') as csvfile:
    # creating a csv writer object
    csvwriter = csv.writer(csvfile)

    # writing the fields
    csvwriter.writerow(fields)

You should now see a new file `bus.csv` in your file directory. Check to make sure!

### Append data to csv

Once you have confirmed in your JupyterHub file space that `bus.csv` exists, we can append the rows of the dataframe to it.

In [None]:
# Open your CSV file in append mode ('a')
with open('bus.csv', 'a', newline='') as busfile:

    # create a dictionary writer
    dictwriter = DictWriter(busfile, fieldnames=fields)

    # loop through each row and add it to the csv file
    for index, row in gdf.iterrows():
        # pass the row as a dictionary to writerow()
        dictwriter.writerow(row.to_dict())

# the file is closed automatically when the with block ends

Your csv file has been baked! Double-click the file in your directory and confirm that the bus data is there (you might have to close the file and reopen it to see the changes).
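
The two-step pattern above (write the header once, then append rows) can also be tried in isolation. The sketch below is a self-contained illustration using only the standard library. The column names, the rows, and the file name `bus_demo.csv` are hypothetical stand-ins for the bus data, and the file is written to a temporary folder so that `bus.csv` is left untouched.

```python
import csv
import os
import tempfile

# hypothetical column names and rows, standing in for the bus dataframe
demo_fields = ["vehicle_id", "latitude", "longitude"]
demo_rows = [
    {"vehicle_id": "1001", "latitude": 34.05, "longitude": -118.25},
    {"vehicle_id": "1002", "latitude": 34.06, "longitude": -118.31},
]

# work in a temporary folder so that no real file is overwritten
path = os.path.join(tempfile.mkdtemp(), "bus_demo.csv")

# step 1: create the file ('w') and write only the header row
with open(path, "w", newline="") as csvfile:
    csv.DictWriter(csvfile, fieldnames=demo_fields).writeheader()

# step 2: reopen in append mode ('a') and add the data rows
with open(path, "a", newline="") as csvfile:
    csv.DictWriter(csvfile, fieldnames=demo_fields).writerows(demo_rows)

# read the file back to confirm the header and both rows are there
with open(path, newline="") as csvfile:
    saved = list(csv.DictReader(csvfile))

print(len(saved), saved[0]["vehicle_id"])
```

Note that everything read back from a csv file is text, so numeric columns need to be converted again before use (`pd.read_csv` does this conversion automatically).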

## Making it into a function

We have now reviewed the entire workflow to do the following:

- call the metro API and retrieve the location of "live" buses on a given route
- convert the json data to a dataframe
- create a timestamp for the data and adjust the timezone for Los Angeles
- create a csv file of our data

Next, let's create a function that does all of the above, which will allow us to automate the retrieval of bus data at given time intervals.

## In class exercise

- create a function `scrape_metro()` that accepts an argument for `line` (with the default being `line=4`)
- add all the relevant steps (as shown in the code cells above) that produces the csv file with the adjusted timestamp
- in another cell, run the function `scrape_metro()`
- open the `bus.csv` file to make sure new rows have been added

Perform these steps in the cells below. DO NOT SCROLL DOWN UNTIL YOU ARE DONE! The answers are below :)

In [None]:
def scrape_metro(line=4):
    # get the live vehicle positions for the given metro line in geojson format
    gdf = gpd.read_file('https://api.metro.net/LACMTA/vehicle_positions/route_code/'+str(line)+'?geojson=true')

    # "explode" the nested trip, vehicle, and position data into separate columns
    gdf = pd.concat([gdf, gdf["trip"].apply(pd.Series)], axis=1)
    gdf = pd.concat([gdf, gdf["vehicle"].apply(pd.Series)], axis=1)
    gdf = pd.concat([gdf, gdf["position"].apply(pd.Series)], axis=1)

    # drop unnecessary columns
    gdf = gdf.drop(columns=['trip','vehicle','position','vehicle_label','trip_id','trip_start_date'])

    # convert unix timestamp to human readable time
    gdf['time'] = pd.to_datetime(gdf['timestamp'],unit='s')

    # convert utc time to Los Angeles time
    gdf['time'] = gdf['time'].dt.tz_localize('UTC').dt.tz_convert('US/Pacific')

    # open the csv file in append mode ('a')
    # note: `fields` is the list of headers defined earlier in the notebook
    with open('bus.csv', 'a', newline='') as busfile:

        # create a dictionary writer
        dictwriter = DictWriter(busfile, fieldnames=fields)

        # loop through each row and add it to the csv file
        for index, row in gdf.iterrows():
            # pass the row as a dictionary to writerow()
            dictwriter.writerow(row.to_dict())

In [None]:
# test the function
scrape_metro()

Check the file bus.csv!

## Automate it

The beauty of coding is that we can automate our code. The following code cell uses the `time` library, along with a `while` loop to run our function every 10 seconds, 10 times. Of course, these numbers can be modified as you wish.

First, recreate the `bus.csv` file:

In [None]:
# write/create a new csv file ('w')
# newline='' lets the csv module handle line endings itself
with open(filename, 'w', newline='') as csvfile:
    # creating a csv writer object
    csvwriter = csv.writer(csvfile)

    # writing the fields
    csvwriter.writerow(fields)

Next, run the while loop:

In [None]:
import time

# set a counter
i = 1

# how many times to run
runtimes = 10

# run every x seconds
rest = 10

# start message
print('Starting metro scraper, once every '+str(rest)+' seconds for '+str(runtimes)+' iterations...')

# start the loop
while i <= runtimes:
    # timestamp to print out
    now = datetime.now(timezone('US/Pacific'))
    now = now.strftime('%Y-%m-%d %H:%M:%S')

    print(str(i) + ': ' + now)
    scrape_metro()
    i += 1
    
    # run this every x seconds
    time.sleep(rest)

If you see 10 lines above, you're done! Now, put the csv file into a dataframe:

In [None]:
# put data into dataframe
df = pd.read_csv('bus.csv')

In [None]:
# check the data
df.shape

In [None]:
df.info()

In [None]:
df
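
The polling loop used in this section can also be wrapped in a small reusable helper. The sketch below is one possible way to write it; `run_repeatedly` is a hypothetical name that is not part of this notebook or of any library, and the demonstration uses a stand-in task with a very short pause so that it finishes quickly.

```python
import time

def run_repeatedly(task, runtimes=10, rest=10):
    """Call task() `runtimes` times, pausing `rest` seconds between calls."""
    results = []
    for i in range(1, runtimes + 1):
        # run the task and keep whatever it returns
        results.append(task())

        # no need to wait after the final run
        if i < runtimes:
            time.sleep(rest)
    return results

# quick demonstration with a stand-in task and a very short pause
calls = run_repeatedly(lambda: "scraped", runtimes=3, rest=0.01)
print(calls)
```

With the function from this notebook, the equivalent call would be `run_repeatedly(scrape_metro, runtimes=10, rest=10)`.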

## Map it

### Plotly

In [None]:
# convert the id to a string so that plotly recognizes it as a categorical field
df['vehicle_id'] = df['vehicle_id'].astype(str)

In [None]:
fig = px.scatter_mapbox(df,
                        lat='latitude',
                        lon='longitude',
                        mapbox_style="carto-positron",
                        color= 'vehicle_id',
                        zoom=10,
                       )
fig.show()

### Kepler

To demonstrate the animated features of kepler, let's import a bus file that was created by running the functions above to scrape data for 30 minutes.

In [None]:
# import 30 minute bus locations for route 2
df = pd.read_csv('bus_30_min_sample.csv')

In [None]:
from keplergl import KeplerGl

In [None]:
# create the kepler map (feeding it a config file so that it visualizes it based on prior settings)
map = KeplerGl(height=600,width=800,config={
  "version": "v1",
  "config": {
    "visState": {
      "filters": [
        {
          "dataId": [
            "bus"
          ],
          "id": "yzpt6frz3",
          "name": [
            "time"
          ],
          "type": "timeRange",
          "value": [
            1612886276000,
            1612886287999.9995
          ],
          "enlarged": True,
          "plotType": "histogram",
          "yAxis": None
        }
      ],
      "layers": [
        {
          "id": "x9vz60j",
          "type": "point",
          "config": {
            "dataId": "bus",
            "label": "Point",
            "color": [
              18,
              147,
              154
            ],
            "columns": {
              "lat": "latitude",
              "lng": "longitude",
              "altitude": None
            },
            "isVisible": True,
            "visConfig": {
              "radius": 20.1,
              "fixedRadius": False,
              "opacity": 0.8,
              "outline": False,
              "thickness": 2,
              "strokeColor": None,
              "colorRange": {
                "name": "ColorBrewer Set1-6",
                "type": "qualitative",
                "category": "ColorBrewer",
                "colors": [
                  "#e41a1c",
                  "#377eb8",
                  "#4daf4a",
                  "#984ea3",
                  "#ff7f00",
                  "#ffff33"
                ]
              },
              "strokeColorRange": {
                "name": "Global Warming",
                "type": "sequential",
                "category": "Uber",
                "colors": [
                  "#5A1846",
                  "#900C3F",
                  "#C70039",
                  "#E3611C",
                  "#F1920E",
                  "#FFC300"
                ]
              },
              "radiusRange": [
                0,
                50
              ],
              "filled": True
            },
            "hidden": False,
            "textLabel": [
              {
                "field": None,
                "color": [
                  255,
                  255,
                  255
                ],
                "size": 18,
                "offset": [
                  0,
                  0
                ],
                "anchor": "start",
                "alignment": "center"
              }
            ]
          },
          "visualChannels": {
            "colorField": {
              "name": "vehicle_id",
              "type": "integer"
            },
            "colorScale": "quantile",
            "strokeColorField": None,
            "strokeColorScale": "quantile",
            "sizeField": None,
            "sizeScale": "linear"
          }
        }
      ],
      "interactionConfig": {
        "tooltip": {
          "fieldsToShow": {
            "bus": [
              {
                "name": "route_id",
                "format": None
              },
              {
                "name": "heading",
                "format": None
              },
              {
                "name": "id",
                "format": None
              },
              {
                "name": "predictable",
                "format": None
              },
              {
                "name": "seconds_since_report",
                "format": None
              }
            ]
          },
          "compareMode": False,
          "compareType": "absolute",
          "enabled": True
        },
        "brush": {
          "size": 0.5,
          "enabled": False
        },
        "geocoder": {
          "enabled": False
        },
        "coordinate": {
          "enabled": False
        }
      },
      "layerBlending": "normal",
      "splitMaps": [],
      "animationConfig": {
        "currentTime": None,
        "speed": 1
      }
    },
    "mapState": {
      "bearing": 17.204081632653057,
      "dragRotate": True,
      "latitude": 34.03641872791707,
      "longitude": -118.3557812310633,
      "pitch": 51.96682037600022,
      "zoom": 11.451941083083048,
      "isSplit": False
    },
    "mapStyle": {
      "styleType": "dark",
      "topLayerGroups": {},
      "visibleLayerGroups": {
        "label": True,
        "road": True,
        "border": False,
        "building": True,
        "water": True,
        "land": True,
        "3d building": False
      },
      "threeDBuildingColor": [
        9.665468314072013,
        17.18305478057247,
        31.1442867897876
      ],
      "mapStyles": {}
    }
  }
})

In [None]:
# show the map
map.add_data(data=df,name='bus')
map

Now it's your turn. Create map visualizations of other routes that Metro operates. 