# UCLA ITS Data Camp, Day 1
## Retrieving Data via API Calls

An increasingly common way to retrieve data is via an application programming interface (API), a framework for communication between you (the client) and a server (a computer) somewhere else in the world (see [here](https://medium.com/@perrysetgo/what-exactly-is-an-api-69f36968a41f) for a high-level overview). You don't need to know how to program to make a basic API call; in fact, you make API calls every day as you browse the internet and request webpages from the servers that host the web content. The most common way to make an API call is over HTTP (the protocol you can see at the start of any URL), abiding by the [REST design principles](https://en.wikipedia.org/wiki/Representational_state_transfer).
  
Beyond serving webpages, HTTP REST APIs have become an increasingly popular method for exchanging data across the internet, especially for cases where the data is granular and is being refreshed constantly. When you are consuming data via a REST API, there are four main types of actions you can perform:

GET: For reading/obtaining a resource from the server. This is the most common action and is the one we will be making for all of our calls since we will be _consuming_ data in this course.

POST: For creating a resource on the server

PUT: For updating a resource on the server

DELETE: For deleting a resource on the server
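
As a rough illustration of the four actions, the sketch below builds (but never sends) one request per verb using Python's standard-library `urllib.request`. The URL is a placeholder, not a real endpoint. The `requests` package used later in this notebook exposes the same verbs as `requests.get`, `requests.post`, `requests.put`, and `requests.delete`.

```python
from urllib.request import Request

# Placeholder endpoint for illustration only; nothing is sent over the network
EXAMPLE_URL = "https://example.com/api/collisions/1"

# Build one request object per REST action
actions = ["GET", "POST", "PUT", "DELETE"]
prepared = [Request(EXAMPLE_URL, method=verb) for verb in actions]

for req in prepared:
    # get_method() reports the HTTP verb the request would use
    print(req.get_method(), req.full_url)
```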

### Exercise 1: Accessing Open Data Portal APIs

##### Notebook Setup
As you did with the Pre-Course exercise, create a new project directory with the structure below and place this workbook within it.
```
day-1-prj/                     
├── data/                        
├── output/                      
└── Day_1.ipynb    
```
##### Setting up an API call
For today's lesson, we will learn how to make REST API calls from Python using the `requests` package, which you can read more about [here](https://2.python-requests.org/en/master/). If the package is not yet installed on your machine, install it before continuing. Data from an API is typically delivered as [XML or JSON](https://github.com/black-tea/ucla-its-data-camp-2019/blob/master/Pre-Course/Programming-Landscape.md#some-common-data-formats).

In the last lesson we loaded data by reading in a CSV file. Today we are going to download the same set of data via an API call and ingest the JSON-formatted response.

In [None]:
# Import packages
import requests

# Store the url in a variable
url = 'https://opendata.arcgis.com/datasets/bed43aa2945a47b18ae888246712ccb1_0.geojson'

# TODO: Make the request & store in a variable
resp = 

The response object includes a lot of different information. For example, we can get the status of the request (successful or unsuccessful) by reading `resp.status_code`. You will encounter several different [response codes](https://www.restapitutorial.com/httpstatuscodes.html) when working with APIs.
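
To get a feel for what the numeric codes mean, here is a minimal sketch that uses Python's standard-library `http.HTTPStatus` to look up the phrase for a code. The helper names `describe_status` and `is_success` are made up for this illustration and are not part of `requests`.

```python
from http import HTTPStatus

def describe_status(code):
    """Return a short label such as '200 OK' for a numeric status code."""
    status = HTTPStatus(code)
    return f"{status.value} {status.phrase}"

def is_success(code):
    """Treat any 2xx code as a successful response."""
    return 200 <= code < 300

for code in (200, 404, 500):
    label = "success" if is_success(code) else "failure"
    print(describe_status(code), "->", label)
```

In the notebook you would pass `resp.status_code` to helpers like these.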

In [None]:
# TODO: Print the status code


If the response was successful (`200`), we want to take a look at the output. For JSON-formatted data, we can access the body of the response by calling `resp.json()`. Let's take a look at the JSON output of our response.

In [None]:
# Examine the response content
resp.json()

##### Write out to disk
Especially if you are requesting dynamic data (data that is sensitive to the time at which you queried it), it is good practice to move the data out of memory (your variable) and into persistent disk storage (a file) before you do anything else with it. Although this process seems a bit redundant, it will make more sense in a bit, especially when we are doing multiple API calls.
  
Let's write some code to save the JSON data from our API call into `data/raw/`. We are going to be using the [json](https://docs.python.org/3/library/json.html) Python library for reading and writing JSON data. _Hint: Make sure you set up the project directory structure correctly (including the `data/raw/` subfolder) or you will get an error!_

In [None]:
import json

# Write out JSON to data/raw/ (the folder must already exist)
with open('data/raw/collisions_2009to2013.json', 'w') as outfile:
    json.dump(resp.json(), outfile)
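
The write-then-read round trip can be sketched without touching your project folder. The example below assumes a small made-up payload in place of the real API response and works inside a temporary directory. It also shows `os.makedirs(..., exist_ok=True)`, one way to make sure a folder such as `data/raw/` exists before writing.

```python
import json
import os
import tempfile

# Stand-in for the parsed API response (hypothetical sample data)
payload = {"type": "FeatureCollection", "features": []}

# A temporary folder keeps this sketch from touching your real project
project_dir = tempfile.mkdtemp()
raw_dir = os.path.join(project_dir, "data", "raw")

# Create data/raw if it does not exist yet, so open() does not fail
os.makedirs(raw_dir, exist_ok=True)

out_path = os.path.join(raw_dir, "example.json")
with open(out_path, "w") as outfile:
    json.dump(payload, outfile)

# Read the file back to confirm nothing was lost
with open(out_path) as infile:
    restored = json.load(infile)

print(restored == payload)
```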

##### Read back in the JSON data & parse to a dataframe
Now that we have the data stored on disk, let's read it back into memory and wrangle it into a more 'tidy' format for further analysis.

In [None]:
# TODO: Read the file into memory, store JSON as `collisions` variable


As you may remember from the earlier lecture with the SF Data Portal, you can construct a dataframe directly from JSON-formatted data.

In [None]:
# Import pandas package
import pandas as pd

# Convert the JSON to a dataframe format
collisions_df = pd.DataFrame(collisions['features'])

# Show the head of the dataframe
collisions_df.head()

You will see that the `geometry` column holds a JSON-formatted object with our lat/lon values and the `properties` column contains a JSON-formatted object with the rest of our data related to each collision. Following the example from the SF Data Portal earlier, create a separate column for `lat` and a separate column for `lon` that holds each value for each collision.
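
The idea can be sketched on plain Python dictionaries before applying it to the dataframe. The two features below are invented sample records, and `split_coordinates` is a hypothetical helper name. The one detail to keep in mind is that GeoJSON stores point coordinates in `[longitude, latitude]` order.

```python
# Hypothetical GeoJSON-style features; real records have many more properties
features = [
    {"geometry": {"type": "Point", "coordinates": [-118.25, 34.05]},
     "properties": {"id": 1}},
    {"geometry": None, "properties": {"id": 2}},
]

def split_coordinates(feature_list):
    """Return (latitudes, longitudes), using None where geometry is missing."""
    latitudes, longitudes = [], []
    for feature in feature_list:
        geometry = feature.get("geometry")
        if geometry is not None:
            # GeoJSON stores points as [longitude, latitude]
            lon, lat = geometry["coordinates"][:2]
        else:
            lat, lon = None, None
        latitudes.append(lat)
        longitudes.append(lon)
    return latitudes, longitudes

lats, lons = split_coordinates(features)
print(lats, lons)
```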

In [None]:
# Create empty lat/lon lists
latitudes = []
longitudes = []

# loop through DF geom objects, then join lists back to dataframe
for label, row in collisions_df.iterrows():
    if pd.notnull(row['geometry']):
        # TODO: append the first list item to `longitudes` and the second list item to `latitudes`

        
    else:
        latitudes.append(None)
        longitudes.append(None)
        
collisions_df['lat'] = latitudes
collisions_df['lon'] = longitudes

In [None]:
# Examine the new dataframe
collisions_df.head()

You'll notice that all of our properties are still nested within JSON objects as well. Pick any two keys within `properties` and write the loop again below to create two separate columns for those features.

In [None]:
# TODO: Loop through the DF, pick any two features
#       and separate them into two new columns, 
#       then examine the head of collisions_df


# View the head
collisions_df.head()

Nice work! We've actually taken the long way to access data from a GeoJSON object, but getting familiar with loops and dictionaries is absolutely critical for working with data in Python.

### Exercise 2: Getting data from LA Metro
Most of the time, data you query from an API will not be so straightforward to get into a tidy format. Instead, you will usually want to inspect the response content first before deciding how to proceed. Let's take a look at data from [LA Metro's Developer Portal](https://developer.metro.net/). On the [Metro Bus & Rail Real-time Arrivals](https://developer.metro.net/portfolio-item/real-time-arrivals/) page, we can see a variety of APIs that are publicly available. Take a look at all the [feeds](https://developer.metro.net/introduction/realtime-api-overview/realtime-api-returning-json/) returning JSON-formatted content, including route information, stop information, and realtime vehicle location information.

You will notice that the data is not GeoJSON; it comes in a slightly different format that requires a bit of wrangling before it is tidy.

##### Create an API Call
Pick any of the Metro routes and, following the structure in the example, make a call to get all the current vehicles on that route. Once we get the response (assuming it is successful), let's take a look at the content.

In [None]:
# TODO: Write the statement to call the Metro API and get all vehicles for a particular route
#       and store the response as resp
# (No need to import the requests package again)
resp = 

# TODO: Store the JSON content as `data`
data = 

The response is plain JSON rather than GeoJSON. A list of dictionaries (key/value pairs) converts easily into a pandas dataframe with `df = pd.DataFrame(list_of_dicts)`. Let's go ahead and convert the JSON output into a dataframe. _Hint: Make sure you access the list part of the JSON output!_

In [None]:
# TODO: Convert the JSON output to a dataframe
metro_df = 

# Examine the head of the dataframe
metro_df.head()

##### Add a Column to the DataFrame
One thing you will notice is that the dataframe we made above is missing the timestamp of the query. If we plan to write out the data for analysis later, we need to add the time of the query as a column value. The easiest way to get the current time in Python is through the [datetime](https://docs.python.org/2/library/datetime.html) module. Take a little bit of time to look through the documentation with a particular focus on the `now()` method.

Once we get the value of the current time, we can add it as a new column value to our current dataframe. Create an additional column `call_time`. In the cell below, get the current timestamp of the call and add it as the value for that column.
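
Here is a minimal sketch of the same idea using plain dictionaries instead of a dataframe. The vehicle records are invented for illustration. The point is to capture the time once, so that every row from the same call shares a single value. In pandas, assigning a single value to a column fills every row with that value in the same way.

```python
from datetime import datetime

# Hypothetical vehicle records standing in for the parsed API response
records = [
    {"id": "1001", "latitude": 34.05, "longitude": -118.25},
    {"id": "1002", "latitude": 34.06, "longitude": -118.30},
]

# Capture the time once so every row from the same call shares one value
call_time = datetime.now()

# Build new dictionaries so the original records are left unchanged
stamped = [{**record, "call_time": call_time} for record in records]

for row in stamped:
    print(row["id"], row["call_time"].isoformat())
```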

In [None]:
# Import the datetime module
from datetime import datetime

# TODO: Get the current time
now = 

# TODO: Add the current time as a value to the dataframe column `call_time`
metro_df['call_time'] = 

##### Wrap the API Call in a Function
Let's create a function to take a Route ID and make the API call for all realtime vehicle locations on that route. Add in the code we used in the block above to also create a column with the time we called the API.

_Function Input:_ Route ID  
_Function Output:_ Dataframe with the response content

In [None]:
# TODO: Create the function
def get_vehicles_byroute(routenum):


Let's take a look to make sure our function is working correctly. Run the cell below to confirm that you are getting the desired result. Go ahead and try changing the input and see how the output changes.

In [None]:
# Call the function for one of the routes
routedata = get_vehicles_byroute(720)

# Examine the head of the dataframe
routedata.head()

##### Add Functionality
Great! Now we are able to change the route number and get a dataframe with the current location of all vehicles on the route. One of the next things we might want to do would be to get data from the route throughout the day and store it for later analysis. To do that we are going to need to add the following functionality into our function:

1. Write out the dataframe to a CSV file in our `data/processed` folder. Let's set the filename to the format `lametro_[routenum]_[timestamp].csv` (e.g. `lametro_720_2019-09-10-22-26-52.csv`). To write out the file, go ahead and use [pandas' method](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_csv.html) for writing out a CSV file.
2. Add conditional logic to only write out the file if the call was a success. If the call was not successful, print out the error message. Take a look [here](https://2.python-requests.org/en/master/user/quickstart/#response-status-codes) for some guidance.

As the function gets a bit more complex, please add appropriate code comments inside to quickly convey the purpose of each code block.
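
The two requirements above can be sketched as small helpers before folding them into the function. The names `build_filename` and `should_write` are hypothetical, and the sketch assumes that a status code of `200` is what counts as success for this API.

```python
from datetime import datetime

def build_filename(routenum, when):
    """Return a name like lametro_720_2019-09-10-22-26-52.csv."""
    stamp = when.strftime("%Y-%m-%d-%H-%M-%S")
    return f"lametro_{routenum}_{stamp}.csv"

def should_write(status_code):
    """Only write a file when the call returned 200 (success)."""
    return status_code == 200

# A fixed time makes the output reproducible; the exercise uses datetime.now()
example_time = datetime(2019, 9, 10, 22, 26, 52)
print(build_filename(720, example_time))
print(should_write(200), should_write(404))
```

Avoiding colons and spaces in the timestamp keeps the filename valid across operating systems.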


In [None]:
# TODO: Re-write the function with the requested features
def get_vehicles_byroute(routenum):


Let's go ahead and call the function again for one of the routes to ensure that it was written out correctly.

In [None]:
get_vehicles_byroute(704)

Check your `data/processed` folder. If everything was successful, you should see a CSV file with the data from the call.

##### Introduction to variable-length arguments
We've now built a function that, for a given route, will get current vehicle location data, format it into a dataframe, and write it out to a CSV file with the current datetime. What if we were interested in 2 routes? or 3 routes? Let's build another function that takes as input a _variable number of route numbers_ and then gets the vehicle data for each of them.
  
We will do this through the [_*args_ syntax](https://www.geeksforgeeks.org/args-kwargs-python/). Following that syntax, create a function called `get_vehicles_byroutes` that takes in a variable number of route numbers. For each route number, the function should call our other function `get_vehicles_byroute`. Between each call to our original function, add a 5-second pause to reduce the load on the server.
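
To show the `*args` pattern in isolation, the sketch below uses a stand-in function, `fetch_route`, that makes no API call. Both `fetch_route` and `fetch_routes` are hypothetical names, and the `pause_seconds` keyword argument is an addition for this example so the demonstration can run without waiting.

```python
import time

def fetch_route(routenum):
    """Hypothetical stand-in for get_vehicles_byroute; makes no API call."""
    return f"fetched route {routenum}"

def fetch_routes(*routes, pause_seconds=5):
    """Call fetch_route once per route, pausing between calls."""
    results = []
    for position, routenum in enumerate(routes):
        if position > 0:
            # Pause only between calls, not before the first one
            time.sleep(pause_seconds)
        results.append(fetch_route(routenum))
    return results

# pause_seconds=0 keeps the demonstration instant
print(fetch_routes(720, 704, 2, pause_seconds=0))
```

Inside the function, `routes` is an ordinary tuple, so it can be looped over like any other sequence.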

In [None]:
import time

# TODO: Finish composing the function
def get_vehicles_byroutes(*routes):


##### Create a Loop to run the Function 
Great! We now have a function that calls Metro's API, records the location of all vehicles for one or more routes, logs the current timestamp, and saves the file in a location of our choosing. Let's (1) pick a few routes we want to get data from and (2) create a loop that runs the `get_vehicles_byroutes` function once per minute for 5 minutes with those route numbers as the input.
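
One possible shape for such a loop is sketched below. `run_repeatedly` and `sample_task` are hypothetical names. The task here only counts its own calls, whereas in the exercise it would call `get_vehicles_byroutes`. The interval is set to 0 so the sketch finishes immediately, and the exercise would use 60.

```python
import time

def run_repeatedly(task, times, interval_seconds):
    """Run task() `times` times, waiting interval_seconds between runs."""
    outputs = []
    for run_number in range(times):
        outputs.append(task())
        if run_number < times - 1:
            # Skip the wait after the final run
            time.sleep(interval_seconds)
    return outputs

# Hypothetical task that only counts how many times it has been called
counter = {"calls": 0}

def sample_task():
    counter["calls"] += 1
    return counter["calls"]

# interval_seconds=0 so the sketch finishes immediately; the exercise uses 60
print(run_repeatedly(sample_task, times=5, interval_seconds=0))
```

Note that the wait is added on top of however long each run takes, so the runs will drift slightly past exact one-minute marks.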


In [None]:
# TODO: Execute the function 5x, each time separated by a minute
