# Assignment 1-2: Data Collection Using Web APIs

## Objective

Many Websites (such as Twitter, Yelp, Spotify) provide free APIs to allow users to access their data. *API wrappers* simplify the use of these APIs by wrapping API calls into easy-to-use Python functions. At SFU, we are developing a unified API wrapper, called [DataPrep.Connector](https://github.com/sfu-db/dataprep#connector), which offers a unified programming interface to collect data from a variety of Web APIs.

In this assignment, you will learn the followings:

* How to ask insightful questions about data
* How to collect data from Web APIs using DataPrep.Connector

**Requirements:**

1. Please use [pandas.DataFrame](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html) rather than spark.DataFrame to manipulate data.

2. Please follow the python code style (https://www.python.org/dev/peps/pep-0008/). If TA finds your code hard to read, you will lose points. This requirement will stay for the whole semester.

## Preliminary

DataPrep.Connector is very easy to learn. By watching this 10-min [PyData Global 2020 talk](https://www.youtube.com/watch?v=56qu-0Ka-dA), you should be able to know how to use it. 

If you want to know more, below are some other useful resources.

* [Quick Introduction](https://github.com/sfu-db/dataprep#connector)
* [User Guide](https://sfu-db.github.io/dataprep/user_guide/connector/connector.html) 
* [Fetch and analyze COVID-19 tweets using DataPrep](https://www.youtube.com/watch?v=vvypQB3Vp1o)

## Overview

This is a **group** assignment. Please check which API your group will work on from this [page](https://coursys.sfu.ca/2021sp-cmpt-733-g1/pages/Web-API-Assignment/view).  Your group needs to first come up with $n$ questions about a Web API and then write code to answer them one by one, where $n = 6$ if your group has three members and $n = 4$ if your group has two members.

For example, Sakina, Vignesh, and Wen Han will form one group and they need to come up with 6 questions about the Etsy API and write code to answer them. Chuan-Yun, Krince will form one group and they need to come up with 4 questions about the Guardian API and write code to answer them.

## How to do this assignment?

The first thing is to come up with a list of **good questions**. Please use the following to judge whether a list of questions are good or not.

1. Good questions need to be useful. That is, they are common questions asked about the API.
2. Good questions need to be diverse. That is, they cover different aspects of the API. 
3. Good questions have to cover different difficulty levels. That is, it consists of both easy and difficult questions,  where the difficulty can be measured by the number of lines of code or the number of input parameters.

Suppose that your group is assigned to work on the Yelp API. The following shows a list of 4 good questions. The corresponding code for these questions can be found at this [link](https://github.com/sfu-db/DataConnectorConfigs#yelp----collect-local-business-data).

* Q1. What's the phone number of Capilano Suspension Bridge Park?
* Q2. Which yoga store has the highest review count in Vancouver?
* Q3. How many Starbucks stores in Seattle and where are they?
* Q4. What are the ratings for a list of restaurants?

**Why are they useful?**
* Q1 is useful because "Capilano Suspension Bridge Park" is one of the most popular tourist attractions in Vancouver.
* Q2 is useful because a yoga fan want to find out the most popular yoga store in Vancouver. 
* Q3 is useful because Starbucks was founded in Seattle.
* Q4 is useful because people often rely on yelp ratings to decide which restaurant to go to.

**Why are they diverse?**

This is because the [code](yelp-code.png) written to answer them has different inputs or outputs.
* Q1 takes `term` and `location` as input and returns 1 record with attributes `name` and `phone` 
* Q2 takes `categories`, `location`, and `sort_by` as input and returns 1 record with attributes `name` and `review_count`
* Q3 takes `term` and `location` as input and returns n records with attributes `name`, `address`, `city`, `state`, `country`, `zipcode`
* Q4 takes a list of retarurant `names` as input and return n records with attributes `name`, `rating`, `city`

**Why are they more and more difficult?**
* Q1 (4 lines of code, 2 query parameters)
* Q2 (4 lines of code, 3 query parameters)
* Q3 (5 lines of code, 2 query parameters)
* Q4 (11 lines of code, 2 query parameters)

Please note that you have to use DataPrep.Connector to get data from the Web API. If DataPrep.Connector cannot meet your needs, please post your questions on Canvas. We will help you. 

## Now, it's your turn. :) 

Please write down your questions and the corresponding code for your assigned API. 

In [1]:
from dataprep.connector import connect
import asyncio
import pandas as pd
import numpy as np
from numpy import radians, sin, cos, arctan2, sqrt
    
# Provide your API key here for TAs to reproduce your results
KEY = 'YDo3Kc9VANlqx1muCD0FlWAAh4gfRC9G'

SFU_LOC = '-122.90416, 49.27647'
BC_BBOX = "-139.06,48.30,-114.03,60.00"
VAN_BBOX = '-123.27,49.195,-123.020,49.315'
METRO_TOWN = [-122.9987, 49.2250]
SHOPPING = 'sic:565101,sic:651201,sic:566101,sic:531102'
GROCERY = 'sic:541105'


conn = connect("./mapquest", _auth={"access_token": KEY}, _concurrency = 10)

### Q1: Where is the Simon Fraser University? Give all the places if there is more than one campus. 

In [2]:
campus = await conn.query("place", q = "Simon Fraser University", sort = "relevance", bbox = BC_BBOX, _count = 50)
campus = campus[campus["name"] == "Simon Fraser University"].reset_index()
campus.head()

Unnamed: 0,index,name,country,state,city,address,postalCode,coordinates,details
0,0,Simon Fraser University,CA,BC,Burnaby,8888 University Drive E,V5A 1S6,"[-122.90416, 49.27647]","Simon Fraser University, 8888 University Drive..."
1,2,Simon Fraser University,CA,BC,Vancouver,602 Hastings St W,V6B 1P2,"[-123.113431, 49.284626]","Simon Fraser University, 602 Hastings St W, Va..."


From the list above, we see that according to MapQuest, there are two campuses of Simon Fraser University, one located at 8888 University Drive E, Burnaby, and the other one located at 602 Hastings St W, Vancouver.

### Q2: How many KFC are there in Burnaby? What are their address?

In [3]:
kfc = await conn.query("place", q = "KFC", sort = "relevance", bbox = BC_BBOX, _count = 500)
kfc = kfc[(kfc["name"] == "KFC") & (kfc["city"] == "Burnaby")].reset_index()
print("There are %d KFCs in Burnaby" % len(kfc))
print("Their addresses are:")
kfc

There are 1 KFCs in Burnaby
Their addresses are:


Unnamed: 0,index,name,country,state,city,address,postalCode,coordinates,details
0,81,KFC,CA,BC,Burnaby,5094 Kingsway,V5H 2E7,"[-122.990545, 49.225227]","KFC, 5094 Kingsway, Burnaby, BC V5H 2E7"


### Q3 The ratio of Starbucks to Tim Hortons in Vancouver?

In [4]:
starbucks = await conn.query('place', q='starbucks', sort='relevance', bbox=VAN_BBOX, page='1', pageSize = '50', _count=200)
timmys = await conn.query('place', q='Tim Hortons', sort='relevance', bbox=VAN_BBOX, page='1', pageSize = '50', _count=200)

is_vancouver_sb = starbucks['city'] == 'Vancouver'
is_vancouver_tim = timmys['city'] == 'Vancouver'
sb_in_van = starbucks[is_vancouver_sb]
tim_in_van = timmys[is_vancouver_tim]
print('The ratio of Starbucks:Tim Hortons in Vancouver is %d:%d' % (len(sb_in_van), len(tim_in_van)))



The ratio of Starbucks:Tim Hortons in Vancouver is 188:120


### Q4 What is the closest gas station from Metropolist and how far is it?

In [5]:
def distance_in_km(cord1, cord2):
    R = 6373.0

    lat1 = radians(cord1[1])
    lon1 = radians(cord1[0])
    lat2 = radians(cord2[1])
    lon2 = radians(cord2[0])

    dlon = lon2 - lon1
    dlat = lat2 - lat1

    a = sin(dlat / 2)**2 + cos(lat1) * cos(lat2) * sin(dlon / 2)**2
    c = 2 * arctan2(sqrt(a), sqrt(1 - a))
    distance = R * c

    return(distance)


METRO_TOWN_string = '%f,%f' % (METRO_TOWN[0], METRO_TOWN[1])
nearest_petro = await conn.query('place', q='gas station', sort='distance', location=METRO_TOWN_string, page='1', pageSize = '1')
print('Metropolist is %fkm from the nearest gas station' % distance_in_km(METRO_TOWN, nearest_petro['coordinates'][0]))
print('The gas station is %s at %s' % (nearest_petro['name'][0], nearest_petro['address'][0]))

Metropolist is 0.376580km from the nearest gas station
The gas station is Chevron at 4692 Imperial St


### Q5 Assume the place with the more shopping malls is more prosperous. In BC, which city has the most amount of shopping centers?

In [6]:
shop_list = await conn.query("place", sort="relevance", bbox=BC_BBOX, category=GROCERY,_count = 500)
shop_list = shop_list[shop_list["state"] == "BC"]
shop_list.groupby('city')['name'].count().sort_values(ascending=False).head(10)

city
Vancouver          42
Victoria           24
Surrey             15
Burnaby            14
Richmond           13
Langley            10
Kelowna            10
Nanaimo             9
Abbotsford          8
North Vancouver     8
Name: name, dtype: int64

Vancouver has 42 shopping centers,which is the most amount.

### Q6 Where is the nearest grocery of SFU? How many miles far? And how much time estimated for driving?

In [7]:
nearest_grocery = await conn.query("place", location=SFU_LOC, sort="distance", category=GROCERY)
destination = nearest_grocery.iloc[0]['details']
name = nearest_grocery.iloc[0]['name']

route = await conn.query("route", from_='8888 University Drive E, Burnaby', to=destination)
total_distance = sum([float(i)for i in route.iloc[:]['distance']])
total_time = sum([int(i)for i in route.iloc[:]['time']])
narrative = ''
for i in range(len(route) - 1):
    narrative = narrative + route.iloc[i]['narrative'] + ' for ' + route.iloc[i]['distance'] + ' miles.\n'
narrative += route.iloc[-1]['narrative']

print('The nearest grocery of SFU is ' + name + '. It is ' + str(total_distance) + ' miles far, and It is expected to take ' + str(total_time // 60) + 'm' + str(total_time % 60)+'s of driving.')
print('Route:\n' + narrative)
route

The nearest grocery of SFU is Nesters Market. It is 1.234 miles far, and It is expected to take 3m21s of driving.
Route:
Start out going east on University Dr toward Arts Rd. for 0.348 miles.
Turn left to stay on University Dr. for 0.606 miles.
Enter next roundabout and take the 1st exit onto University High St. for 0.28 miles.
9000 UNIVERSITY HIGH STREET is on the left.


Unnamed: 0,index,narrative,distance,time
0,0,Start out going east on University Dr toward A...,0.348,57
1,1,Turn left to stay on University Dr.,0.606,84
2,2,Enter next roundabout and take the 1st exit on...,0.28,60
3,3,9000 UNIVERSITY HIGH STREET is on the left.,0.0,0


## Submission

Complete this notebook, rename it to `A1-2-[WEB API Name].ipynb`, and submit it to the CourSys activity `Assignment 1`. For example, if your group works on Twitter, then **every member of your group** needs to submit the same notebook named `A1-2-Twitter.ipynb`.