<a href="https://colab.research.google.com/github/christophermalone/DSCI325/blob/main/Module5_Part1_Python.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Module 5 - Part 1 Python: Application Programming Interface (API) - Yelp

## What is an API

An <strong>Application Programming Interface (API)</strong> is a common method to obtain and share data across various applications. 


<p align='center'><img src="https://drive.google.com/uc?export=view&id=1Mvm4Rs94-fiGV1fISMvrZSzkOPBg5S1p"></p>

Source:  https://en.wikipedia.org/wiki/API

An API allows communication to take place easily between a user and a database (or server).

<p align='center'><img src="https://drive.google.com/uc?export=view&id=1v19F5Cxe8gDat9IfaGNbFuRjc9colU18"></p>

<table width='100%' ><tr><td bgcolor='green'></td></tr></table>

## Example 5.1.P
For this notebook, we will use the Yelp API to obtain data regarding the *Best Restaurants* in Winona, MN. 
 
The following search criteria will be used:

*   Location: Winona, MN
*   Search Term: Best Restaurants
*   Price: $ (one dollar sign is the cheapest tier)
*   Sort by: Highest Rated

Source:  https://www.yelp.com/search?find_desc=Best%20Restaurants&find_loc=Winona%2C%20MN%2055987&attrs=RestaurantsPriceRange2.1&sortby=rating 

<table width='100%' ><tr><td bgcolor='green'></td></tr></table>

## Search via Yelp

Consider the following search done on Yelp.  The specifications of this search include: Search Term = "Best Restaurants", Location = "Winona, MN", one filter applied for "$" (i.e. cheapest), and outcomes sorted by "Highest Rated".


<p align='center'><img src="https://drive.google.com/uc?export=view&id=1PDxr6TCzzxZy0XA9Un8phz5tDnOSNDeI" width='75%' height='75%'></p>

The outcomes returned by Yelp are provided here.  Notice that the outcomes are *not* necessarily sorted by Highest Rating.  Beno's Deli is actually the highest rated in this list of three, but is not listed on top.  I would imagine that Yelp takes into consideration the number of reviews when determining "best".  For example, a restaurant with an overall rating of 5 based on two reviews should probably be ranked lower than a restaurant with an overall rating of 4.5 based on several hundred reviews.

<p align='center'><img src="https://drive.google.com/uc?export=view&id=1-T6QQlQKJaylbfdcLdLaMaM94Ripo7Pv" width='50%' height='50%'></p>
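
One common way to temper a rating by its review count is a weighted average that pulls ratings with few reviews toward a typical value. The sketch below only illustrates that idea; it is not Yelp's actual ranking method, and the function name `adjusted_rating`, the prior mean of 3.5, and the prior weight of 10 are all assumptions chosen for this example.

```python
def adjusted_rating(rating, n_reviews, prior_mean=3.5, prior_weight=10):
    """Shrink a rating toward prior_mean; fewer reviews means more shrinkage.

    prior_mean and prior_weight are illustrative assumptions, not Yelp values.
    """
    return (prior_weight * prior_mean + n_reviews * rating) / (prior_weight + n_reviews)


few_reviews = adjusted_rating(5.0, 2)      # rating of 5 based on two reviews
many_reviews = adjusted_rating(4.5, 300)   # rating of 4.5 based on 300 reviews

print(round(few_reviews, 2), round(many_reviews, 2))
```

Under these assumed settings the restaurant with 300 reviews ends up with the higher adjusted rating, even though its raw rating is lower.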

<strong>Goal</strong>:  To obtain a list that is truly sorted by Rating.

<p align='center'><img src="https://drive.google.com/uc?export=view&id=12kJ8-apk2FSZyO3mMAg7hqxj5j0jfXLB"></p>

## Setting up an Account at Yelp

Most often, the first step in using an API is to create an account with the organization that owns the API.  For this Notebook we will be using the Yelp API; thus, a developer account will need to be created at Yelp. 

Yelp Developer Site: https://www.yelp.com/developers

The next step is to fill out the required form for your new "app".  In this class, we will not be creating an actual app, but this form is required in order to obtain a:

*    <strong>Client ID</strong>: Unique identifier for yourself 
*    <strong> API Key</strong>: Unique identifier for your app

These two identifiers are somewhat common when working with APIs.

<p align='center'><img src="https://drive.google.com/uc?export=view&id=1wbn460BbsOUOF36sl1hYcVPLXiERlr-s" width='50%' height='50%'></p>

## Setting up Python

First, import the pandas and numpy packages for working with data in Python.

In [1]:
import pandas as pd
import numpy as np

The <strong>YelpAPI</strong> package in Python requires the specification of your Client ID and API Key.  Obtain the Client ID and API Key from the Yelp Developer site, then copy and paste these strings here.

In [5]:
#Setting up API Connection Information
client_id = ""
api_key = ""
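
Pasting a key directly into a notebook makes it visible to anyone the notebook is shared with. As one possible alternative, the sketch below reads both values from environment variables; the names `YELP_CLIENT_ID` and `YELP_API_KEY` are hypothetical and would need to be set beforehand.

```python
import os

# YELP_CLIENT_ID and YELP_API_KEY are hypothetical variable names for this sketch.
client_id = os.environ.get("YELP_CLIENT_ID", "")
api_key = os.environ.get("YELP_API_KEY", "")

if not api_key:
    print("No API key found; set YELP_API_KEY before calling the API.")
```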

Next, download the YelpAPI package.

Source:  https://github.com/gfairchild/yelpapi

In [6]:
pip install yelpapi



Next, load the YelpAPI package into this Colab session.

In [8]:
from yelpapi import YelpAPI

Next, the <strong>YelpAPI()</strong> function establishes a connection with the Yelp Fusion API.  Additional details regarding this function can be found on the GitHub site referenced above. 

In [9]:
yelp_api = YelpAPI(api_key)

## Making an API Call

A call to the API requires specification of a set of parameters.  Some of these parameters are required, e.g. location, and others are optional, e.g. price.  The list of possible parameters can be found on Yelp's documentation page.

Source: https://www.yelp.com/developers/documentation/v3/business_search



<p align='center'><img src="https://drive.google.com/uc?export=view&id=16r6KMKNbL2TxNXK5UXOZqFjNZJPl3qNK" width='25%' height='25%'></p>
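
Behind the scenes, parameters such as these are sent to the API as part of a web request. The sketch below builds the request address by hand, without contacting Yelp, to show roughly what a package like YelpAPI assembles on your behalf. The endpoint address and the placeholder key are written out here as assumptions for illustration.

```python
from urllib.parse import urlencode

# Address of the Yelp Fusion business search endpoint (assumed for this sketch).
base_url = "https://api.yelp.com/v3/businesses/search"

# Parameters are encoded into the query string that follows the "?".
params = {"term": "Best Restaurants", "location": "Winona, MN", "limit": 50}
request_url = base_url + "?" + urlencode(params)

# The API key travels in a request header rather than in the address itself.
headers = {"Authorization": "Bearer " + "YOUR_API_KEY"}  # placeholder value

print(request_url)
```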

The following specifications will be used for our first API call.


*   <strong>Term</strong>: "Best Restaurants"
*   <strong>Location</strong>: Winona, MN
*   <strong>Search Limit</strong>: Use 50 -- the maximum possible.

<strong>Comment</strong>:  If more than 50 outcomes are desired, the <strong>offset</strong> parameter can be used to obtain the *next* 50.
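
The offset idea can be sketched as a small loop that keeps requesting the next page until enough outcomes have been collected. The helper `collect_pages` and the stand-in `fake_search` are hypothetical names used only for this illustration; no request is sent to Yelp.

```python
def collect_pages(search_fn, total_wanted, page_size=50):
    """Call search_fn(limit=..., offset=...) until total_wanted items are collected."""
    results = []
    offset = 0
    while len(results) < total_wanted:
        limit = min(page_size, total_wanted - len(results))
        page = search_fn(limit=limit, offset=offset)
        if not page:              # nothing more is available
            break
        results.extend(page)
        offset += len(page)       # the next request starts where this one ended
    return results


# Stand-in for a real API call: pretends that 120 businesses exist.
def fake_search(limit, offset):
    return list(range(offset, min(offset + limit, 120)))


print(len(collect_pages(fake_search, total_wanted=200)))
```

With the real API, `search_fn` could be a small function that calls `yelp_api.search_query()` with the `limit` and `offset` values and returns the `businesses` list, assuming the package forwards the `offset` parameter to Yelp.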



In [10]:
term = 'Best Restaurants'
location = 'Winona, MN'
search_limit = 50

response = yelp_api.search_query(term = term,
                                 location = location,
                                 limit = search_limit)

Let's take a look at what is returned by the Yelp API.

In [None]:
response

## The API Outcomes

Often, the <strong>JSON</strong> data format is used by APIs. A JSON data structure is much more flexible than a dataframe, e.g. JSON allows for *nested* data structures.  Most often a dataframe requires data to be in a tabular format with clearly defined rows and columns.  A quick review of the information returned by the YelpAPI suggests that a dataframe is not the best way to store such information. 





<p align='center'><img src="https://drive.google.com/uc?export=view&id=14L0_LhB2T3dARxTCY88ZUB6c5GcM23Cw" width='75%' height='75%'></p>

The YelpAPI has *automatically* converted the JSON data structure into a Python dictionary - a commonly used data structure within Python that allows for more structure than a dataframe.

In [12]:
type(response)

dict
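
To make the JSON-to-dictionary step concrete, the sketch below parses a small piece of hand-built JSON text and then walks through its nested levels. The text only mimics a fraction of the real response; its values are copied from the first business shown later in this notebook.

```python
import json

# Hand-built JSON text that mimics a small part of the Yelp response.
json_text = """
{"businesses": [
    {"name": "The Boat House",
     "rating": 3.5,
     "coordinates": {"latitude": 44.0550145},
     "location": {"address1": "2 Johnson St"}}
]}
"""

sample = json.loads(json_text)       # JSON text -> Python dictionary
first = sample["businesses"][0]      # a list stored inside the dictionary

print(type(sample).__name__)
print(first["name"], first["rating"])
print(first["location"]["address1"])       # a dictionary nested inside a dictionary
print(first.get("price", "not listed"))    # .get() avoids a KeyError for a missing key
```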

The following will convert some of the contents of the <strong>response</strong> dictionary into a pandas dataframe.  

In [16]:
# DataFrame.append() was removed in pandas 2.0, so build the dataframe
# directly from the list of business dictionaries (one row per business).
WinonaYelp = pd.DataFrame(response['businesses'])


Taking a look at this newly created dataframe.

In [20]:
WinonaYelp.head(2)


Unnamed: 0,id,alias,name,image_url,is_closed,url,review_count,categories,rating,coordinates,transactions,price,location,phone,display_phone,distance
0,b7AuH8sAf0IDs6ZR8sPAtg,the-boat-house-winona,The Boat House,https://s3-media3.fl.yelpcdn.com/bphoto/JCi5Is...,False,https://www.yelp.com/biz/the-boat-house-winona...,87,"[{'alias': 'newamerican', 'title': 'American (...",3.5,"{'latitude': 44.0550145, 'longitude': -91.6381...",[],$$,"{'address1': '2 Johnson St', 'address2': '', '...",15074746550,(507) 474-6550,1786.461266
1,G2ptcgOx5e9usmxW38OoOw,hillside-fish-house-marshland-2,Hillside Fish House,https://s3-media3.fl.yelpcdn.com/bphoto/IBZ-vV...,False,https://www.yelp.com/biz/hillside-fish-house-m...,34,"[{'alias': 'wine_bars', 'title': 'Wine Bars'},...",4.0,"{'latitude': 44.07125, 'longitude': -91.55721}",[],$$,"{'address1': 'W124 State Rd 35 54', 'address2'...",16086876141,(608) 687-6141,8487.76768


<strong>Note</strong>:  Not all of the contents will be converted into separate fields. For example, the contents of the categories, coordinates, and location fields have not yet been exploded into separate columns.
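
As a rough sketch of what exploding a nested field looks like, the code below separates a coordinates column into latitude and longitude columns. The small dataframe `df` is a stand-in built from values shown in this notebook, not the full WinonaYelp dataframe.

```python
import pandas as pd

# Stand-in for two rows of the dataframe, using values shown in this notebook.
df = pd.DataFrame({
    "name": ["Hillside Fish House", "River Cafe"],
    "coordinates": [
        {"latitude": 44.07125, "longitude": -91.55721},
        {"latitude": 44.003788, "longitude": -91.4309464},
    ],
})

# One column per key of the nested dictionaries.
coords = pd.json_normalize(df["coordinates"].tolist())

# Both frames share the default 0, 1, ... index, so join() lines the rows up.
df_flat = df.drop(columns="coordinates").join(coords)
print(df_flat)
```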

## Using dfply Package to Manipulate the Dataframe

In [21]:
pip install dfply

Collecting dfply
  Downloading https://files.pythonhosted.org/packages/53/91/18ab48c64661252dadff685f8ddbc6f456302923918f488714ee2345d49b/dfply-0.3.3-py3-none-any.whl (612kB)

In [22]:
from dfply import *

The following actions will take place on this dataframe.


*   filter: filter on price == $, i.e. only keep cheapest restaurants 
*   arrange: sort the list by rating, and also by the number of reviews
*   filter: only keep restaurants that have 10 or more reviews



In [28]:
WinonaList = (
                 WinonaYelp
                 >> filter_by(X.price == '$')
                 >> arrange(desc(X.rating), desc(X.review_count))
                 >> filter_by(X.review_count >= 10) 
             )

#Checking the dataframe
WinonaList.head(2)

Unnamed: 0,id,alias,name,image_url,is_closed,url,review_count,categories,rating,coordinates,transactions,price,location,phone,display_phone,distance
10,wRT7U5srhCDek5N-GDROKw,garden-of-eatin-galesville,Garden of Eatin',https://s3-media1.fl.yelpcdn.com/bphoto/sHQ8rm...,False,https://www.yelp.com/biz/garden-of-eatin-gales...,24,"[{'alias': 'diners', 'title': 'Diners'}, {'ali...",5.0,"{'latitude': 44.0823135375977, 'longitude': -9...",[],$,"{'address1': '19847 E Gale Ave', 'address2': '...",16085824366,(608) 582-4366,25142.137259
4,0OAoruhVskDUJiGFzksE-w,river-cafe-trempealeau,River Cafe,https://s3-media4.fl.yelpcdn.com/bphoto/8m2yDp...,False,https://www.yelp.com/biz/river-cafe-trempealea...,13,"[{'alias': 'tradamerican', 'title': 'American ...",5.0,"{'latitude': 44.003788, 'longitude': -91.4309464}",[],$,"{'address1': '23991 3rd St', 'address2': None,...",16085345055,(608) 534-5055,19142.237095
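
For comparison, the same three steps can be written with plain pandas methods. This is only a sketch on a small stand-in dataframe whose names, review counts, ratings, and prices are taken from outputs shown in this notebook.

```python
import pandas as pd

# Stand-in rows built from values shown in this notebook.
df = pd.DataFrame({
    "name": ["The Boat House", "Hillside Fish House", "Garden of Eatin'",
             "River Cafe", "Beno's Deli", "The Acoustic Cafe", "Lakeview Drive Inn"],
    "review_count": [87, 34, 24, 13, 20, 73, 40],
    "rating": [3.5, 4.0, 5.0, 5.0, 4.5, 4.0, 4.0],
    "price": ["$$", "$$", "$", "$", "$", "$", "$"],
})

result = (
    df[df["price"] == "$"]                                   # filter: cheapest only
    .sort_values(["rating", "review_count"], ascending=[False, False])  # arrange
    .query("review_count >= 10")                             # filter: 10+ reviews
)
print(result["name"].tolist())
```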


Using the print() function to more easily see the structure of the categories, coordinates, and location fields.

In [29]:
print(WinonaList.to_string(index=False))

                     id                                   alias                      name                                                             image_url is_closed                                                                                                                                                                                                  url review_count                                                                                                                                                                  categories  rating                                                     coordinates transactions price                                                                                                                                                                                                                    location         phone   display_phone      distance
 wRT7U5srhCDek5N-GDROKw              garden-of-eatin-galesville          Garden of Eatin'  ht

The existing dataframe needs to be modified to include only locations in <strong>Winona, MN</strong>.  Furthermore, the display_address field will be used to create a connection with the Google Maps API.

<p align='center'><img src="https://drive.google.com/uc?export=view&id=1E7Gxp0nb2l4mt5iYAREAf1ozWJXHSU_m" width='75%' height='75%'></p>

The following json_normalize() function is used to split the location field into separate columns.

In [30]:
WinonaList_NormalizeLocation = pd.json_normalize(WinonaList['location'])

<strong>Note</strong>:  The following snippet of code accomplishes the same -- pd.DataFrame(WinonaList['location'].tolist())
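
A quick check of that equivalence, sketched on two location dictionaries trimmed to a few of the keys that appear in this notebook:

```python
import pandas as pd

# Two location dictionaries, trimmed to a few keys for this sketch.
locations = pd.Series([
    {"address1": "78 E 4th St", "city": "Winona", "state": "MN", "zip_code": "55987"},
    {"address1": "77 Lafayette", "city": "Winona", "state": "MN", "zip_code": "55987"},
])

via_normalize = pd.json_normalize(locations.tolist())
via_dataframe = pd.DataFrame(locations.tolist())

# For flat dictionaries like these, both routes give the same table.
print(via_normalize.equals(via_dataframe))
```

The two routes differ once the dictionaries themselves contain dictionaries: json_normalize() flattens the inner keys into their own columns, while pd.DataFrame() leaves each inner dictionary in a single cell.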

In [31]:
WinonaList_NormalizeLocation

Unnamed: 0,address1,address2,address3,city,zip_code,country,state,display_address
0,19847 E Gale Ave,,,Galesville,54630,US,WI,"[19847 E Gale Ave, Galesville, WI 54630]"
1,23991 3rd St,,,Trempealeau,54661,US,WI,"[23991 3rd St, Trempealeau, WI 54661]"
2,214 N Main St,,,Cochrane,54622,US,WI,"[214 N Main St, Cochrane, WI 54622]"
3,115 4th St S,,,La Crosse,54601,US,WI,"[115 4th St S, La Crosse, WI 54601]"
4,1311 La Crescent Pl,,,French Island,54603,US,WI,"[1311 La Crescent Pl, French Island, WI 54603]"
5,78 E 4th St,,,Winona,55987,US,MN,"[78 E 4th St, Winona, MN 55987]"
6,110 E Main St,,,Utica,55979,US,MN,"[110 E Main St, Utica, MN 55979]"
7,77 Lafayette,,,Winona,55987,US,MN,"[77 Lafayette, Winona, MN 55987]"
8,610 E Sarnia St,,,Winona,55987,US,MN,"[610 E Sarnia St, Winona, MN 55987]"
9,600 N Main St,,,Alma,54610,US,WI,"[600 N Main St, Alma, WI 54610]"


Next, the contents of the WinonaList_NormalizeLocation dataframe will need to be joined to WinonaList dataframe from above (that contains all the other columns).

The inherent <strong>index</strong> will be used as the *key* for this simple join.  The index for the WinonaList_NormalizeLocation dataframe goes from 0 to 17.  The index in the WinonaList dataframe will be reset to also go from 0 to 17, because its current index is left over from before the filters were applied above.

In [32]:
WinonaList = WinonaList.reset_index(drop=True)
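
To see why the reset matters, the sketch below joins two tiny stand-in dataframes with and without it. The names and cities come from outputs in this notebook; `left` and `right` are hypothetical names for the illustration.

```python
import pandas as pd

# Left frame keeps its pre-filter index labels (10 and 4).
left = pd.DataFrame({"name": ["Garden of Eatin'", "River Cafe"]}, index=[10, 4])

# Right frame was built fresh, so its index labels are 0 and 1.
right = pd.DataFrame({"city": ["Galesville", "Trempealeau"]})

misaligned = left.join(right)                       # labels 10 and 4 find no match
aligned = left.reset_index(drop=True).join(right)   # labels 0 and 1 match

print(misaligned)
print(aligned)
```

Without the reset, the city column comes back empty (NaN) because join() matches rows by index label, not by position.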

Check to make sure reindexing worked for the WinonaList dataframe.

In [34]:
WinonaList.head(2)

Unnamed: 0,id,alias,name,image_url,is_closed,url,review_count,categories,rating,coordinates,transactions,price,location,phone,display_phone,distance
0,wRT7U5srhCDek5N-GDROKw,garden-of-eatin-galesville,Garden of Eatin',https://s3-media1.fl.yelpcdn.com/bphoto/sHQ8rm...,False,https://www.yelp.com/biz/garden-of-eatin-gales...,24,"[{'alias': 'diners', 'title': 'Diners'}, {'ali...",5.0,"{'latitude': 44.0823135375977, 'longitude': -9...",[],$,"{'address1': '19847 E Gale Ave', 'address2': '...",16085824366,(608) 582-4366,25142.137259
1,0OAoruhVskDUJiGFzksE-w,river-cafe-trempealeau,River Cafe,https://s3-media4.fl.yelpcdn.com/bphoto/8m2yDp...,False,https://www.yelp.com/biz/river-cafe-trempealea...,13,"[{'alias': 'tradamerican', 'title': 'American ...",5.0,"{'latitude': 44.003788, 'longitude': -91.4309464}",[],$,"{'address1': '23991 3rd St', 'address2': None,...",16085345055,(608) 534-5055,19142.237095


Complete the simple join.

In [38]:
WinonaList_WithLocation = WinonaList.join(WinonaList_NormalizeLocation)

Taking a look at the dataframe after the JOIN.

In [39]:
WinonaList_WithLocation.head(2)

Unnamed: 0,id,alias,name,image_url,is_closed,url,review_count,categories,rating,coordinates,transactions,price,location,phone,display_phone,distance,address1,address2,address3,city,zip_code,country,state,display_address
0,wRT7U5srhCDek5N-GDROKw,garden-of-eatin-galesville,Garden of Eatin',https://s3-media1.fl.yelpcdn.com/bphoto/sHQ8rm...,False,https://www.yelp.com/biz/garden-of-eatin-gales...,24,"[{'alias': 'diners', 'title': 'Diners'}, {'ali...",5.0,"{'latitude': 44.0823135375977, 'longitude': -9...",[],$,"{'address1': '19847 E Gale Ave', 'address2': '...",16085824366,(608) 582-4366,25142.137259,19847 E Gale Ave,,,Galesville,54630,US,WI,"[19847 E Gale Ave, Galesville, WI 54630]"
1,0OAoruhVskDUJiGFzksE-w,river-cafe-trempealeau,River Cafe,https://s3-media4.fl.yelpcdn.com/bphoto/8m2yDp...,False,https://www.yelp.com/biz/river-cafe-trempealea...,13,"[{'alias': 'tradamerican', 'title': 'American ...",5.0,"{'latitude': 44.003788, 'longitude': -91.4309464}",[],$,"{'address1': '23991 3rd St', 'address2': None,...",16085345055,(608) 534-5055,19142.237095,23991 3rd St,,,Trempealeau,54661,US,WI,"[23991 3rd St, Trempealeau, WI 54661]"


Next, <strong>keep</strong> only restaurant locations in Winona.

In [42]:
WinonaList_WithLocation_OnlyWinona = (
                          WinonaList_WithLocation
                          >> filter_by(X.city == "Winona")
                          >> select(X.name, X.image_url, X.review_count, X.rating, X.display_address)
                       )

WinonaList_WithLocation_OnlyWinona = WinonaList_WithLocation_OnlyWinona.reset_index(drop=True)

More modifications will be done to the <strong>image_url</strong> field and to the <strong>display_address</strong> field, which will be used to connect to the Google Maps API.

*  \<img src=https://s3-media1.fl.yelpcdn.com/bphoto/snAzXC5xLgybMzV8NxGqyw/o.jpg width="100px" height="100px"\>

*   \<a href="https://www.google.com/maps/place/78+E+4th+St+Winona,+MN+55987" target="_blank">Map</a\>

*   A new field, named rank, will be used to rank the restaurants from 1 on up.
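
Before applying these changes to a whole column, it can help to see them on a single value. The sketch below uses two hypothetical helper functions, `map_link` and `image_tag`, that build the same strings in plain Python.

```python
def map_link(display_address):
    """Build a Google Maps link from a list of address lines."""
    query = " ".join(display_address).replace(" ", "+")
    return ('<a href="https://www.google.com/maps/place/' + query
            + '" target="_blank">Map</a>')


def image_tag(image_url, size="100px"):
    """Wrap an image address in an <img> tag of the given width and height."""
    return f'<img src="{image_url}" width="{size}" height="{size}">'


print(map_link(["78 E 4th St", "Winona, MN 55987"]))
print(image_tag("https://s3-media1.fl.yelpcdn.com/bphoto/snAzXC5xLgybMzV8NxGqyw/o.jpg"))
```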

In [44]:
WinonaList_WithLocation_OnlyWinona_PicMap = (
                          WinonaList_WithLocation_OnlyWinona
                          >> mutate(pic = 
                                          '<img src="' 
                                           + X.image_url 
                                           + '" width="100px" height="100px">'
                                    )
                          >> mutate(mapurl =
                                         '<a href="https://www.google.com/maps/place/'
                                          + X.display_address.str.join(sep=' ').str.replace(' ','+')
                                          + '" target="_blank">Map</a>'
                                    )
                          >> select(X.name, X.review_count, X.rating, X.pic, X.mapurl)

                       )
#Create a vector from 1 to n -- used for ranks
WinonaList_WithLocation_OnlyWinona_PicMap['rank'] = 1 + np.arange(len(WinonaList_WithLocation_OnlyWinona_PicMap))

#Move the selected column to left-most column
WinonaList_WithLocation_OnlyWinona_PicMap.insert(0,'rank',WinonaList_WithLocation_OnlyWinona_PicMap.pop('rank'))

print(WinonaList_WithLocation_OnlyWinona_PicMap.to_string(index=False))


 rank                      name review_count  rating                                                                                                            pic                                                                                                     mapurl
    1               Beno's Deli           20     4.5  <img src="https://s3-media1.fl.yelpcdn.com/bphoto/snAzXC5xLgybMzV8NxGqyw/o.jpg" width="100px" height="100px">           <a href="https://www.google.com/maps/place/78+E+4th+St+Winona,+MN+55987" target="_blank">Map</a>
    2         The Acoustic Cafe           73     4.0  <img src="https://s3-media1.fl.yelpcdn.com/bphoto/0yG-DARmJJGYlUK8pgBnHw/o.jpg" width="100px" height="100px">          <a href="https://www.google.com/maps/place/77+Lafayette+Winona,+MN+55987" target="_blank">Map</a>
    3        Lakeview Drive Inn           40     4.0  <img src="https://s3-media2.fl.yelpcdn.com/bphoto/PrdhUUSXIQnrHHhH-uxUHQ/o.jpg" width="100px" height="100px">       <a href="https://

Next, this dataframe will be converted into an HTML table.

In [46]:
Winona_HTMLFile = WinonaList_WithLocation_OnlyWinona_PicMap.to_html(index=False)

Looking at the output returned from the conversion.

In [47]:
Winona_HTMLFile

'<table border="1" class="dataframe">\n  <thead>\n    <tr style="text-align: right;">\n      <th>rank</th>\n      <th>name</th>\n      <th>review_count</th>\n      <th>rating</th>\n      <th>pic</th>\n      <th>mapurl</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <td>1</td>\n      <td>Beno\'s Deli</td>\n      <td>20</td>\n      <td>4.5</td>\n      <td>&lt;img src="https://s3-media1.fl.yelpcdn.com/bphoto/snAzXC5xLgybMzV8NxGqyw/o.jpg" width="100px" height="100px"&gt;</td>\n      <td>&lt;a href="https://www.google.com/maps/place/78+E+4th+St+Winona,+MN+55987" target="_blank"&gt;Map&lt;/a&gt;</td>\n    </tr>\n    <tr>\n      <td>2</td>\n      <td>The Acoustic Cafe</td>\n      <td>73</td>\n      <td>4.0</td>\n      <td>&lt;img src="https://s3-media1.fl.yelpcdn.com/bphoto/0yG-DARmJJGYlUK8pgBnHw/o.jpg" width="100px" height="100px"&gt;</td>\n      <td>&lt;a href="https://www.google.com/maps/place/77+Lafayette+Winona,+MN+55987" target="_blank"&gt;Map&lt;/a&gt;</td>\n    </tr>\n    <tr>

Instead of pushing the HTML converted table to the screen, let's push these contents into an HTML file.

In [48]:
# A with-block closes the file automatically, even if the write fails.
with open("sample_data/index.html", "w") as text_file:
    text_file.write(WinonaList_WithLocation_OnlyWinona_PicMap.to_html(index=False))

Adding <strong>\<html\></strong> to the beginning of the file.

In [None]:
!sed -i '1i \<html>\' /content/sample_data/index.html

Next, adding <strong>\<body\></strong> on the 2nd line.

In [None]:
!sed -i '2i \<body>\' /content/sample_data/index.html

Next, the \</body\> and \</html\> tags need to be added at the end of the HTML file.

In [49]:
!sed -i '$ a </body>' /content/sample_data/index.html
!sed -i '$ a </html>' /content/sample_data/index.html
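
The same wrapping can also be done in Python before the file is written. The sketch below uses a hypothetical helper, `wrap_html`, on a shortened stand-in for the table text.

```python
def wrap_html(table_html):
    """Surround an HTML table fragment with <html> and <body> tags."""
    return "<html>\n<body>\n" + table_html + "\n</body>\n</html>\n"


# Shortened stand-in for the text returned by to_html().
page = wrap_html('<table border="1" class="dataframe"></table>')
print(page)
```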

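Clean up some issues with the HTML conversion.  The conversion replaced each < in the pic and mapurl fields with &amp;lt;, so these need to be changed back...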

In [50]:
!sed -i 's/&lt;/</g' /content/sample_data/index.html

Similarly for each >, which the conversion replaced with &amp;gt;...

In [51]:
!sed -i 's/&gt;/>/g' /content/sample_data/index.html
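
The two substitutions above undo escaping that to_html() applies by default. As an alternative sketch, the `escape=False` argument of to_html() leaves the tags as written, shown here on a one-row stand-in dataframe.

```python
import pandas as pd

# One-row stand-in with a link value taken from this notebook.
df = pd.DataFrame({
    "name": ["Beno's Deli"],
    "mapurl": ['<a href="https://www.google.com/maps/place/78+E+4th+St+Winona,+MN+55987" target="_blank">Map</a>'],
})

escaped = df.to_html(index=False)                    # default: tags are escaped
unescaped = df.to_html(index=False, escape=False)    # tags are kept as written

print("&lt;a href" in escaped, "<a href" in unescaped)
```

Turning escaping off is reasonable here because the tags were written by us; it would be less safe for text that comes from an unknown source.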

Next, center align the contents of the header row.

In [52]:
!sed -i 's/text-align: right;/text-align: center;/' /content/sample_data/index.html

Making the table headers slightly larger and removing the pic and mapurl names altogether.

In [54]:
!sed -i 's/rank/<font size="+1"\>Rank<\/font>/' /content/sample_data/index.html
!sed -i 's/name/<font size="+1"\>Restaurant<br>Name<\/font>/' /content/sample_data/index.html
!sed -i 's/review_count/<font size="+1"\># of <br>Reviews<\/font>/' /content/sample_data/index.html
!sed -i 's/rating/<font size="+1"\>Rating<\/font>/' /content/sample_data/index.html
!sed -i 's/pic/<font size="+1"\>\&nbsp;<\/font>/' /content/sample_data/index.html
!sed -i 's/mapurl/<font size="+1"\>\&nbsp;<\/font>/' /content/sample_data/index.html
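
Header text can also be changed in pandas before the conversion, which avoids matching words such as name or rating elsewhere in the file. The sketch below renames the columns of a one-row stand-in dataframe; the `labels` dictionary is an assumption for illustration and leaves out the font sizing.

```python
import pandas as pd

# One-row stand-in using values from this notebook.
df = pd.DataFrame({"rank": [1], "name": ["Beno's Deli"],
                   "review_count": [20], "rating": [4.5]})

# Display labels; the <br> tags need escape=False to survive to_html().
labels = {"rank": "Rank",
          "name": "Restaurant<br>Name",
          "review_count": "# of <br>Reviews",
          "rating": "Rating"}

html = df.rename(columns=labels).to_html(index=False, escape=False)
print(html)
```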



Next, specify that the contents of the table be centered...

In [None]:
!sed -i 's/<tbody>/<tbody style="text-align: center;">/' /content/sample_data/index.html

Taking a look at a final version of this file...

In [55]:
!cat /content/sample_data/index.html

<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: center;">
      <th><font size="+1">Rank</font></th>
      <th><font size="+1">Restaurant<br>Name</font></th>
      <th><font size="+1"># of <br>Reviews</font></th>
      <th><font size="+1">Rating</font></th>
      <th><font size="+1">&nbsp;</font></th>
      <th><font size="+1">&nbsp;</font></th>
    </tr>
  </thead>
  <tbody style="text-align: center;">
    <tr>
      <td>1</td>
      <td>Beno's Deli</td>
      <td>20</td>
      <td>4.5</td>
      <td><img src="https://s3-media1.fl.yelpcdn.com/bphoto/snAzXC5xLgybMzV8NxGqyw/o.jpg" width="100px" height="100px"></td>
      <td><a href="https://www.google.com/maps/place/78+E+4th+St+Winona,+MN+55987" target="_blank">Map</a></td>
    </tr>
    <tr>
      <td>2</td>
      <td>The Acoustic Cafe</td>
      <td>73</td>
      <td>4.0</td>
      <td><img src="https://s3-media1.fl.yelpcdn.com/bphoto/0yG-DARmJJGYlUK8pgBnHw/o.jpg" width="100px" height="100px"></td>
      <td