## Before you start:

- Read the README.md file
- Follow each step as described in the instructions and take note of any issues you find along the way
- Happy learning!

In [1]:
# Start by importing the necessary libraries.

from bs4 import BeautifulSoup as bs
from pandas import json_normalize  # top-level location since pandas 1.0
import pandas as pd
import requests
import re
import IPython

from IPython.display import Audio

Audio('GoT_theme.mp3')

# Game of APIs: A song of Web and Scraping

In this notebook we will be working with:
- [TVMAZE](http://www.tvmaze.com/api) - a free, fast and clean REST API that is easy to use and returns JSON with lots of information about almost any TV show
- [IMDB](https://www.imdb.com/title/tt0944947/episodes) - an online database of information related to films, television programs, home videos, video games, and streaming content online.

The goal is to collect relevant information about the TV show `Game of Thrones`, which aired from 2011 to 2019 and currently sits as the second-highest-rated television show on IMDB at 9.3/10, behind only Breaking Bad (9.5/10).

The focus of this project is to collect data on each episode and season and cross-reference it with the episode's public rating on IMDB.

## Step 1: API Request

- using the TVMAZE API, it is possible to search all the shows in their database by the show's name
- since the objective is to gather data about each episode, we will request the show's main information and its episode list in a single response
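As a minimal sketch of how the URL for that single call can be assembled: the helper name `build_singlesearch_url` is hypothetical, while the endpoint and the `q` and `embed` parameters are the ones used in the request in section 1.1. The helper only builds the string and does not contact the API.

```python
from urllib.parse import urlencode

# Base endpoint used in this notebook for looking up a single show by name
TVMAZE_SINGLESEARCH = 'http://api.tvmaze.com/singlesearch/shows'


def build_singlesearch_url(show_name, embed='episodes'):
    """Return the single-search URL for a show, embedding its episode list.

    Hypothetical helper for illustration only: it builds the URL and
    never sends a request.
    """
    query = urlencode({'q': show_name, 'embed': embed})
    return f'{TVMAZE_SINGLESEARCH}?{query}'


url = build_singlesearch_url('game-of-thrones')
print(url)
```

The resulting string can be passed to `requests.get` as-is, which keeps the query parameters in one place if the show name ever changes.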


### 1.1 Getting the data from the API 


In [2]:
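got = requests.get(url='http://api.tvmaze.com/singlesearch/shows?q=game-of-thrones&embed=episodes')

# Stop early if the API answered with an error status instead of parsing a bad response

got.raise_for_status()

# Converting it to JSON

data_json = got.json()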

### 1.2 Passing the data to a DataFrame

In [3]:
# Since the data is compiled in nested dictionaries, we need to normalize the JSON

data_normalize = json_normalize(data=data_json['_embedded'],record_path='episodes')

data_norm = data_normalize[['id','url','name','season','number','airdate', 'airtime',
                            'airstamp','runtime', 'summary']]
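To make the normalization step easier to follow, here is a small sketch on a hand-written payload. The `sample_json` dictionary is an assumption: it is heavily trimmed and only mimics the shape of the API response (episodes nested under `_embedded` and then `episodes`).

```python
import pandas as pd

# Hypothetical, trimmed payload that mimics the shape of the API response:
# show-level fields at the top, episodes nested under '_embedded' -> 'episodes'
sample_json = {
    'name': 'Game of Thrones',
    '_embedded': {
        'episodes': [
            {'id': 4952, 'name': 'Winter is Coming', 'season': 1, 'number': 1},
            {'id': 4953, 'name': 'The Kingsroad', 'season': 1, 'number': 2},
        ]
    },
}

# record_path points at the list whose items should become rows;
# each key of an episode dictionary becomes a column
episodes_df = pd.json_normalize(data=sample_json['_embedded'], record_path='episodes')

print(episodes_df.shape)
print(list(episodes_df.columns))
```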


### 1.3 Data Cleaning 

In [24]:
# Showing the full column width and cleaning the 'summary' column so it does not include the <p> HTML tags

pd.set_option('display.max_colwidth', None)

data = data_norm.replace(to_replace=['<p>', '</p>'], value='', regex=True)

# Removing a few columns that add very little value to the dataframe

not_cool_cols = ['id', 'airtime', 'airstamp',]

data = data.drop(not_cool_cols, axis=1)

data.to_csv("./data_api.csv", index = False)

data.head()


Unnamed: 0,url,name,season,number,airdate,runtime,summary
0,http://www.tvmaze.com/episodes/4952/game-of-thrones-1x01-winter-is-coming,Winter is Coming,1,1,2011-04-17,60,"Lord Eddard Stark, ruler of the North, is summoned to court by his old friend, King Robert Baratheon, to serve as the King's Hand. Eddard reluctantly agrees after learning of a possible threat to the King's life. Eddard's bastard son Jon Snow must make a painful decision about his own future, while in the distant east Viserys Targaryen plots to reclaim his father's throne, usurped by Robert, by selling his sister in marriage."
1,http://www.tvmaze.com/episodes/4953/game-of-thrones-1x02-the-kingsroad,The Kingsroad,1,2,2011-04-24,60,"An incident on the Kingsroad threatens Eddard and Robert's friendship. Jon and Tyrion travel to the Wall, where they discover that the reality of the Night's Watch may not match the heroic image of it."
2,http://www.tvmaze.com/episodes/4954/game-of-thrones-1x03-lord-snow,Lord Snow,1,3,2011-05-01,60,Jon Snow attempts to find his place amongst the Night's Watch. Eddard and his daughters arrive at King's Landing.
3,http://www.tvmaze.com/episodes/4955/game-of-thrones-1x04-cripples-bastards-and-broken-things,"Cripples, Bastards, and Broken Things",1,4,2011-05-08,60,Tyrion stops at Winterfell on his way home and gets a frosty reception from Robb Stark. Eddard's investigation into the death of his predecessor gets underway.
4,http://www.tvmaze.com/episodes/4956/game-of-thrones-1x05-the-wolf-and-the-lion,The Wolf and the Lion,1,5,2011-05-15,60,Catelyn's actions on the road have repercussions for Eddard. Tyrion enjoys the dubious hospitality of the Eyrie.
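The cleaning step above removes only `<p>` and `</p>`, so other tags (for example the `<br>` left in the Hardhome summary shown in section 3.2) survive. A possible alternative, sketched here under the assumption that the summaries contain only simple tags without attributes holding a `>` character, is to strip every tag with one regular expression. The helper name `strip_html_tags` and the sample string are hypothetical.

```python
import re

# Matches anything that looks like an HTML tag: '<', one or more non-'>' characters, '>'
TAG_PATTERN = re.compile(r'<[^>]+>')


def strip_html_tags(text):
    """Remove every HTML tag from text and trim surrounding whitespace."""
    return TAG_PATTERN.sub('', text).strip()


# Hypothetical summary shaped like the ones returned by the API
sample_summary = '<p>Arya makes progress in her training. Jon travels.<br><br></p>'
clean_summary = strip_html_tags(sample_summary)
print(clean_summary)
```

On the DataFrame this could be applied with `data['summary'].map(strip_html_tags)`, assuming every summary is a string.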


## Step 2: Web Scraping

- each season's episode list on IMDB shows the rating every episode received
- using a list of all the season pages, we can run a script that collects these elements for every episode in each season


In [5]:
# List with the URL of each season's episode page (seasons 1 to 8), where the ratings and vote totals are shown

episodes_list = [f'https://www.imdb.com/title/tt0944947/episodes?season={season}'
                 for season in range(1, 9)]


### 2.1 The most interesting elements to collect are `TITLE`, `RATING` and `TOTAL VOTES`

- Inspecting the page's HTML shows that they sit in the following elements


In [6]:
'''
Title: div class = 'info'
       a href = 'link' title= 'Winter is Coming'

 Rating: div class = 'ipl-rating-widget'
         div class = 'ipl-rating-star small'
             span class = 'ipl-rating-star_rating' 9.0
             span class = 'ipl-rating-star_total-votes' (38,179)
'''

### 2.2 Scraping each of the elements from each page and adding them to a list

In [7]:
# Creating a loop that will scrape all the pages and collect the data from each episode

rating_data = []

for episode in episodes_list:
    response = requests.get(episode)
    soup = bs(response.content, 'html.parser')

    # this is the class that holds all the elements we need: title, ipl-rating-star_rating and ipl-rating-star_total-votes

    episode_rating = soup.find_all('div', class_='info')
    rating_data.append(episode_rating)

# Once every page has been collected, look for the elements we need in the combined HTML.
# The search runs once, after the loop, and raw strings keep the regex escapes intact.

scraped_html = str(rating_data)
titles = re.findall(r'title=".+"', scraped_html)
ratings = re.findall(r'rating">\d+[.]\d+', scraped_html)
total_votes = re.findall(r'total-votes">[(]\d+,\d+[)]', scraped_html)
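As an alternative to running regular expressions over the stringified HTML, the same three fields can be read straight from the parsed tree. This is only a sketch: `SAMPLE_HTML` is a hypothetical fragment written to follow the structure described in section 2.1 (class names copied from there), and `parse_episodes` is a hypothetical helper. The live page may be laid out differently.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment that follows the structure noted in section 2.1
SAMPLE_HTML = """
<div class="info">
  <a href="link" title="Winter Is Coming">Winter Is Coming</a>
  <div class="ipl-rating-widget">
    <div class="ipl-rating-star small">
      <span class="ipl-rating-star_rating">9.0</span>
      <span class="ipl-rating-star_total-votes">(38,179)</span>
    </div>
  </div>
</div>
"""


def parse_episodes(html):
    """Return one dict per episode block with its title, rating and vote count."""
    soup = BeautifulSoup(html, 'html.parser')
    rows = []
    for info in soup.find_all('div', class_='info'):
        link = info.find('a', title=True)
        rating = info.find('span', class_='ipl-rating-star_rating')
        votes = info.find('span', class_='ipl-rating-star_total-votes')
        # Skip blocks that lack any of the three elements (e.g. unrated episodes)
        if link is None or rating is None or votes is None:
            continue
        rows.append({
            'titles': link['title'],
            'ratings': float(rating.get_text(strip=True)),
            # '(38,179)' -> '38,179' -> 38179
            'total_votes': int(votes.get_text(strip=True).strip('()').replace(',', '')),
        })
    return rows


episodes = parse_episodes(SAMPLE_HTML)
print(episodes)
```

Because each episode is handled as one block, the title, rating and vote count stay aligned even when an episode is missing one of them, which three independent lists cannot guarantee.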
        

### 2.3 Cleaning the results scraped from the HTML code

In [8]:
# Cleaning the scraped values inside each list

titles = [i.replace('title="', '').replace('"','') for i in titles]
ratings = [j.replace('rating">', '') for j in ratings]
total_votes = [k.replace('total-votes">(','').replace(')','') for k in total_votes]
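A possible shortcut, shown here as a sketch, is to let the regular expression return only the part we care about by wrapping it in a capture group, which removes the need for the `replace` chains. The `sample_html` fragment and the `sample_` variable names are hypothetical; the patterns mirror the ones used in section 2.2.

```python
import re

# Hypothetical fragment shaped like the markup described in section 2.1
sample_html = (
    '<a href="link" title="Winter Is Coming">'
    '<span class="ipl-rating-star_rating">9.0</span>'
    '<span class="ipl-rating-star_total-votes">(38,179)</span>'
)

# With a single capture group, re.findall returns just the captured text
sample_titles = re.findall(r'title="([^"]+)"', sample_html)
sample_ratings = re.findall(r'rating">(\d+[.]\d+)', sample_html)
sample_votes = re.findall(r'total-votes">[(]([\d,]+)[)]', sample_html)

print(sample_titles, sample_ratings, sample_votes)
```

Using `[^"]+` instead of `.+` also stops the title match at the first closing quote, and `[\d,]+` accepts vote counts below 1,000 or above 999,999.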


### 2.4 Creating a new DataFrame with the data collected from Web Scraping

In [25]:
# Dictionary of lists with the clean elements (named so that it does not shadow the built-in `dict`)

ratings_dict = {'titles': titles,
                'ratings': ratings,
                'total_votes': total_votes}

# Creating a dataframe from the lists we created

df_ratings = pd.DataFrame(ratings_dict)

df_ratings.to_csv("./data_HTML.csv", index = False)

df_ratings

Unnamed: 0,titles,ratings,total_votes
0,Winter Is Coming,9.1,38172
1,The Kingsroad,8.8,28935
2,Lord Snow,8.7,27356
3,"Cripples, Bastards, and Broken Things",8.8,25959
4,The Wolf and the Lion,9.1,27020
...,...,...,...
68,A Knight of the Seven Kingdoms,7.9,119094
69,The Long Night,7.5,199155
70,The Last of the Starks,5.5,151814
71,The Bells,6.0,176805


## Step 3: Merging and analysing the data

In [10]:
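# Merging the API data (`data`) with the scraped ratings (`df_ratings`) on the episode title.
# The default inner join keeps only exact title matches, so "Winter is Coming" (API) vs "Winter Is Coming" (IMDB) is dropped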

final_df = data.merge(df_ratings, left_on='name', right_on='titles')

# This allows us to check if each episode number and season correctly match their title

final_df.head()

Unnamed: 0,url,name,season,number,airdate,runtime,summary,titles,ratings,total_votes
0,http://www.tvmaze.com/episodes/4953/game-of-thrones-1x02-the-kingsroad,The Kingsroad,1,2,2011-04-24,60,"An incident on the Kingsroad threatens Eddard and Robert's friendship. Jon and Tyrion travel to the Wall, where they discover that the reality of the Night's Watch may not match the heroic image of it.",The Kingsroad,8.8,28935
1,http://www.tvmaze.com/episodes/4954/game-of-thrones-1x03-lord-snow,Lord Snow,1,3,2011-05-01,60,Jon Snow attempts to find his place amongst the Night's Watch. Eddard and his daughters arrive at King's Landing.,Lord Snow,8.7,27356
2,http://www.tvmaze.com/episodes/4955/game-of-thrones-1x04-cripples-bastards-and-broken-things,"Cripples, Bastards, and Broken Things",1,4,2011-05-08,60,Tyrion stops at Winterfell on his way home and gets a frosty reception from Robb Stark. Eddard's investigation into the death of his predecessor gets underway.,"Cripples, Bastards, and Broken Things",8.8,25959
3,http://www.tvmaze.com/episodes/4956/game-of-thrones-1x05-the-wolf-and-the-lion,The Wolf and the Lion,1,5,2011-05-15,60,Catelyn's actions on the road have repercussions for Eddard. Tyrion enjoys the dubious hospitality of the Eyrie.,The Wolf and the Lion,9.1,27020
4,http://www.tvmaze.com/episodes/4957/game-of-thrones-1x06-a-golden-crown,A Golden Crown,1,6,2011-05-22,60,Viserys is increasingly frustrated by the lack of progress towards gaining his crown.,A Golden Crown,9.2,26744
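The merged table starts at "The Kingsroad" because the first episode is spelled "Winter is Coming" in the API data and "Winter Is Coming" on IMDB. One way to keep such rows, sketched below on two tiny hand-made frames, is to merge on a normalized key and use `indicator=True` to list anything that still fails to match. The `title_key` column name and the sample frames are assumptions for illustration.

```python
import pandas as pd

# Tiny stand-ins for the API table and the scraped table
api = pd.DataFrame({'name': ['Winter is Coming', 'The Kingsroad'],
                    'season': [1, 1],
                    'number': [1, 2]})
scraped = pd.DataFrame({'titles': ['Winter Is Coming', 'The Kingsroad'],
                        'ratings': [9.1, 8.8]})

# Exact matching loses the first episode because of the capital "Is"
strict = api.merge(scraped, left_on='name', right_on='titles')

# Hypothetical helper column: lower-cased, trimmed title used only as the join key
api_keyed = api.assign(title_key=api['name'].str.casefold().str.strip())
scraped_keyed = scraped.assign(title_key=scraped['titles'].str.casefold().str.strip())

# A left join keeps every API episode; '_merge' records whether a rating was found
relaxed = api_keyed.merge(scraped_keyed, on='title_key', how='left', indicator=True)
unmatched = relaxed[relaxed['_merge'] == 'left_only']

print(len(strict), len(relaxed), len(unmatched))
```

Inspecting `unmatched` before dropping anything makes it obvious which episodes would otherwise disappear from the analysis.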


### 3.1 Data Cleaning of the final DataFrame

In [11]:
# Keeping only the rows where the API title ("name") matches the scraped title ("titles")

final_df = final_df[final_df['name'] == final_df['titles']]

# Dropping the unnecessary column 'name', since we have checked that the episode titles match in every row

final_df = final_df.drop(['name'],axis=1)

# Exporting the final DataFrame with the data collected as a CSV

final_df.to_csv("./got_final.csv", index = False)
final_df

Unnamed: 0,url,season,number,airdate,runtime,summary,titles,ratings,total_votes
0,http://www.tvmaze.com/episodes/4953/game-of-thrones-1x02-the-kingsroad,1,2,2011-04-24,60,"An incident on the Kingsroad threatens Eddard and Robert's friendship. Jon and Tyrion travel to the Wall, where they discover that the reality of the Night's Watch may not match the heroic image of it.",The Kingsroad,8.8,28935
1,http://www.tvmaze.com/episodes/4954/game-of-thrones-1x03-lord-snow,1,3,2011-05-01,60,Jon Snow attempts to find his place amongst the Night's Watch. Eddard and his daughters arrive at King's Landing.,Lord Snow,8.7,27356
2,http://www.tvmaze.com/episodes/4955/game-of-thrones-1x04-cripples-bastards-and-broken-things,1,4,2011-05-08,60,Tyrion stops at Winterfell on his way home and gets a frosty reception from Robb Stark. Eddard's investigation into the death of his predecessor gets underway.,"Cripples, Bastards, and Broken Things",8.8,25959
3,http://www.tvmaze.com/episodes/4956/game-of-thrones-1x05-the-wolf-and-the-lion,1,5,2011-05-15,60,Catelyn's actions on the road have repercussions for Eddard. Tyrion enjoys the dubious hospitality of the Eyrie.,The Wolf and the Lion,9.1,27020
4,http://www.tvmaze.com/episodes/4957/game-of-thrones-1x06-a-golden-crown,1,6,2011-05-22,60,Viserys is increasingly frustrated by the lack of progress towards gaining his crown.,A Golden Crown,9.2,26744
...,...,...,...,...,...,...,...,...,...
65,http://www.tvmaze.com/episodes/1623964/game-of-thrones-8x02-a-knight-of-the-seven-kingdoms,8,2,2019-04-21,60,Jaime faces judgement and Winterfell prepares for the battle to come.,A Knight of the Seven Kingdoms,7.9,119094
66,http://www.tvmaze.com/episodes/1623965/game-of-thrones-8x03-the-long-night,8,3,2019-04-28,90,Winterfell fights the Army of the Dead.,The Long Night,7.5,199155
67,http://www.tvmaze.com/episodes/1623966/game-of-thrones-8x04-the-last-of-the-starks,8,4,2019-05-05,78,The survivors plan their next steps; Cersei makes a power move.,The Last of the Starks,5.5,151814
68,http://www.tvmaze.com/episodes/1623967/game-of-thrones-8x05-the-bells,8,5,2019-05-12,79,"Varys betrays his queen, and Daenerys brings her forces to King's Landing.",The Bells,6.0,176805


### 3.2 Analysis of the results

In [12]:
# Calculating the average runtime per episode, i.e. the typical length of an episode

avg_runtime = round(final_df.runtime.mean(),1)

# Calculating the average rating per episode (rounded to the nearest whole number)

final_df['ratings'] = pd.to_numeric(final_df['ratings'], downcast="float")
avg_rating = round(final_df.ratings.mean())

# Calculating the average number of ratings per episode.
# Swapping the thousands separator for a decimal point turns '38,179' into 38.179,
# so from here on total_votes is expressed in thousands of votes

final_df['total_votes'] = [i.replace(',','.') for i in final_df['total_votes']]
final_df['total_votes'] = pd.to_numeric(final_df['total_votes'], downcast="float")
avg_votes = round(final_df.total_votes.mean())


print(f'The average runtime of an episode is: {avg_runtime} minutes')
print(f'The average rating of an episode is: {avg_rating}/10')
print(f'The average number of ratings per episode is: {avg_votes:.0f} thousand votes')

The average runtime of an episode is: 61.4 minutes
The average rating of an episode is: 9.0/10
The average number of ratings per episode is: 48 thousand votes
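The cell above keeps the vote totals in thousands by turning the thousands separator into a decimal point. If whole vote counts are preferred, a minimal sketch is to remove the separator and parse the result as integers. The `raw_votes` values are sample strings in the format the scraper captures, not the full data set.

```python
import pandas as pd

# Sample strings in the scraped format, with ',' as the thousands separator
raw_votes = pd.Series(['38,179', '28,935'])

# Remove the separator literally (regex=False) and convert to integers
vote_counts = pd.to_numeric(raw_votes.str.replace(',', '', regex=False))

mean_votes = vote_counts.mean()
print(vote_counts.tolist())
print(f'{mean_votes:,.0f} votes on average')
```

Keeping the counts as integers avoids the float rounding visible in the later tables and lets the format specifier `:,` add the separator back only for display.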


In [23]:
# Checking the average rating per season

rating_season = final_df.groupby(['season'])['ratings'].mean()
rating_season

season
1    9.100000
2    8.977777
3    8.977777
4    9.300000
5    8.830000
6    9.059999
7    9.128572
8    6.433333
Name: ratings, dtype: float32
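The per-season mean can be complemented with a few more statistics in one call. The sketch below uses a toy frame with made-up ratings (not the real data) to show `agg` with several functions.

```python
import pandas as pd

# Toy data with made-up ratings, only to illustrate the call
toy = pd.DataFrame({'season': [1, 1, 8, 8],
                    'ratings': [9.0, 8.0, 6.0, 5.0]})

# One row per season, one column per statistic
season_summary = toy.groupby('season')['ratings'].agg(['mean', 'min', 'max', 'count'])
print(season_summary)
```

The `count` column is a useful sanity check: a season with fewer rows than expected points to episodes lost during the merge.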

In [16]:
# Checking which are the top 5 best rated episodes by the public

top5_best = final_df.nlargest(5, ['ratings']) 
top5_best.to_csv("./top5_best.csv", index = False)

top5_best

Unnamed: 0,url,season,number,airdate,runtime,summary,titles,ratings,total_votes
25,http://www.tvmaze.com/episodes/4980/game-of-thrones-3x09-the-rains-of-castamere,3,9,2013-06-02,60,"Robb presents himself to Walder Frey, and Edmure meets his bride. Jon faces his harshest test yet. Bran discovers a new gift. Daario and Jorah debate how to take Yunkai. House Frey joins with House Tully.",The Rains of Castamere,9.9,85.853996
44,http://www.tvmaze.com/episodes/155299/game-of-thrones-5x08-hardhome,5,8,2015-05-31,60,Arya makes progress in her training. Sansa confronts an old friend. Cersei struggles. Jon travels.<br><br>,Hardhome,9.9,86.397003
55,http://www.tvmaze.com/episodes/729574/game-of-thrones-6x09-battle-of-the-bastards,6,9,2016-06-19,60,Ramsay surprises his audience. Jon retaliates. Dany is true to her word.,Battle of the Bastards,9.9,183.936005
56,http://www.tvmaze.com/episodes/729575/game-of-thrones-6x10-the-winds-of-winter,6,10,2016-06-26,69,"Alliances are made, the High Sparrow is holding trials at King's Landing, Daenerys is sailing for the Seven Kingdoms and a new King of the North is crowned.",The Winds of Winter,9.9,127.778999
60,http://www.tvmaze.com/episodes/1221412/game-of-thrones-7x04-the-spoils-of-war,7,4,2017-08-06,60,Arya gets to the final destination. Daenerys takes it upon herself to strike back.,The Spoils of War,9.8,78.445


In [17]:
# Checking which are the top 5 worst rated episodes by the public

top5_worst = final_df.nsmallest(5, ['ratings'])
top5_worst.to_csv("./top5_worst.csv", index = False)

top5_worst

Unnamed: 0,url,season,number,airdate,runtime,summary,titles,ratings,total_votes
69,http://www.tvmaze.com/episodes/1623968/game-of-thrones-8x06-the-iron-throne,8,6,2019-05-19,80,"In the aftermath of the devastating attack on King's Landing, Daenerys must face the survivors.",The Iron Throne,4.1,218.156998
67,http://www.tvmaze.com/episodes/1623966/game-of-thrones-8x04-the-last-of-the-starks,8,4,2019-05-05,78,The survivors plan their next steps; Cersei makes a power move.,The Last of the Starks,5.5,151.813995
68,http://www.tvmaze.com/episodes/1623967/game-of-thrones-8x05-the-bells,8,5,2019-05-12,79,"Varys betrays his queen, and Daenerys brings her forces to King's Landing.",The Bells,6.0,176.804993
66,http://www.tvmaze.com/episodes/1623965/game-of-thrones-8x03-the-long-night,8,3,2019-04-28,90,Winterfell fights the Army of the Dead.,The Long Night,7.5,199.154999
64,http://www.tvmaze.com/episodes/1590943/game-of-thrones-8x01-winterfell,8,1,2019-04-14,60,"Arriving at Winterfell, Jon and Daenerys struggle to unite a divided North. Jon gets some big news.",Winterfell,7.6,121.011002
