# Importing our data from TMDB using API calls

## "Discover" Data Set

First we are getting our BIG set of movie data using the Discover API, with the following parameters:
    * include_adult false (does not include adult movies in the response)
    * include_video false (does not include short format video in the response)
    * page * where we are going to make calls to the maximum allowed number of pages which is 500
    * primary_release_date_gte 1999-12-31 (we want movie data from the last 20 years only)
    * primary_release_date_lte 2019-12-31 (this is a pre-covid case study)
    * vote_count.gte 31 (we want the movies to have at least 31 votes to be included)
    * with_original_language en (our client will make movies in English, and we will still have 10,000 returns)
    
The TMDB database allows for a maximum number of 10,000 returns, so we are trying to use parameter rules that will give us only the most relevant and important returns in order to make recommendations.

https://developers.themoviedb.org/3/discover/movie-discover

First loading in the libraries we will need to make these calls.

In [10]:
import json
import requests

Our API key is stored in our .secret folder in a json file. We're accessing that here and saving it to our api-key variable

In [34]:
def get_keys(path):
    '''docstring here'''
    with open(path) as f:
        return json.load(f)

keys = get_keys("/Users/Wadkins/Dropbox/Flatiron/module_01/.secret/tmdb_api.json")

api_key = keys['api_key']

In [7]:
response={}

for i in range (1,501):
    '''docstring here'''
    url = 'https://api.themoviedb.org/3/discover/movie?api_key={}&language=en-US&sort_by=popularity.desc&include_adult=false&include_video=false&page={}&primary_release_date.gte=1999-12-31&primary_release_date.lte=2019-12-31&vote_count.gte=31&with_original_language=en'.format(api_key, i)
    response[i] = (requests.get(url).json())

In [8]:
with open('tmdb_movies.json', 'w', encoding='utf-8') as f:
    json.dump(response, f, ensure_ascii=False, indent=4)

It's worth noting that we had 10,000 returns with our search parameters, so we might want to tighten those up a bit!

## IMDB ID Matchup

In [35]:
import pandas as pd
data = pd.read_csv('api_data/tmdb_discover.csv')

data

Unnamed: 0,popularity,vote_count,id,genre_ids,title,vote_average,release_date
0,1043.179,12572,354912,"[16, 10751, 35, 12, 14, 10402]",Coco,8.2,2017-10-27
1,437.037,5113,474350,"[27, 14]",It Chapter Two,6.9,2019-09-04
2,369.335,15356,475557,"[80, 53, 18]",Joker,8.2,2019-10-02
3,288.743,6330,330457,"[16, 10751, 12, 35, 14]",Frozen II,7.3,2019-11-20
4,253.076,3714,316727,"[28, 27, 53]",The Purge: Election Year,6.4,2016-06-29
...,...,...,...,...,...,...,...
9995,4.985,32,12855,"[35, 18, 10749]",Blue State,6.0,2007-04-27
9996,4.984,40,41488,"[18, 53]",The Statement,5.7,2003-12-12
9997,4.984,32,27711,[27],Killjoy,2.7,2000-10-24
9998,4.983,31,521190,"[18, 10749, 10770]",The Beach House,6.2,2018-04-28


In [36]:
data = data.head(10)

In [37]:
data

Unnamed: 0,popularity,vote_count,id,genre_ids,title,vote_average,release_date
0,1043.179,12572,354912,"[16, 10751, 35, 12, 14, 10402]",Coco,8.2,2017-10-27
1,437.037,5113,474350,"[27, 14]",It Chapter Two,6.9,2019-09-04
2,369.335,15356,475557,"[80, 53, 18]",Joker,8.2,2019-10-02
3,288.743,6330,330457,"[16, 10751, 12, 35, 14]",Frozen II,7.3,2019-11-20
4,253.076,3714,316727,"[28, 27, 53]",The Purge: Election Year,6.4,2016-06-29
5,250.205,4918,512200,"[12, 35, 14]",Jumanji: The Next Level,7.0,2019-12-04
6,246.616,20038,299536,"[12, 28, 878]",Avengers: Infinity War,8.3,2018-04-25
7,246.301,95,640882,[878],3022,5.7,2019-11-22
8,245.291,4276,384018,"[28, 12, 35]",Fast & Furious Presents: Hobbs & Shaw,6.9,2019-08-01
9,244.868,295,420634,"[53, 27]",Terrifier,6.2,2016-10-15


In [44]:
response={}

for row in data:
    '''docstring here'''
    i = data.loc[row, 'id']
    print(i)
    #url = 'https://api.themoviedb.org/3/movie/{}/external_ids?api_key={}'.format(i, api_key)
    #response = (requests.get(url).json())
    #print(response)
    #data['id'] = response['imdb_id']

KeyError: 'popularity'

In [19]:
data

Unnamed: 0,popularity,vote_count,id,genre_ids,title,vote_average,release_date,imdb_id
0,1043.179,12572,354912,"[16, 10751, 35, 12, 14, 10402]",Coco,8.2,2017-10-27,420634
1,437.037,5113,474350,"[27, 14]",It Chapter Two,6.9,2019-09-04,420634
2,369.335,15356,475557,"[80, 53, 18]",Joker,8.2,2019-10-02,420634
3,288.743,6330,330457,"[16, 10751, 12, 35, 14]",Frozen II,7.3,2019-11-20,420634
4,253.076,3714,316727,"[28, 27, 53]",The Purge: Election Year,6.4,2016-06-29,420634
5,250.205,4918,512200,"[12, 35, 14]",Jumanji: The Next Level,7.0,2019-12-04,420634
6,246.616,20038,299536,"[12, 28, 878]",Avengers: Infinity War,8.3,2018-04-25,420634
7,246.301,95,640882,[878],3022,5.7,2019-11-22,420634
8,245.291,4276,384018,"[28, 12, 35]",Fast & Furious Presents: Hobbs & Shaw,6.9,2019-08-01,420634
9,244.868,295,420634,"[53, 27]",Terrifier,6.2,2016-10-15,420634
