# Proof of Concept

This notebook exists to decide whether the overall goal of the project is feasible. It works through some of the following tasks:
* Asking the user for input to determine their preferences
* Generating a metric that represents a user
* Generating a metric that represents a service
* Comparing the user to each service and finding the best fit

In [53]:
import pandas as pd
import numpy as np
from pandasql import sqldf

In [25]:
movies = pd.read_csv("data/modified/movies_api_imdb_merged.csv")
genres = pd.read_csv("data/modified/service_genres_counted.csv")
ratings = pd.read_csv("data/modified/ratings_counted.csv")

In [39]:
list(movies.columns)

['title',
 'release_year',
 'type',
 'rating',
 'service',
 'tmdb_id',
 'genres',
 'imdb_id',
 'popularity',
 'tmdb_score',
 'tmdb_count',
 'poster_path',
 'budget',
 'revenue',
 'runtime',
 'tconst',
 'imdb_score',
 'imdb_count',
 'mean_score',
 'mean_num_votes']

In [38]:
list(genres.columns)

['service',
 'type',
 'genre',
 'count',
 'mean_score',
 'mean_popularity',
 'total_on_service',
 'percentage_of_total']

In [40]:
list(ratings.columns)

['service',
 'type',
 'count',
 'rating',
 'mean_score',
 'mean_popularity',
 'total_on_service',
 'percentage_of_total']

## Genres

Finding the "best" streaming service for a user based on genre preferences.

In [26]:
# Dictionary of genres and corresponding user-input values initialized to 0
user_genres = {key: 0 for key in genres.genre.unique()}

In [27]:
# A dictionary generated to provide an example response set for genres
example_response = {'Drama': 1,'Comedy': 1,'Thriller': 1,'Action': 1,'Romance': 0,'Horror': 1,'Crime': 0,'Documentary': 0,'Family': 0,'Adventure': 1,'TV Movie': 0,'Mystery': 1,'Science Fiction': 1,'Western': 1,'Fantasy': 0,'Music': 0,'History': 0,'War': 1,'Animation': 1}

# Uncomment this line to answer the genre prompts interactively instead of using the example
#example_response = None

In [28]:
# Ask the user whether they like each genre, unless an example response is provided.
if example_response is None:
    for genre in user_genres:
        ans = input(f"Do you like the {genre} genre? (y/n): ")
        if ans == "y":
            # Value is 1 if the user likes the genre and 0 otherwise
            user_genres[genre] += 1
else:
    user_genres = example_response

In [29]:
user_genres

{'Drama': 1,
 'Comedy': 1,
 'Thriller': 1,
 'Action': 1,
 'Romance': 0,
 'Horror': 1,
 'Crime': 0,
 'Documentary': 0,
 'Family': 0,
 'Adventure': 1,
 'TV Movie': 0,
 'Mystery': 1,
 'Science Fiction': 1,
 'Western': 1,
 'Fantasy': 0,
 'Music': 0,
 'History': 0,
 'War': 1,
 'Animation': 1}
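
The prompt loop above can also be wrapped in a function so that it can be exercised without a live keyboard. This is only a minimal sketch: the helper name `collect_genre_preferences` and its `ask` parameter are hypothetical and do not appear in the notebook, and accepting "yes" as well as "y" is an assumption about what counts as agreement.

```python
def collect_genre_preferences(genre_names, ask=input):
    """Return {genre: 1 or 0} from y/n answers (hypothetical helper)."""
    prefs = {}
    for genre in genre_names:
        ans = ask(f"Do you like the {genre} genre? (y/n): ")
        # Assumption: "y", "Y", "yes" and surrounding whitespace all mean yes.
        prefs[genre] = 1 if ans.strip().lower() in ("y", "yes") else 0
    return prefs


# Scripted answers stand in for keyboard input in this illustration.
scripted = iter(["y", "n", " Yes "])
demo_prefs = collect_genre_preferences(
    ["Drama", "Romance", "Horror"], ask=lambda prompt: next(scripted)
)
print(demo_prefs)
```

In the notebook itself the call would presumably be `collect_genre_preferences(genres.genre.unique())`, which falls back to the built-in `input`.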

In [68]:
# Summarize how a service performs for each genre, weighted by whether the user likes the genre
genre_service_summaries = {key: [] for key in genres.service.unique()}
for index, row in genres.iterrows():
    # Append (genre's user score)*(genre's average score on service)*(genre's percentage of total on service)
    # Genres missing from user_genres default to 0 so that .get() never returns None
    genre_service_summaries[row["service"]].append(user_genres.get(row["genre"], 0)*row["mean_score"]*row["percentage_of_total"])

In [69]:
# Collapse each service's list into a single number by summing its genre values.
# A sum is used so that genres scored 0 (disliked by the user) drop out instead of lowering the result.
for key, value in genre_service_summaries.items():
    genre_service_summaries[key] = np.sum(value)

In [70]:
genre_service_summaries

{'amazon': 4.0411956663620945,
 'disney': 3.620682730923695,
 'hbo': 4.193505929997107,
 'hulu': 4.314811529933482,
 'netflix': 4.276764138491216}
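
To make the arithmetic behind these numbers concrete, here is a small worked example of the same formula, (user score) * (mean score) * (percentage of total), summed per service. It is a sketch on made-up rows: `toy_rows` and `toy_user` are illustrative stand-ins for `genres` and `user_genres`, and none of their values come from the real data.

```python
from collections import defaultdict

# Toy rows mimicking the `genres` columns used above; all values are made up.
toy_rows = [
    {"service": "hulu", "genre": "Drama", "mean_score": 7.0, "percentage_of_total": 0.5},
    {"service": "hulu", "genre": "Romance", "mean_score": 6.0, "percentage_of_total": 0.25},
    {"service": "netflix", "genre": "Drama", "mean_score": 6.5, "percentage_of_total": 0.4},
    {"service": "netflix", "genre": "Horror", "mean_score": 5.5, "percentage_of_total": 0.2},
]
toy_user = {"Drama": 1, "Romance": 0, "Horror": 1}

toy_summaries = defaultdict(float)
for row in toy_rows:
    # 1 if the user likes the genre, 0 if they do not or were never asked.
    liked = toy_user.get(row["genre"], 0)
    toy_summaries[row["service"]] += liked * row["mean_score"] * row["percentage_of_total"]

# hulu:    1*7.0*0.5 + 0*6.0*0.25 = 3.5
# netflix: 1*6.5*0.4 + 1*5.5*0.2  = 3.7 (up to floating-point rounding)
print(dict(toy_summaries))
```

Note how the disliked Romance row contributes nothing to hulu's total, which is the behaviour the sum was chosen for.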

In [71]:
# Ideal service based on genres alone is the one with the highest summed genre score
print(max(genre_service_summaries, key=genre_service_summaries.get))

hulu
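
Because `max` reports only the winner, it can help to look at the full ordering and at how close the runner-up is. The sketch below reuses the scores printed earlier in this notebook; the names `example_scores`, `ranking`, and `margin` are introduced here for illustration only.

```python
# Scores copied from the genre_service_summaries output shown above.
example_scores = {
    "amazon": 4.0411956663620945,
    "disney": 3.620682730923695,
    "hbo": 4.193505929997107,
    "hulu": 4.314811529933482,
    "netflix": 4.276764138491216,
}

# Sort services from highest to lowest summed genre score.
ranking = sorted(example_scores.items(), key=lambda kv: kv[1], reverse=True)
for service, score in ranking:
    print(f"{service:8s} {score:.3f}")

# Gap between the best and second-best service.
margin = ranking[0][1] - ranking[1][1]
print(f"margin between top two: {margin:.3f}")
```

For this example response the top two services are separated by a small margin, so a different set of liked genres could plausibly change the recommendation.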
