# Using a Neural Network to see whether an author's name predicts a critical/commercial hit or flop:


In [1]:
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import functions
from sklearn.preprocessing import LabelEncoder

Checking the number of unique authors in the dataset. Credit to ChatGPT for showing how to do so:


In [2]:
df_metadata = functions.get_data()
df_metadata['author_name'].nunique()

12877

Over 12,000 authors. Statistically, most authors have 1 book, so I will check if that is the case. Credit to ChatGPT, which showed me how to code up a solution to this:

In [3]:
df_metadata['author_name'].value_counts().describe()

count    12877.000000
mean         1.553157
std          1.800488
min          1.000000
25%          1.000000
50%          1.000000
75%          1.000000
max         76.000000
Name: count, dtype: float64

This shows that the majority of authors have only 1 book to their names: the median and even the 75th percentile are both 1, so at least three quarters of authors appear exactly once in this dataset, making it likely that many of them are debut authors. The maximum number of books any one author has on the Amazon storefront is 76. Because most names appear only once, an author's name alone gives the model too little to go on to determine a critical or commercial success. As such, my neural network will use the context around each author (such as book count and debut status) to determine whether the general audience would support a given book both financially and critically.
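The share of single-book authors can also be computed directly rather than read off the quartiles. The sketch below uses a small made-up frame as a stand-in for `df_metadata` (the author names are hypothetical); the same two lines should apply to the real `author_name` column.

```python
import pandas as pd

# Toy stand-in for df_metadata; these author names are invented for illustration.
toy = pd.DataFrame({"author_name": ["A", "A", "A", "B", "C", "D"]})

# Number of rows (books) per author, as in the describe() call above.
books_per_author = toy["author_name"].value_counts()

# Fraction of authors that appear exactly once.
single_book_share = (books_per_author == 1).mean()

print(books_per_author.max())
print(single_book_share)
```

On the real data, `single_book_share` would give the exact proportion of one-book authors, which the quartiles only bound from below at 75%.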

## The Foundations of the Neural Network:

Credit to ChatGPT for showing me how to lay the groundwork for my neural network:

In [None]:
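# Start with encoding each author's name as an integer id:
le = LabelEncoder()
df_metadata['author_id'] = le.fit_transform(df_metadata['author_name'])

# Other contexts around each author.
# The dataframe has no 'title' column (see df_metadata.columns), so count rows per author_name instead:
df_metadata['author_book_count'] = df_metadata.groupby('author_name')['author_name'].transform('count')
df_metadata['debut'] = (df_metadata['author_book_count'] == 1).astype(int)

# Now to deal with whether a title is a success or not, be it critical or commercial.
# rating_number (the number of ratings) is used here as a proxy for commercial reach;
# log1p compresses its long tail before it is mixed with the average rating.
earnings = np.log1p(df_metadata['rating_number'])
reviews = df_metadata['average_rating']
impact = 0.6 * earnings + 0.4 * reviews

# Label the top 30% of titles by impact score as supported (1), the rest as not (0):
df_metadata['is_supported'] = (impact > impact.quantile(0.7)).astype(int)

# Input and target for the network:
x = df_metadata['author_id']
y = df_metadata['is_supported']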
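To see what the context features and the label look like on concrete numbers, here is a minimal sketch on a five-row toy frame. The rows and author names are invented for illustration, and the weights (0.6 and 0.4) and the 0.7 quantile are simply copied from the cell above, not tuned.

```python
import numpy as np
import pandas as pd

# Hypothetical rows standing in for df_metadata.
toy = pd.DataFrame({
    "author_name": ["Ann", "Ann", "Bo", "Cy", "Cy"],
    "rating_number": [0, 9, 99, 999, 9999],
    "average_rating": [5.0, 4.0, 3.0, 4.5, 4.8],
})

# Books per author, broadcast back onto every row of that author.
toy["author_book_count"] = toy.groupby("author_name")["author_name"].transform("count")
toy["debut"] = (toy["author_book_count"] == 1).astype(int)

# Weighted mix of log-scaled rating volume and average rating.
popularity = np.log1p(toy["rating_number"])
impact = 0.6 * popularity + 0.4 * toy["average_rating"]

# Rows strictly above the 70th percentile of impact are labelled 1.
threshold = impact.quantile(0.7)
toy["is_supported"] = (impact > threshold).astype(int)

print(toy[["author_name", "author_book_count", "debut", "is_supported"]])
```

Note that a title with a perfect 5.0 rating but no ratings at all scores lowest here, because the log-scaled rating count dominates the weighted sum once counts reach the hundreds.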

In [23]:
df_metadata.columns

Index(['author_name', 'publisher', 'publisher_date', 'format', 'page_count',
       'language', 'category_level_2_sub', 'category_level_3_detail',
       'average_rating', 'rating_number', 'price_numeric', 'maybe_date',
       'author_id'],
      dtype='object')