<a href="https://colab.research.google.com/github/zaenulhilmi/movie_revenue/blob/master/movie_revenue.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#Movie Revenue
This project predicts movie revenue with a regression model.

##Dataset
It uses the TMDB movie metadata dataset from Kaggle: https://www.kaggle.com/tmdb/tmdb-movie-metadata



The label is **revenue**. Only the following features are used:
*   popularity
*   budget
*   vote_average
*   vote_count
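
As a minimal sketch of this selection, assuming the Kaggle data has already been loaded into a pandas DataFrame, the table can be reduced to the label plus the four features as follows. The tiny in-memory frame and the names `movies`, `FEATURES`, and `LABEL` are illustrative and do not come from the project.

```python
import pandas as pd

# Hypothetical stand-in for the Kaggle table; the values are made up.
movies = pd.DataFrame({
    'title': ['A', 'B', 'C'],
    'popularity': [10.5, 3.2, 48.0],
    'budget': [1_000_000, 250_000, 90_000_000],
    'vote_average': [6.1, 5.4, 7.8],
    'vote_count': [120, 15, 4300],
    'revenue': [2_500_000, 100_000, 310_000_000],
})

LABEL = 'revenue'
FEATURES = ['popularity', 'budget', 'vote_average', 'vote_count']

# Keep only the chosen features and the label; every other column is dropped.
selected = movies[FEATURES + [LABEL]]
print(selected.columns.tolist())
```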



##Code


###Importing Packages


In [0]:
import matplotlib.pyplot as plt
import pandas as pd

import tensorflow as tf

from tensorflow import keras
from tensorflow.keras import layers

###Preparation Functions


In [0]:
def build_model():
    model = keras.Sequential([
        layers.Dense(64, activation='relu', input_shape=[len(train_dataset.keys())]),
        layers.Dense(64, activation='relu'),
        layers.Dense(1)
    ])

    optimizer = tf.keras.optimizers.RMSprop(0.001)

    model.compile(loss='mse',
                  optimizer=optimizer,
                  metrics=['mae', 'mse'])
    return model


def normalize(x):
    return (x - train_stats['mean']) / train_stats['std']
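
To get a feel for the size of the network defined in `build_model`, the number of trainable parameters can be worked out by hand. This is a rough sketch that assumes the input has exactly the four features listed in the Dataset section. `dense_params` is a helper introduced here for illustration only.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of one fully connected layer."""
    return n_inputs * n_units + n_units


n_features = 4  # assumption: popularity, budget, vote_average, vote_count

layer_sizes = [
    dense_params(n_features, 64),  # first hidden layer
    dense_params(64, 64),          # second hidden layer
    dense_params(64, 1),           # output layer (a single revenue value)
]
total = sum(layer_sizes)
print(layer_sizes, total)
```

Under that assumption the three layers hold 320, 4160, and 65 parameters, 4545 in total.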

###Load and Prepare Training and Test Dataset

In [0]:
raw_dataset = pd.read_csv('https://raw.githubusercontent.com/zaenulhilmi/movie_revenue/master/datasets/movies.csv')
dataset = raw_dataset.copy()
train_dataset = dataset.sample(frac=0.8, random_state=0)
test_dataset = dataset.drop(train_dataset.index)
train_labels = train_dataset.pop('revenue')
test_labels = test_dataset.pop('revenue')
train_stats = train_dataset.describe().transpose()  # per-feature mean/std used by normalize()
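
The 80/20 split relies on `sample` picking a random subset of rows and `drop` removing exactly those rows by index. A small sketch on a made-up frame (the name `toy` and its values are illustrative) shows that the two parts do not overlap:

```python
import pandas as pd

# Toy frame standing in for the movie data; the values are made up.
toy = pd.DataFrame({'budget': range(10), 'revenue': range(0, 100, 10)})

train = toy.sample(frac=0.8, random_state=0)  # random 80% of the rows
test = toy.drop(train.index)                  # the remaining 20%

print(len(train), len(test))
# The two index sets share no rows and together cover the whole frame.
print(set(train.index) & set(test.index))
```

Fixing `random_state` makes the split reproducible across runs, which keeps the reported test metrics comparable.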

### Normalize Dataset


In [0]:
normed_train_data = normalize(train_dataset)
normed_test_data = normalize(test_dataset)
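
The normalization is a per-feature z-score, and the test data is scaled with the training statistics. Below is a minimal sketch on made-up numbers, assuming the statistics come from `DataFrame.describe()` on the training features. The names `stats` and `zscore` are illustrative stand-ins for `train_stats` and `normalize`.

```python
import pandas as pd

# Made-up feature values, only to show the arithmetic.
train = pd.DataFrame({'budget': [1.0, 2.0, 3.0], 'vote_count': [10.0, 20.0, 60.0]})
test = pd.DataFrame({'budget': [4.0], 'vote_count': [30.0]})

# One row per feature, with 'mean' and 'std' columns (sample std, ddof=1).
stats = train.describe().transpose()


def zscore(x):
    # Series are aligned on feature names, so each column uses its own mean/std.
    return (x - stats['mean']) / stats['std']


normed_train = zscore(train)
normed_test = zscore(test)  # the test data reuses the training statistics
print(normed_train)
print(normed_test)
```

Reusing the training mean and standard deviation for the test set keeps information about the test data out of the preprocessing step.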


###Build Model and Train


In [0]:
model = build_model()

EPOCHS = 1000

history = model.fit(
    normed_train_data[['popularity', 'budget', 'vote_average', 'vote_count']], train_labels,
    epochs=EPOCHS, validation_split=0.2, verbose=0)

###Logging for History Info and Loss, MAE, and MSE


In [0]:
hist = pd.DataFrame(history.history)
hist['epoch'] = history.epoch
print(hist.tail())
loss, mae, mse = model.evaluate(normed_test_data[['popularity', 'budget', 'vote_average', 'vote_count']], test_labels,
                                verbose=2)
print(loss, mae, mse)

```
         val_loss  val_mean_absolute_error  ...  mean_squared_error  epoch
995  8.037158e+15             5.260173e+07  ...        9.034380e+15    995
996  8.036487e+15             5.259795e+07  ...        9.033256e+15    996
997  8.035866e+15             5.259652e+07  ...        9.032305e+15    997
998  8.035041e+15             5.259552e+07  ...        9.031515e+15    998
999  8.034494e+15             5.259630e+07  ...        9.030478e+15    999

[5 rows x 7 columns]
1.830084877816003e+16 65934573.11111111 1.830084877816003e+16
```
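
To make the three numbers printed by `model.evaluate` more concrete, the sketch below computes MAE and MSE by hand on made-up values and converts the reported test MSE into an RMSE in revenue units. The lists `y_true` and `y_pred` are illustrative. Only `reported_mse` is taken from the output above.

```python
import math

# Made-up true and predicted revenues, only to illustrate the formulas.
y_true = [100.0, 200.0, 400.0]
y_pred = [110.0, 180.0, 430.0]

errors = [p - t for p, t in zip(y_pred, y_true)]
mae = sum(abs(e) for e in errors) / len(errors)  # mean absolute error
mse = sum(e * e for e in errors) / len(errors)   # mean squared error
print(mae, mse)

# Test-set MSE copied from the printed output; its square root is in revenue units.
reported_mse = 1.830084877816003e+16
rmse = math.sqrt(reported_mse)
print(f"RMSE: {rmse:.3e}")
```

Because the model is compiled with `loss='mse'`, the first and third printed values are identical. The RMSE works out to roughly 1.35e8, about twice the reported test MAE of 6.6e7.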

###Plotting Predicted vs. True Revenue


In [0]:
test_predictions = model.predict(normed_test_data[['popularity', 'budget', 'vote_average', 'vote_count']]).flatten()
a = plt.axes(aspect='equal')
plt.scatter(test_labels, test_predictions)
plt.xlabel('True Values [revenue]')
plt.ylabel('Predictions [revenue]')
lims = [0, 1000000000]
plt.xlim(lims)
plt.ylim(lims)
_ = plt.plot(lims, lims)
plt.show()


![revenue plot](https://raw.githubusercontent.com/zaenulhilmi/movie_revenue/master/images/plot_revenue.png)