# Predicting categories of new posts
Now that we have built a model and evaluated its performance, it is time to put it to use. We can download the latest posts from Reddit and predict what users are talking about.

In [None]:
from libs import *

In [None]:
vectorizer_pipe = joblib.load('trained_models/vectorizer_pipe.pkl')
rf_clf = joblib.load('trained_models/random_forest_classifier.pkl')
svc = joblib.load('trained_models/linear_svc.pkl')

The latest posts (since `2020-08-28`) were scraped with praw; here we load the resulting CSV file.

In [None]:
pred_data = pd.read_csv('datasets/reddit_scrape_latest.csv')

In [None]:
from helpers import DatasetCreator
creator = DatasetCreator(train=False)
data = creator.transform(pred_data)

Transform the data using the `vectorizer_pipe`.

In [None]:
X_pred = vectorizer_pipe.transform(data)
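
The cell above calls `transform` rather than `fit_transform`, so the new posts are mapped onto the features learned during training. The contents of `vectorizer_pipe` are not shown in this notebook, so the following is only a minimal sketch of the idea, assuming a plain `TfidfVectorizer` and made-up texts:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Made-up texts; the real pipeline and data are not shown in this notebook.
train_texts = ["the cat sat", "the dog barked"]
new_texts = ["the cat meowed loudly"]

vec = TfidfVectorizer()
X_train = vec.fit_transform(train_texts)  # learns the vocabulary from training text
X_new = vec.transform(new_texts)          # reuses that vocabulary, learns nothing new

# Both matrices share the same columns, so a trained classifier can read X_new.
print(X_train.shape, X_new.shape)

# Words unseen during training ("meowed", "loudly") are simply dropped.
print(sorted(vec.vocabulary_))
```

Refitting the vectorizer on the new posts would change the column layout and make the saved classifiers' inputs meaningless, which is why the fitted pipeline is loaded from disk alongside the models.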

Make predictions with both the `LinearSVC` model and the random forest model. On the test set, the SVC performed better of the two.

In [None]:
data['svc_pred'] = svc.predict(X_pred)
data['rf_pred'] = rf_clf.predict(X_pred)
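
Since the new posts have no ground-truth labels, one rough sanity check is to see how often the two models agree. The sketch below uses a small made-up frame with the same two column names as `data`; the category labels are hypothetical:

```python
import pandas as pd

# Toy stand-in for the notebook's `data` frame; the labels are made up.
toy = pd.DataFrame({
    'svc_pred': ['news', 'sports', 'tech', 'news', 'tech'],
    'rf_pred':  ['news', 'sports', 'news', 'news', 'tech'],
})

# Fraction of rows where both models predict the same category.
agreement_rate = (toy['svc_pred'] == toy['rf_pred']).mean()

# Rows where the models disagree are good candidates for manual review.
disagreements = toy[toy['svc_pred'] != toy['rf_pred']]

# Cross-tabulation: rows are SVC labels, columns are random forest labels.
confusion = pd.crosstab(toy['svc_pred'], toy['rf_pred'])

print(f"agreement rate: {agreement_rate:.2f}")
print(confusion)
```

High agreement does not prove either model is right, but the disagreeing rows are a cheap place to start reading posts by hand.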

### Predictions
We can inspect these predictions to see what the models predicted and how well that matches our own sense of which category each post belongs to.

In [None]:
data['text'][16]

In [None]:
data[['text', 'svc_pred', 'rf_pred']][:10]

In [None]:
import matplotlib.pyplot as plt

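def plot_prob_bars(model, X_pred, row):
    """Plot predicted class probabilities for one row.

    Requires a model with `predict_proba` (e.g. the random forest).
    """
    # Probabilities for the selected row, one value per class
    pred_proba = model.predict_proba(X_pred)[row]
    fig, ax = plt.subplots(figsize=(10, 7))
    ax.bar(range(len(pred_proba)), pred_proba, tick_label=model.classes_)
    ax.set_ylabel('Predicted probability')
    ax.set_xlabel('Class')
    # Rotate class labels so they do not overlap
    ax.tick_params(axis='x', labelrotation=45)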

In [None]:
row = 33
print(data['text'][row])
plot_prob_bars(rf_clf, X_pred, row)
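
The probability plot works for the random forest because it exposes `predict_proba`; scikit-learn's `LinearSVC` does not. A comparable per-class view for the SVC can be had from `decision_function`, which returns signed margins rather than probabilities. A minimal sketch on synthetic data (the toy classifier and data below are stand-ins, not the saved `svc` model):

```python
import numpy as np
from sklearn.svm import LinearSVC

# Tiny synthetic 3-class problem standing in for the vectorized posts.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X_toy = np.vstack([c + rng.normal(scale=0.5, size=(20, 2)) for c in centers])
y_toy = np.repeat(['a', 'b', 'c'], 20)

toy_svc = LinearSVC().fit(X_toy, y_toy)

# For three or more classes, decision_function returns one margin per class.
# Larger means more confident, but the values are not probabilities.
scores = toy_svc.decision_function(X_toy[:1])[0]
for label, score in zip(toy_svc.classes_, scores):
    print(f"{label}: {score:+.2f}")

# The predicted class is the one with the largest margin.
best = toy_svc.classes_[np.argmax(scores)]
```

These scores could be passed to a bar chart in the same way as the probabilities, with the y-axis labelled as a decision score instead.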

### Statistics
Number of posts in each predicted category (SVC predictions) since 2020-08-28. Our training data ran until 2020-08-27.

In [None]:
f, ax = plt.subplots(figsize=(10, 6))
pd.DataFrame(data['svc_pred'].value_counts()).plot.bar(ax=ax)
ax.set_ylabel('Counts')
ax.set_title('Number of posts since 2020-08-28')
plt.xticks(rotation=45);
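
The chart above shows raw counts. If proportions are more useful, for instance to compare against the class balance of the training set, `value_counts` can normalize for us. A small sketch with made-up labels standing in for `data['svc_pred']`:

```python
import pandas as pd

# Made-up predictions standing in for data['svc_pred'].
toy_preds = pd.Series(
    ['news', 'tech', 'news', 'sports', 'news', 'tech'], name='svc_pred'
)

# normalize=True returns fractions that sum to 1 instead of raw counts.
proportions = toy_preds.value_counts(normalize=True)
print(proportions)
```

The resulting Series can be plotted with `.plot.bar(ax=ax)` just like the counts, with the y-axis label changed accordingly.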