# Inference Pipeline

The inference pipeline predicts the sentiment class of each headline in a batch of scraped data. The pretrained model is downloaded from the Hopsworks model registry, and the most recent batch data is downloaded from HuggingFace. After prediction, the rows are sorted by prediction confidence and pushed back to HuggingFace as a separate predictions dataset.
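For reference, the confidence used for sorting can be written out explicitly. Assuming the model returns a non-negative score $s_{ik}$ for headline $i$ and class $k$ (this notation is ours, not the original notebook's), the L1 normalization, predicted label, and confidence computed in the prediction cell are:

$$
\hat{p}_{ik} = \frac{s_{ik}}{\sum_{k'} \lvert s_{ik'} \rvert}, \qquad
\hat{y}_i = \arg\max_{k} \hat{p}_{ik}, \qquad
c_i = \max_{k} \hat{p}_{ik}
$$

Rows are then ordered by decreasing $c_i$. If the model's final layer is already a softmax, each row already sums to 1 and the normalization leaves it essentially unchanged.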

### Imports

```python
import pandas as pd
import hopsworks
import joblib
import numpy as np
from huggingface_hub import notebook_login
from datasets import load_dataset, Dataset
from sklearn import preprocessing as p
```


### Connect to Hopsworks

```python
# Log in to Hopsworks and get handles to the feature store and model registry
project = hopsworks.login()
fs = project.get_feature_store()
mr = project.get_model_registry()
```

Connected. Call `.close()` to terminate connection gracefully.

Logged in to project, explore it here https://c.app.hopsworks.ai:443/p/5322






### Load LSTM Model from Hopsworks

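```python
import os

model_meta = mr.get_model("headlines_sentiment_model", version=1)
model_dir = model_meta.download()  # download the model artifacts locally
model = joblib.load(os.path.join(model_dir, "headlines_sentiment_model.pkl"))
model.summary()  # prints the architecture itself and returns None, so no print()
```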

```
Downloading file ...

2023-01-15 18:32:17.465167: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.

Keras model archive loading:
File Name                                             Modified             Size
config.json                                    2023-01-15 17:44:48         6798
metadata.json                                  2023-01-15 17:44:48           64
variables.h5                                   2023-01-15 17:44:48     16492456
```




```
Keras weights file (<HDF5 file "variables.h5" (mode r)>) loading:
...layers
......bidirectional
.........backward_layer
............cell
...............vars
..................0
..................1
..................2
............vars
.........forward_layer
............cell
...............vars
..................0
..................1
..................2
............vars
.........layer
............cell
...............vars
............vars
.........vars
......bidirectional_1
.........backward_layer
............cell
...............vars
..................0
..................1
..................2
............vars
.........forward_layer
............cell
...............vars
..................0
..................1
..................2
............vars
.........layer
............cell
...............vars
............vars
.........vars
......dense
.........vars
............0
............1
......dense_1
.........vars
............0
............1
......dense_2
.........vars
............0
............1
```


### Connect and load batch data from HuggingFace

```python
# Connect to HuggingFace
notebook_login()

# Load the scraped batch data from HuggingFace and convert it to a DataFrame
batch_data = load_dataset("eengel7/sentiment_analysis_batch", split='train')
batch_data = batch_data.to_pandas()
```




### Predict sentiment of batch data and sort based on confidence

```python
# Get a classification score for every label of every headline
y_predictions = model.predict(batch_data['Headline'].to_list())

# L1-normalize each row so the absolute scores of each headline sum to 1
y_predictions = p.normalize(y_predictions, norm='l1')

# Use the label with the highest normalized score as the prediction
batch_data['Prediction'] = y_predictions.argmax(axis=1)

# Store that highest score as the confidence and sort by it, most confident first
batch_data['Confidence'] = np.amax(y_predictions, axis=1)
batch_data = batch_data.sort_values('Confidence', ascending=False)
```
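The post-processing in the prediction cell can be illustrated without NumPy or scikit-learn. The following is a minimal pure-Python sketch, assuming each model output row is a list of non-negative per-class scores; the function name `postprocess` and the toy scores are hypothetical and not part of the pipeline. It mirrors `normalize(..., norm='l1')` (including leaving all-zero rows unchanged), `argmax`, and the descending sort on confidence.

```python
def postprocess(scores):
    """Return (row_index, label, confidence) tuples, most confident first.

    Hypothetical helper: each row of `scores` is divided by the sum of its
    absolute values (L1 normalization), the largest entry gives the label,
    and that entry is the confidence.
    """
    results = []
    for row_index, row in enumerate(scores):
        total = sum(abs(s) for s in row)
        # An all-zero row is left unchanged, as scikit-learn's normalize does
        probs = [s / total for s in row] if total > 0 else list(row)
        # argmax: on ties the lowest class index wins, like NumPy's argmax
        label = max(range(len(probs)), key=probs.__getitem__)
        results.append((row_index, label, probs[label]))
    # Sort by confidence, highest first
    return sorted(results, key=lambda r: r[2], reverse=True)


# Toy scores for three headlines and three classes (made-up values)
example = [[1.0, 2.0, 1.0], [0.0, 0.0, 8.0], [3.0, 1.0, 0.0]]
for row_index, label, confidence in postprocess(example):
    print(row_index, label, confidence)
# prints:
# 1 2 1.0
# 2 0 0.75
# 0 1 0.5
```

Sorting on the maximum normalized score puts the headlines the model is most certain about at the top, which is the ordering the pipeline uploads.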



### Convert to a Dataset and upload to HuggingFace

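```python
# Convert the sorted DataFrame to a Dataset, dropping the shuffled pandas index
# (otherwise it is stored as an extra "__index_level_0__" column)
batch_predictions_dataset = Dataset.from_pandas(batch_data, preserve_index=False)

# Upload to HuggingFace
batch_predictions_dataset.push_to_hub("torileatherman/sentiment_analysis_batch_predictions")
```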

```
Pushing dataset shards to the dataset hub:   0%|          | 0/1 [00:00<?, ?it/s]

StopIteration:
```