# Inference Pipeline

The inference pipeline predicts the sentiment class of each headline in a batch of scraped data. The pretrained model is downloaded from the Hopsworks model registry, and the most recent batch data is downloaded from Hugging Face. After prediction, the rows are sorted by prediction confidence (highest first) and uploaded to a separate Hugging Face dataset.
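The flow described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the notebook's code: `run_batch_inference`, `l1_normalize`, and the dictionary-backed scorer are hypothetical stand-ins for the Keras model, `sklearn.preprocessing.normalize`, and the Hugging Face batch data.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical stand-in for the trained model: maps one headline to
# one non-negative score per sentiment class.
Scorer = Callable[[str], Sequence[float]]


def l1_normalize(scores: Sequence[float]) -> List[float]:
    """Scale scores so their absolute values sum to 1 (L1 normalization)."""
    total = sum(abs(s) for s in scores)
    if total == 0:
        # Leave an all-zero row as zeros instead of dividing by zero.
        return [0.0 for _ in scores]
    return [s / total for s in scores]


def run_batch_inference(
    headlines: Sequence[str], score: Scorer
) -> List[Tuple[str, int, float]]:
    """Predict a label per headline and sort rows by confidence, highest first."""
    rows = []
    for headline in headlines:
        probs = l1_normalize(score(headline))
        # Predicted label = index of the largest normalized score.
        label = max(range(len(probs)), key=probs.__getitem__)
        rows.append((headline, label, probs[label]))
    # Most confident predictions first, like sort_values(ascending=False).
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows


# Usage with made-up scores (illustrative only).
fake_scores = {"a": [0.2, 0.6, 0.2], "b": [0.9, 0.05, 0.05], "c": [1.0, 1.0, 2.0]}
result = run_batch_inference(list(fake_scores), fake_scores.__getitem__)
```

Each row of `result` is `(headline, predicted_label, confidence)`; headline `"b"` comes first because its winning class has the largest normalized score.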

### Imports

In [1]:
import pandas as pd
import hopsworks
import joblib
from datetime import datetime
import requests
import numpy as np
from huggingface_hub import notebook_login
from datasets import load_dataset, Dataset
from sklearn import preprocessing as p


### Connect to Hopsworks and download model

In [2]:
project = hopsworks.login()
fs = project.get_feature_store()
mr = project.get_model_registry()

Copy your Api Key (first register/login): https://c.app.hopsworks.ai/account/api/generated
Connected. Call `.close()` to terminate connection gracefully.

Logged in to project, explore it here https://c.app.hopsworks.ai:443/p/5321


### Load LSTM model from Hopsworks

In [4]:
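# Fetch version 2 of the registered model and download its artifacts locally
hw_model = mr.get_model("headlines_sentiment_model", version=2)
model_dir = hw_model.download()

# Deserialize the pickled Keras model
model = joblib.load(model_dir + "/headlines_sentiment_model.pkl")

# summary() prints the architecture itself and returns None, so no print() is needed
model.summary()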

Downloading file ... 

2023-01-15 13:03:26.317233: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.


Keras model archive loading:
File Name                                             Modified             Size
config.json                                    2023-01-14 11:15:52         2496
metadata.json                                  2023-01-14 11:15:52           64
variables.h5                                   2023-01-14 11:15:52      4751728



Keras weights file (<HDF5 file "variables.h5" (mode r)>) loading:
...layers
......dense
.........vars
............0
............1
......dropout
.........vars
......dropout_1
.........vars
......embedding
.........vars
............0
......lstm
.........cell
............vars
...............0
...............1
...............2
.........vars
...metrics
......mean
.........vars
............0
............1
......mean_metric_wrapper
.........vars
............0
............1
...optimizer
......vars
.........0
.........1
.........10
.........11
.........12
.........2
.........3
.........4
.........5
.........6
.........7
.........8
.........9
...vars
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding (Embedding)       (None, 60, 40)            200000    
                                                                 
 dropout (Dropout)           (None, 60, 40)            0         
 

### Connect to Hugging Face and load batch data

In [5]:
# Log in to Hugging Face (opens a token prompt in the notebook)
notebook_login()

# Load the scraped batch data from Hugging Face and convert it to a DataFrame
batch_data = load_dataset("eengel7/sentiment_analysis_batch", split="train")
batch_data = batch_data.to_pandas()

Downloading and preparing dataset parquet/eengel7--sentiment_analysis_batch to /Users/torileatherman/.cache/huggingface/datasets/eengel7___parquet/eengel7--sentiment_analysis_batch-524b74c4b113f844/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec...


Dataset parquet downloaded and prepared to /Users/torileatherman/.cache/huggingface/datasets/eengel7___parquet/eengel7--sentiment_analysis_batch-524b74c4b113f844/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec. Subsequent calls will reuse this data.


### Predict sentiment of batch data and sort based on confidence

In [6]:
# Get the predicted score for every class, one row per headline
y_predictions = model.predict(batch_data["Headline"].to_list())

# L1-normalize each row so the class scores sum to 1
y_predictions = p.normalize(y_predictions, norm="l1")

# Use the class with the highest score as the predicted label
batch_data["Prediction"] = y_predictions.argmax(axis=1)

# Record the winning score as the confidence and sort, most confident first
batch_data["Confidence"] = np.amax(y_predictions, axis=1)
batch_data = batch_data.sort_values("Confidence", ascending=False)
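To make the array operations in this cell concrete, here is a small worked example on made-up scores (illustrative numbers, not real model output), written with plain NumPy. Dividing each row by the sum of its absolute values is what `sklearn.preprocessing.normalize(..., norm="l1")` does for rows with a non-zero sum, and `np.argsort` on the negated confidences stands in for `sort_values(ascending=False)`.

```python
import numpy as np

# Made-up model outputs: three headlines (rows) by three classes (columns).
y_predictions = np.array([
    [0.2, 0.6, 0.2],
    [1.0, 1.0, 2.0],
    [0.9, 0.05, 0.05],
])

# L1 normalization: divide each row by the sum of its absolute values.
row_sums = np.abs(y_predictions).sum(axis=1, keepdims=True)
normalized = y_predictions / row_sums

prediction = normalized.argmax(axis=1)  # index of the winning class per row
confidence = normalized.max(axis=1)     # normalized score of the winning class

# Row order from most to least confident (stable: ties keep their original order).
order = np.argsort(-confidence, kind="stable")
```

Here `prediction` is `[1, 2, 0]`, `confidence` is approximately `[0.6, 0.5, 0.9]`, and `order` is `[2, 0, 1]`. If the model's final layer is a softmax, its rows already sum to 1 and the normalization step changes nothing beyond floating-point rounding; the truncated model summary does not show the final activation, so whether the step matters for this model is not determined by the notebook.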



### Convert predictions to a dataset and upload

In [10]:
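# Drop the pre-sort row labels so they are not stored as an extra column,
# then upload the sorted predictions to Hugging Face
batch_predictions_dataset = Dataset.from_pandas(batch_data.reset_index(drop=True))
batch_predictions_dataset.push_to_hub("torileatherman/sentiment_analysis_batch_predictions")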


StopIteration: 
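Sorting with `sort_values` keeps each row's original index label, so the sorted frame no longer has a default `RangeIndex`. As far as we know, `Dataset.from_pandas` with its default `preserve_index=None` stores a non-`RangeIndex` index as an extra `__index_level_0__` column, so resetting the index before conversion is a simple way to keep it out of the uploaded dataset. The toy frame below (hypothetical values) shows the index behavior with pandas alone.

```python
import pandas as pd

# Toy frame standing in for batch_data (hypothetical values).
df = pd.DataFrame({"Headline": ["a", "b", "c"], "Confidence": [0.6, 0.5, 0.9]})
sorted_df = df.sort_values("Confidence", ascending=False)

# Sorting keeps the original row labels, so the index is no longer 0..n-1.
print(sorted_df.index.tolist())  # [2, 0, 1]

# reset_index(drop=True) discards the old labels and restores a RangeIndex.
clean_df = sorted_df.reset_index(drop=True)
print(type(clean_df.index).__name__)  # RangeIndex
```

Passing `preserve_index=False` to `Dataset.from_pandas` should have the same effect; `reset_index(drop=True)` is shown because it can be checked with pandas alone.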