<div>
<img src=https://www.institutedata.com/wp-content/uploads/2019/10/iod_h_tp_primary_c.svg width="300">
</div>

# Lab 10.2 - Deployment via Streamlit

### Introduction

**Note**: This notebook should work on your local machine.

The purpose of this lab is to take you through the process of deploying a machine learning web app on a publicly hosted platform (streamlit.io and optionally render.com). A trained model will be created using a Scikit-learn pipeline (combining loading, preprocessing and training steps), then separate files of Python code and text will need to be completed to make deployment possible. First the app will be deployed to your local machine (so that you can view it in your browser). Once that is successful, the files will be uploaded to a new repository you create in GitHub, and Streamlit or Render will then read from this repository to host the application at a publicly accessible URL.

The app will take in a text string from a user and output a prediction of whether that string expresses positive or negative sentiment. The model is created using methods from Module 8 (Natural Language Processing). Since the training data used to create the model is small (3,000 records), the prediction may only be accurate around 70% of the time. In future you may wish to improve this app's performance or develop your own app in a similar manner.

The following files are needed to create the app (they should be in the same folder as this notebook):

- requirements.txt
- app.py
- model.joblib
- utils.py
- .streamlit/ (folder containing config.toml)
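
Before running or deploying the app, it can help to confirm that these files are all in place. The following is a minimal sketch using only the standard library; the helper name `missing_files` is an assumption made for illustration, and `model.joblib` will be reported as missing until it has been created later in this lab.

```python
from pathlib import Path

# File names taken from the list above; the helper itself is illustrative.
REQUIRED = [
    "requirements.txt",
    "app.py",
    "model.joblib",
    "utils.py",
    ".streamlit/config.toml",
]


def missing_files(folder="."):
    """Return the required files that are not present in `folder`."""
    root = Path(folder)
    return [name for name in REQUIRED if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_files()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All required files found.")
```

Running this from the notebook's folder lists whatever still needs to be created or copied before deployment.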


Firstly we will see how a predictive model can be created as a pipe which combines the preprocessing, feature engineering and model training steps. This model is then saved as a joblib pickle file which can be reloaded at any time to avoid retraining.

This trained model can be loaded within your production environment along with required packages and real-time predictions can be made by calling its predict() method.

Streamlit enables apps to be deployed rapidly with minimal knowledge of HTML or CSS. Some of the key concepts are described at https://docs.streamlit.io/get-started/fundamentals/main-concepts. Sample apps can be seen at https://streamlit.io/gallery.

### Model Training and Testing

In [5]:
!pip3 install --upgrade streamlit

In [1]:
!pip install streamlit



In [None]:
! streamlit hello

In [7]:
!streamlit --version

In [2]:
## Import Libraries
import numpy as np
import pandas as pd
import regex as re
import spacy
import streamlit as st

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.base import BaseEstimator, TransformerMixin
import joblib

The training data set is `sentiments.csv`, a dataset used in the NLP module.

In [3]:
# Read in the data
df = pd.read_csv('sentiments.csv')

In [4]:
type(df)

pandas.core.frame.DataFrame

In [25]:
df

| | text | sentiment | source | short |
|---|---|---|---|---|
| 0 | Wow... Loved this place. | 1 | yelp | wow love place |
| 1 | Crust is not good. | 0 | yelp | Crust good |
| 2 | Not tasty and the texture was just nasty. | 0 | yelp | tasty texture nasty |
| 3 | Stopped by during the late May bank holiday of... | 1 | yelp | stop late bank holiday rick steve recommendati... |
| 4 | The selection on the menu was great and so wer... | 1 | yelp | selection menu great price |
| ... | ... | ... | ... | ... |
| 2995 | The screen does get smudged easily because it ... | 0 | amazon | screen smudge easily touch ear face |
| 2996 | What a piece of junk.. I lose more calls on th... | 0 | amazon | piece junk lose call phone |
| 2997 | Item Does Not Match Picture. | 0 | amazon | item match picture |
| 2998 | The only thing that disappoint me is the infra... | 0 | amazon | thing disappoint infra red port irda |


In [5]:
df.head()

| | text | sentiment | source |
|---|---|---|---|
| 0 | Wow... Loved this place. | 1 | yelp |
| 1 | Crust is not good. | 0 | yelp |
| 2 | Not tasty and the texture was just nasty. | 0 | yelp |
| 3 | Stopped by during the late May bank holiday of... | 1 | yelp |
| 4 | The selection on the menu was great and so wer... | 1 | yelp |


In [8]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3000 entries, 0 to 2999
Data columns (total 3 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   text       3000 non-null   object
 1   sentiment  3000 non-null   int64 
 2   source     3000 non-null   object
dtypes: int64(1), object(2)
memory usage: 70.4+ KB


In [10]:
df.info(verbose=True, show_counts=True)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3000 entries, 0 to 2999
Data columns (total 3 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   text       3000 non-null   object
 1   sentiment  3000 non-null   int64 
 2   source     3000 non-null   object
dtypes: int64(1), object(2)
memory usage: 70.4+ KB


In [9]:
df.describe()

| | sentiment |
|---|---|
| count | 3000.0 |
| mean | 0.5 |
| std | 0.500083 |
| min | 0.0 |
| 25% | 0.0 |
| 50% | 0.5 |
| 75% | 1.0 |
| max | 1.0 |


In [11]:
df.isnull().sum()

text         0
sentiment    0
source       0
dtype: int64

Next we define a function to do some preprocessing.

In [12]:
def clean_text(text):
    # collapse each run of whitespace (spaces, newlines) to its first character
    text = re.sub(r'(\s)\s+', r'\1', text)
    # remove double quotes
    text = re.sub(r'"', '', text)

    return text
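
When collapsing whitespace with `re.sub`, what the capture group covers matters. The sketch below compares two patterns on a made-up sample string: capturing the whole run and substituting `\1` puts the same run straight back, whereas capturing only the first whitespace character reduces each run to that one character.

```python
import re

# Made-up sample with repeated spaces, repeated newlines and double quotes.
text = 'Great   phone,\n\n\nbut   the "case" broke.'

# The group spans the entire run, so \1 re-inserts the run unchanged.
unchanged = re.sub(r'(\s\s+)', r'\1', text)

# The group spans only the first whitespace character of each run.
collapsed = re.sub(r'(\s)\s+', r'\1', text)

print(unchanged == text)  # True
print(repr(collapsed))
```

Note that with the second pattern a mixed run such as a space followed by a newline collapses to whichever character came first.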

In [27]:
# Variant of clean_text that prints each intermediate step (for debugging only)
def clean_text(text):
    # Print original text
    print("Original text:", text)

    # collapse each run of whitespace (spaces, newlines) to its first character
    text = re.sub(r'(\s)\s+', r'\1', text)
    print("After reducing spaces/newlines:", text)  # Print intermediate text

    # remove double quotes
    text = re.sub(r'"', '', text)
    print("After removing double quotes:", text)  # Print final cleaned text

    return text

In [13]:
df['text'] = df['text'].apply(clean_text)

In [26]:
df['text']

0                                Wow... Loved this place.
1                                      Crust is not good.
2               Not tasty and the texture was just nasty.
3       Stopped by during the late May bank holiday of...
4       The selection on the menu was great and so wer...
                              ...                        
2995    The screen does get smudged easily because it ...
2996    What a piece of junk.. I lose more calls on th...
2997                         Item Does Not Match Picture.
2998    The only thing that disappoint me is the infra...
2999    You can not answer calls with the unit, never ...
Name: text, Length: 3000, dtype: object

The spaCy model `en_core_web_sm` is used for further preprocessing. The steps are the same as those used in Module 8.

In [14]:
import en_core_web_sm
nlp = en_core_web_sm.load()

In [16]:
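def convert_text(text):
    # run the spaCy pipeline on the text
    sent = nlp(text)
    # map the text of each named entity to its span
    ents = {x.text: x for x in sent.ents}
    tokens = []
    for w in sent:
        # skip stop words and punctuation
        if w.is_stop or w.is_punct:
            continue
        # keep tokens that match a named entity as written
        if w.text in ents:
            tokens.append(w.text)
        # lemmatise and lowercase everything else
        else:
            tokens.append(w.lemma_.lower())
    text = ' '.join(tokens)

    return text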

In [17]:
df['short'] = df['text'].apply(convert_text)

In [18]:
# Features and Labels
X = df['short']
y = df['sentiment']

In [19]:
# split the dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 101)

In [20]:
classifier = LinearSVC()

In [21]:
# convert the text to a matrix of TF-IDF features
tfidf = TfidfVectorizer()
# learn the vocabulary and IDF weights from the training data, then transform it
A = tfidf.fit_transform(X_train, y_train)

# train the classifier with the training data
classifier.fit(A.toarray(), y_train)

# transform the test data using the vectorizer fitted on the training data
# NOTE: use `transform()` instead of `fit_transform()`
B = tfidf.transform(X_test)

# make predictions based on the test data
predictions = classifier.predict(B)

# check the accuracy
print('Accuracy: %.4f' % accuracy_score(y_test, predictions))

Accuracy: 0.7733


We will not attempt to improve on the performance in this lab as we are more interested in how to deploy the model.

Next we create a pipeline to simplify the process of model creation. We first define a preprocessor class which applies the `clean_text` and `convert_text` functions defined earlier.

In [22]:
class preprocessor(TransformerMixin, BaseEstimator):

    def __init__(self):
        pass

    def fit(self, X, y=None):
        # nothing to learn; the transformer is stateless
        return self

    def transform(self, X):
        # X is expected to be a pandas Series of raw text
        return X.apply(clean_text).apply(convert_text)

Next we combine the preprocessing, feature engineering and modelling steps into a single pipe.

In [28]:
pipe = make_pipeline(preprocessor(), tfidf, classifier)
pipe.fit(df['text'],df['sentiment'])

Original text: Wow... Loved this place.
After reducing spaces/newlines: Wow... Loved this place.
After removing double quotes: Wow... Loved this place.
Original text: Crust is not good.
After reducing spaces/newlines: Crust is not good.
After removing double quotes: Crust is not good.
Original text: Not tasty and the texture was just nasty.
After reducing spaces/newlines: Not tasty and the texture was just nasty.
After removing double quotes: Not tasty and the texture was just nasty.
Original text: Stopped by during the late May bank holiday off Rick Steve recommendation and loved it.
After reducing spaces/newlines: Stopped by during the late May bank holiday off Rick Steve recommendation and loved it.
After removing double quotes: Stopped by during the late May bank holiday off Rick Steve recommendation and loved it.
Original text: The selection on the menu was great and so were the prices.
After reducing spaces/newlines: The selection on the menu was great and so were the prices.
...

In [24]:
# Save the model
joblib.dump(pipe, 'model.joblib')

['model.joblib']

**Exercise**: test the resulting model on phrases of positive and negative sentiment.

In [37]:
pipe.predict(pd.Series("lightweight and works well"))[0]

Original text: lightweight and works well
After reducing spaces/newlines: lightweight and works well
After removing double quotes: lightweight and works well


1

In [35]:
# Positive examples
positive_phrases = [
    "I love this product, it's amazing!",
    "The movie was fantastic, I highly recommend it.",
    "Great customer service and fast delivery.",
    "This is the best experience I've ever had.",
    "The food at the restaurant was delicious."]

In [34]:
# Negative examples
negative_phrases = [
    "This is the worst product I've ever bought.",
    "The movie was terrible, don't waste your time.",
    "Poor customer service and slow shipping.",
    "I had a horrible experience and wouldn't go back.",
    "The food was cold and tasteless."]

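Lists of labelled phrases like these can be scored in bulk rather than one at a time. The sketch below is one possible way to do that: `agreement` and `keyword_predict` are hypothetical names, and `keyword_predict` is only a crude stand-in so the example runs on its own. It is not the trained model.

```python
def agreement(predict_fn, phrases, expected_label):
    """Fraction of phrases for which predict_fn returns expected_label."""
    hits = sum(1 for phrase in phrases if predict_fn(phrase) == expected_label)
    return hits / len(phrases)


# Hypothetical stand-in predictor (keyword lookup), NOT the trained pipeline.
def keyword_predict(text):
    negative_words = {"worst", "terrible", "poor", "horrible", "cold"}
    words = {w.strip(".,!'").lower() for w in text.split()}
    return 0 if words & negative_words else 1


positive = ["Great customer service and fast delivery.",
            "The food at the restaurant was delicious."]
negative = ["Poor customer service and slow shipping.",
            "The food was cold and tasteless."]

print(agreement(keyword_predict, positive, 1))
print(agreement(keyword_predict, negative, 0))
```

In the notebook, the stand-in could be swapped for the fitted pipeline, for example `agreement(lambda t: pipe.predict(pd.Series(t))[0], positive_phrases, 1)`.
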
In [41]:
# Load the saved model
model = joblib.load('model.joblib')

In [39]:
# Function to predict sentiment
def predict_sentiment(text):
    prediction = model.predict(pd.Series(text))[0]
    return "Positive" if prediction == 1 else "Negative"

In [42]:
# Custom test
custom_phrases = [
    "The product arrived on time but was damaged.",
    "Despite the rain, we had a great time at the park.",
    "The phone's battery life is disappointing, but the camera is excellent."
]

print("Custom phrases:")
for phrase in custom_phrases:
    print(f"Text: '{phrase}'")
    print(f"Predicted sentiment: {predict_sentiment(phrase)}\n")

Custom phrases:
Text: 'The product arrived on time but was damaged.'
Original text: The product arrived on time but was damaged.
After reducing spaces/newlines: The product arrived on time but was damaged.
After removing double quotes: The product arrived on time but was damaged.
Predicted sentiment: Negative

Text: 'Despite the rain, we had a great time at the park.'
Original text: Despite the rain, we had a great time at the park.
After reducing spaces/newlines: Despite the rain, we had a great time at the park.
After removing double quotes: Despite the rain, we had a great time at the park.
Predicted sentiment: Positive

Text: 'The phone's battery life is disappointing, but the camera is excellent.'
Original text: The phone's battery life is disappointing, but the camera is excellent.
After reducing spaces/newlines: The phone's battery life is disappointing, but the camera is excellent.
...

Once satisfied that we have a model ready for deployment, we can write a self-contained script that creates the model and saves it as a joblib file. By doing so from a script rather than the notebook we simplify the process when deploying.

**Exercise**:
1. Review the code in model.py.
2. Open an Anaconda prompt (Windows) or Terminal window (Mac).
3. Navigate to the folder where model.py is located using the command `cd "<path to your folder>"`.
4. At the prompt enter `python model.py`. After a few moments this creates a file `model.joblib`.

Let us load this model and verify that it alone can be used to make predictions.

In [29]:
newpipe = joblib.load('model.joblib')

In [30]:
type(newpipe)

sklearn.pipeline.Pipeline

Testing this out:

In [31]:
print(newpipe.predict(pd.Series('awesome place'))[0])
print(newpipe.predict(pd.Series('terrible!'))[0])
print(newpipe.predict(pd.Series('very interesting'))[0])

Original text: awesome place
After reducing spaces/newlines: awesome place
After removing double quotes: awesome place
1
Original text: terrible!
After reducing spaces/newlines: terrible!
After removing double quotes: terrible!
0
Original text: very interesting
After reducing spaces/newlines: very interesting
After removing double quotes: very interesting
1


We can then write a self-contained script that loads the model and can make predictions on the fly. This is partially done for you in the file "app.py".

**Exercise**: Refer to app.py and fill in the missing code based on the code above using a text editor such as Spyder or even Jupyter. Observe how it links to utils.py which contains the preprocessing functions.

### Local hosting

In an Anaconda prompt (Windows) or a Terminal window (Mac), run `streamlit run app.py`. This serves the app locally at http://localhost:8501/ (or similar), which you can then view in your browser. The file app.py may require debugging before it runs successfully.

**Bonus Exercise**: Redesign the webpage by adding other components. You can use the cheat sheet at https://docs.streamlit.io/library/cheatsheet as a reference.

### Deployment via streamlit.io

So far you have deployed your model on your local machine. Now we seek to deploy it publicly.

streamlit.io lets you deploy Streamlit apps without having to manage the underlying infrastructure.

There is one additional file needed for external deployment of your model:
- requirements.txt includes the versions of packages that are to be used with the app.

To update the `requirements.txt` file, use each package's `__version__` attribute to check which version is in use. Pinning these versions helps make your model reproducible in other computing environments.

In [None]:
joblib.__version__

In [None]:
st.__version__
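
As an alternative to checking `__version__` one package at a time, the standard library's `importlib.metadata` can look up installed versions by distribution name. The sketch below is illustrative only: `pin_requirements` is a hypothetical helper, and the package list is an assumption that should be adjusted to whatever `app.py` and `utils.py` actually import.

```python
from importlib.metadata import PackageNotFoundError, version


def pin_requirements(packages):
    """Return 'name==version' lines for the given distribution names."""
    lines = []
    for name in packages:
        try:
            lines.append(f"{name}=={version(name)}")
        except PackageNotFoundError:
            # Leave a comment line rather than guessing a version.
            lines.append(f"# {name}: not installed in this environment")
    return lines


# Assumed list of PyPI distribution names; edit to match your app.
packages = ["streamlit", "scikit-learn", "spacy", "joblib", "pandas"]
print("\n".join(pin_requirements(packages)))
```

The printed lines can be pasted into `requirements.txt` after checking them against the existing file.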

Log into your GitHub account (create one if you have not already done so) and create a new repository containing the following files.

- requirements.txt
- app.py
- model.joblib
- utils.py
- .streamlit/ (folder containing config.toml)

This config.toml allows one to set themes such as the type of background to display.
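
To make the role of this file concrete, the sketch below writes a minimal `.streamlit/config.toml` containing a `[theme]` section. The colour value and the helper name `write_config` are arbitrary choices for illustration, not the settings shipped with this lab. Note that calling it in the lab folder would overwrite the existing config file.

```python
from pathlib import Path

# Example theme settings; the values are arbitrary and only illustrative.
CONFIG = """\
[theme]
base = "light"
primaryColor = "#1f77b4"
"""


def write_config(folder="."):
    """Write CONFIG to <folder>/.streamlit/config.toml and return the path."""
    path = Path(folder) / ".streamlit" / "config.toml"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(CONFIG, encoding="utf-8")
    return path
```

Calling `write_config("some_scratch_folder")` creates the folder structure Streamlit looks for when an app is run from that folder.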

Next, sign up for a free account at https://streamlit.io/ and, once signed in, go to https://share.streamlit.io/ and click the blue "New app" button. Under "Repository" specify the GitHub repository where your app is located (in the form username/reponame). The default URL is based on the app's location in GitHub, but it may be changed. Under "main file path" enter `app.py`, replacing `streamlit-app.py`. Finally click the "Deploy!" button. If successful, the app will be deployed to the specified URL (it may take several minutes). An example can be seen at https://iod-sentiment-app.streamlit.app/.

If there are issues, click on the "Manage app" button at the bottom right to view the app's logs. Files such as requirements.txt can be edited directly within GitHub.

Further details are at https://docs.streamlit.io/streamlit-community-cloud/manage-your-app.

If you managed to see your app successfully, congratulations! You now know how to deploy an app on the cloud. To make it visible to others, go to your app's settings and under "Sharing" -> "Who can view this app" select "This app is public and searchable".

### Deployment via render.com (optional)

Render is a general-purpose cloud platform (a Platform as a Service) that goes beyond deploying Streamlit apps. It can host a broader range of web applications, APIs, databases and more.

Sign up for a free account at https://dashboard.render.com/register by connecting your GitHub account.

Once signed into dashboard.render.com click "New Web Service" under Web Services.

From "connect GitHub account", select the repository containing the above files.

Choose a unique name for the web service and leave the root directory blank. Under "start command" enter `streamlit run app.py`.

Finally click "Create Web Service". It may take a few minutes to work, but upon seeing "Booting worker with pid:" go to the URL specified. An example can be seen at https://streamlit-sentiment.onrender.com/


Note that when working as part of a larger software system it is good practice to version your code (e.g. with GitHub) and to make use of CI/CD software.

### References

More information on pipelines:
- https://gist.github.com/amberjrivera/8c5c145516f5a2e894681e16a8095b5c
- https://scikit-learn.org/stable/modules/compose.html#pipeline

Deploying Streamlit apps on Render and Streamlit Cloud:
- https://www.youtube.com/watch?v=bXRVgg2iWyc

---

© 2024 Institute of Data