<font color="red"><h1><b><u>MAKE A COPY OF THIS NOTEBOOK SO YOUR EDITS ARE SAVED</u></b></h1></font>

---
---
# 🌍 ***NLP For Finance: Deploying Your Model***

<center> <img src="https://www.johnsnowlabs.com/wp-content/uploads/2023/03/Examining-The-Impact-Of-NLP_1.jpg" alt="drawing" width="800"/> </center>


In [1]:
#@title **🏗 Setup Cell** {"display-mode":"form", "form-width":"25%"}
#@markdown **Run this to import libraries and download data!**


# Installing Streamlit & pyngrok
!pip install streamlit -q
!pip install pyngrok -q
!pip install transformers -q

from pyngrok import ngrok
import streamlit as st

import matplotlib.pyplot as plt
%matplotlib inline
from sklearn import metrics
import seaborn as sns
import pandas as pd
import numpy as np

from transformers import BertTokenizer, BertForSequenceClassification
from keras.utils import pad_sequences
import torch

!wget -q 'https://storage.googleapis.com/inspirit-ai-data-bucket-1/Data/AI%20Scholars/Sessions%206%20-%2010%20(Projects)/Project%20-%20NLP%2BFinance/finance_test.csv'
!wget -q 'https://storage.googleapis.com/inspirit-ai-data-bucket-1/Data/AI%20Scholars/Sessions%206%20-%2010%20(Projects)/Project%20-%20NLP%2BFinance/finance_train.csv'
print("Train and test files loaded as finance_train.csv and finance_test.csv")

def get_finance_train():
  df_train = pd.read_csv("finance_train.csv")
  return df_train

def get_finance_test():
  df_test = pd.read_csv("finance_test.csv")
  return df_test

def launch_website():
  try:
    if ngrok.get_tunnels():
      ngrok.kill()
    tunnel = ngrok.connect()

    print("Click this link to try your web app:")
    print(tunnel.public_url)

    !streamlit run --server.port 80 app.py >/dev/null # Connect to the URL through Port 80 (>/dev/null hides outputs)

  except KeyboardInterrupt:
    ngrok.kill()

def plot_confusion_matrix(y_true,y_predicted):
  cm = metrics.confusion_matrix(y_true, y_predicted)
  print ("Plotting the Confusion Matrix")
  labels = ["Negative","Neutral","Positive"]
  df_cm = pd.DataFrame(cm,index =labels,columns = labels)
  fig = plt.figure(figsize=(7,6))
  res = sns.heatmap(df_cm, annot=True,cmap='Blues', fmt='g')
  plt.yticks([0.5,1.5,2.5], labels,va='center')
  plt.title('Confusion Matrix - TestData')
  plt.ylabel('True label')
  plt.xlabel('Predicted label')
  plt.show()
  plt.close()


Train and Test Files Loaded as train.csv and test.csv


## Table of Contents

You can find a more detailed Table of Contents by clicking on the icon on the left sidebar that looks like this: <img src="https://storage.googleapis.com/inspirit-ai-data-bucket-1/Data/AI%20Scholars/Sessions%201%20-%205/Session%201a%20-%20AI%20Fundamentals/table_of_contents_icon.png" width=20>.


>[🛜 Milestone 1: Streamlit](#scrollTo=iqp5tjXRBBHl)

>>[1.1. ngrok](#scrollTo=y0lOuFkdCtAd)

>>[1.2. Launching An Example Website](#scrollTo=pFMqHmWKCyGR)

>[🛠 Milestone 2: Building Our Web App](#scrollTo=lFqyS-fGC6wo)

>>[2.1. Starting Our app.py File with Our Model](#scrollTo=L7vPClO8lVdS)

>>[2.2. Handling Data Processing](#scrollTo=BQCF3f2RYiMn)

>>[2.3. Building Our Web Interface](#scrollTo=VHbaSdywtkaa)

>>[(Optional) 2.4. Visualizations](#scrollTo=fetgQOoixpFI)

>[🎁 Wrapping Up](#scrollTo=mACzv6SLdqjm)



---
---
# 🛜 **Milestone 1: Streamlit**


<center>
<img src="https://images.datacamp.com/image/upload/v1640050215/image27_frqkzv.png" alt="Streamlit" width="250" height="150">
</center>

Today, we'll be using [Streamlit](https://docs.streamlit.io)! Traditional web development often requires knowing several languages, such as HTML, CSS, and JavaScript. Streamlit simplifies this by letting you build the whole app in Python. With just a few lines of code, you can quickly build and format a website.

Take a moment to look through examples of websites built with Streamlit [here](https://streamlit.io/gallery?category=favorites). As a class, choose your favorite and answer the following **questions:**
* Who is this application for?
* How does the user input data - are these intuitive ways of interacting with the app?
* What does the application do with the data?
* Evaluate the ease of use and look of the application.

Now that we've seen what is possible with Streamlit, let's try to deploy our **NLP for Finance model** to the web!

## **1.1. ngrok**

Before we start deploying our website, we need to set up **tunneling.** Our Streamlit app will run on the Colab machine, which can't be reached directly from the public internet. Tunneling exposes a server running on that machine through a public URL, so anyone with the link can visit our web app. We'll use <font color="#0e86d4"><b>`pyngrok`</b></font>, a Python wrapper around ngrok, for this.

<font color=SlateGrey><h4><b>
Use [these](https://drive.google.com/file/d/12zwuOuKh91VSHIHS-6S4ADF4HLC2wKJq/view?usp=sharing) instructions to create an ngrok account and get your authtoken!
</b></h4></font>

<font color=DarkGray><h4><b>
Paste your authtoken below next to `!ngrok authtoken`!
</b></h4></font>

Make sure to input your authtoken in quotation marks! For instance:
```python
!ngrok authtoken "YOURAUTHTOKEN1920131248302430409"
```

In [2]:
!ngrok authtoken "YOUR_AUTHTOKEN_HERE"

Authtoken saved to configuration file: /root/.config/ngrok/ngrok.yml


This saves your authtoken to ngrok's configuration file, which lets ngrok tunnel our local Streamlit site to the public web!

## **1.2. Launching An Example Website**

To deploy our web app, we will need to have all of the relevant code for the app in a Python file we'll call <font color="#0e86d4"><b>`app.py`</b></font>.

In Google Colab, we can write to a file using the **`%%writefile`** cell magic at the top of a code cell. It takes the rest of the code cell and writes it to <font color="#0e86d4"><b>`app.py`</b></font>, overwriting the file if it already exists or creating it if it doesn't.
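For intuition, here is a rough plain-Python equivalent of what `%%writefile` does. It is a sketch only: `write_app_file` is a hypothetical helper (not part of IPython or Colab), and it writes to a separate demo file so it won't clobber your real `app.py`. The key point is that opening a file in write mode replaces its old contents instead of appending to them.

```python
from pathlib import Path

def write_app_file(source: str, filename: str = "writefile_demo.py") -> Path:
    """Roughly mimic `%%writefile filename`: create the file, or overwrite it if it exists."""
    path = Path(filename)
    # write_text opens the file in "w" mode, so any previous contents are replaced
    path.write_text(source)
    return path

# Writing twice leaves only the second version in the file
write_app_file('import streamlit as st\nst.text("Hello world!")\n')
path = write_app_file('import streamlit as st\nst.text("Hello again!")\n')
print(path.read_text())
```

Unlike the real magic, this sketch doesn't print "Writing" or "Overwriting"; it only shows the overwrite behavior.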

In [3]:
%%writefile app.py

import streamlit as st  # Importing Streamlit, since app.py is the only code our app sees
st.text("Hello world!") # st.text() is like print(), except it prints on our website instead!

Overwriting app.py


You may notice that this doesn't output anything other than the fact that it's written to the <font color="#0e86d4"><b>`app.py`</b></font> file. This is because we've created a file but haven't actually run it yet. You can check out the newly created file by clicking the folder icon 📁 on the left sidebar.


To run our <font color="#0e86d4"><b>`app.py`</b></font> file, we can use our <font color="#0e86d4"><b>`launch_website()`</b></font> function, which takes that code and deploys it to a web app using Streamlit and ngrok.

Run the cell below, and click on the link that it generates to visit your website! You can click on the "Visit Site" button after opening the link to see your site.

>NOTE: Every time you're done looking at the website you're making, be sure to stop the `launch_website()` cell so you can run other code in this notebook! You can do so by clicking the stop button ⏹ on the cell.


In [4]:
launch_website()

Click this link to try your web app:
https://2e55458a1220.ngrok-free.app

Aborted!


✅ <font color=#00ab41><b>
Congratulations, you have your first working site!
</b></font>


---
---
# 🛠 **Milestone 2: Building Our Web App**

Next, let's start building an interface that our users can interact with! One way users can work with our model is by entering example tweets or news statements about specific stocks, which the model then classifies as negative, neutral, or positive. To do this, we need to do a few things:

1. Load the pre-trained BERT model and tokenizer.
2. Set up the Streamlit UI to allow users to input text.
3. Process user input using the tokenizer.
4. Predict using the model and classify the sentiment.
5. Display the results to the user.

## **2.1. Starting Our <font color="#0e86d4"><b>`app.py`</b></font> File with Our Model**

Let's reload and take a look at the model and data we've built from our project so far. We've re-imported our data from before in the setup cell.

In [6]:
#@title Run this cell to load your saved model! {"display-mode":"form", "form-width":"25%"}
import os
import shutil
import zipfile
from google.colab import drive, files
from transformers import BertTokenizer, BertForSequenceClassification
from IPython.display import Markdown

#@markdown If you downloaded the file to your computer instead of Drive, uncheck the box and follow the upload prompt.
load_from_Google_Drive = True  # @param {"type":"boolean"}
folder_name = "NLP_For_Finance_BERT_Model"

if load_from_Google_Drive:
  drive.mount('/content/drive')
  zip_path = f'/content/drive/My Drive/{folder_name}.zip'
else:
  display(Markdown("""
Please upload your model zip file using the **Choose Files** button below!

NOTE: If your files aren't uploading correctly, you can also upload files using the file menu:
1. 👈🏼 Click the folder icon 📁 on the left sidebar.
2. Click the upload button that looks like a page with an up arrow in it (📄⬆) and upload your file!
---
"""))
  uploaded = files.upload()
  zip_path = list(uploaded.keys())[0]  # Get the uploaded filename


# Unzip the file
with zipfile.ZipFile(zip_path, 'r') as zip_ref:
  zip_ref.extractall(folder_name)

if load_from_Google_Drive:
  drive.flush_and_unmount()

print(f"✅ Model and tokenizer files extracted to the '{folder_name}' folder!")


Mounted at /content/drive
✅ Model and tokenizer loaded!


Below is a code cell with the outline for your app! We'll go through each step in subsequent exercises so that you can fill in each piece.

We suggest going into split-screen mode to edit your <font color="#0e86d4"><b>`app.py`</b></font> code. You can do this by:

1. Clicking the code cell below
2. Clicking the symbol in the top right of the code cell with two squares and an arrow pointing to the top right, like these symbols: ⧉↗

>NOTE: Make sure you run the cell again every time you make an edit, which you should be able to do with the play button in the top left of the new window.


We've already handled the first step for you! Remember, this is the only code that your app will see, so we have to include *everything*, even imports of libraries we've already imported in this notebook.

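The only thing you might need to change is the `local_path` variable: if you renamed your model file to something other than the default `"NLP_For_Finance_BERT_Model.zip"` from the previous notebook, set `local_path` to the folder your model was unzipped into. The `hub_path` value is a placeholder that is only used when that local folder isn't found.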
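The local-versus-Hub choice inside `load_model_and_tokenizer()` boils down to one check, sketched below. `choose_model_path` is a hypothetical helper written for illustration, and `"username/my-finance-bert"` is the same placeholder repo ID used in `app.py`. Note that `os.path.isdir` only checks that the folder exists; `from_pretrained` typically also needs files such as `config.json` directly inside it, so if your zip unpacked into a nested folder, point `local_path` at the inner folder.

```python
import os
import tempfile

def choose_model_path(local_path: str, hub_path: str) -> str:
    """Mirror the fallback in app.py: prefer the local folder, else use the Hub repo ID."""
    return local_path if os.path.isdir(local_path) else hub_path

hub_repo = "username/my-finance-bert"  # placeholder repo ID, as in app.py

# No local folder -> fall back to the Hub repo ID
print(choose_model_path("folder_that_does_not_exist", hub_repo))

# Once the folder exists (for example, after unzipping), it takes priority
with tempfile.TemporaryDirectory() as tmp:
    print(choose_model_path(tmp, hub_repo) == tmp)
```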

In [7]:
%%writefile app.py
# ↖️ Run this cell every time you make any changes!

############## STEP 1: Import libraries, load BERT model/tokenizer #############

# Imports
import streamlit as st
from transformers import BertTokenizer, BertForSequenceClassification
import torch
from keras.utils import pad_sequences  # same import path as the setup cell
import numpy as np
import plotly.express as px
import os

@st.cache_resource  # Cache to load model/tokenizer only once per session
def load_model_and_tokenizer():

    # Local model path (if available)
    local_path = "NLP_For_Finance_BERT_Model"
    # Hugging Face Hub path (replace with your username/repo)
    hub_path = "username/my-finance-bert"

    if os.path.isdir(local_path):
        model_path = local_path
        st.info("Loading model from local folder...")
    else:
        model_path = hub_path
        st.info("Loading model from Hugging Face Hub...")

    # Load model + tokenizer
    model = BertForSequenceClassification.from_pretrained(model_path)
    tokenizer = BertTokenizer.from_pretrained(model_path)

    # Set model to evaluation mode, move to GPU if available
    model.eval()
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    model.to(device)

    return model, tokenizer, device

# Call the function to get cached model, tokenizer, and device
model, tokenizer, device = load_model_and_tokenizer()


########################### STEP 2: Helper Functions ###########################

### Preprocessing Function
def prepare_input(text):
    # Add special tokens
    sentence_with_tokens = "[CLS] " + text + " [SEP]"

    # Tokenize sentence
    tokenized_text = tokenizer.tokenize(sentence_with_tokens)

    # Convert tokens to IDs
    input_ids = tokenizer.convert_tokens_to_ids(tokenized_text)

    # Pad the input IDs
    input_ids = pad_sequences([input_ids],
                              maxlen=128,
                              dtype="long",
                              truncating="post",
                              padding="post")[0]

    # Create attention masks
    attention_mask = [float(i > 0) for i in input_ids]
    return torch.tensor([input_ids]), torch.tensor([attention_mask])


### Prediction Function
def predict(text):
    # Use our processing function on the input text
    input_ids, attention_mask = prepare_input(text)

    # Pass the processed data to the model
    with torch.no_grad():
        outputs = model(input_ids.to(device), token_type_ids=None, attention_mask=attention_mask.to(device))

    # Convert the output logits to probabilities and return!
    logits = outputs[0]
    probabilities = torch.nn.functional.softmax(logits, dim=1).cpu().numpy().flatten()
    return probabilities


########################## STEP 3: Streamlit interface #########################

### Interface
st.title('Sentiment Analysis with BERT')
text = st.text_area("Enter the text to analyze below!")
button = st.button("Submit")

### Button logic
if button:
    if text:
        # Get Model's Prediction
        probabilities = predict(text)

        # Labels, in the same order as the model's output classes (0, 1, 2)
        labels = ['Negative', 'Neutral', 'Positive']

        # Print out the results on the Streamlit site
        st.write("### Sentiment Probabilities:")
        for label, prob in zip(labels, probabilities):
            st.write(f"{label}: {prob:.4f}")

        # (Optional) Pie Chart Visualization
        fig = px.pie(values=probabilities, names=labels, title="Sentiment Distribution")
        st.plotly_chart(fig)

    else:
        st.error("Please enter a text to analyze.")


Overwriting app.py


## **2.2. Handling Data Processing**

### 2.2.1. Coding Exercise: Adding the Data Processing Helper Function


Next, we need to preprocess our data. In the code cell below, there's a starter template for the function we'll use to process the data, similar to what we did in Notebook 3!

Copy the template code below into your <font color="#0e86d4"><b>`app.py`</b></font> file under <font color="green">`### Preprocessing Function`</font> and replace the <font color="#0e86d4"><b>`None`</b></font> values! Don't edit the template code below, so you can start with a fresh template in case you make mistakes.

*Hint*: Feel free to reference your code in Notebook 3!
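To see what the `padding="post"` and `truncating="post"` settings in the template do, here is a pure-Python sketch of their effect on a single sequence. `pad_post` is a hypothetical stand-in written for illustration, not the Keras implementation, and the IDs are toys: 101 and 102 play the roles of `[CLS]` and `[SEP]` (their IDs in the standard BERT vocabularies), and 0 is the padding value. The padded positions at the end are exactly the ones the attention mask should tell the model to ignore.

```python
def pad_post(ids, maxlen=128, pad_value=0):
    """Sketch of pad_sequences([ids], maxlen=maxlen, padding="post", truncating="post")[0]."""
    ids = list(ids)[:maxlen]                        # truncating="post": keep the first maxlen IDs
    return ids + [pad_value] * (maxlen - len(ids))  # padding="post": append pad_value at the end

# Toy IDs: 101 = [CLS], 102 = [SEP], the rest are placeholders
short = [101, 11, 12, 102]
print(pad_post(short, maxlen=8))     # zeros are appended after [SEP]

long_ids = [101] + list(range(1, 20)) + [102]
print(pad_post(long_ids, maxlen=8))  # cut off at the end, so [SEP] is dropped
```

One consequence of post-truncation: for inputs longer than `maxlen` tokens, the trailing `[SEP]` token is lost.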

In [None]:
def prepare_input(text):
  ### YOUR CODE HERE: Add special tokens on either end of the input `text`
  sentence_with_tokens = None
  ### END CODE HERE

  # Tokenize sentence
  tokenized_text = tokenizer.tokenize(sentence_with_tokens)

  # Convert tokens to IDs
  input_ids = tokenizer.convert_tokens_to_ids(tokenized_text)

  # Pad the input IDs
  input_ids = pad_sequences([input_ids],
                              maxlen=128,
                              dtype="long",
                              truncating="post",
                              padding="post")[0]

  ### YOUR CODE HERE: Create attention masks
  attention_mask = None
  ### END CODE HERE

  return torch.tensor([input_ids]), torch.tensor([attention_mask])

### 2.2.2. Coding Exercise: Adding the Prediction Helper Function


Now that our function for formatting our data is set up properly, we can finally pass text into our model for predictions. Since the prediction logic was handled for you in Notebook 3, we've handled the logic again for you below.

All you need to do is paste this into your <font color="#0e86d4"><b>`app.py`</b></font> file under <font color="green">`### Prediction Function`</font>! Feel free to read through the code below if you're curious.
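The last step of `predict()` turns the model's raw scores (logits, one per class) into probabilities. For one sentence, `outputs[0]` has shape `(1, 3)`, `softmax(..., dim=1)` normalizes across the 3 classes, and `.flatten()` leaves a 1-D array of 3 probabilities. Below is a pure-Python sketch of the same math for a single row; the logits are made-up numbers, and the code is for intuition, not a replacement for the PyTorch call.

```python
import math

def softmax(logits):
    """Turn a list of raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtracting the max avoids overflow in exp() without changing the result
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

labels = ['Negative', 'Neutral', 'Positive']
logits = [-1.2, 0.3, 2.5]  # made-up scores for one sentence, in label order
probs = softmax(logits)
for label, prob in zip(labels, probs):
    print(f"{label}: {prob:.4f}")
```

The largest logit always becomes the largest probability, so softmax changes the scale of the scores but not which class wins.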

In [None]:
def predict(text):
  # Use our processing function on the input text
  input_ids, attention_mask = prepare_input(text)

  # Pass the processed data to the model (torch.no_grad() disables computing of
  # gradients, which are only useful when training)
  with torch.no_grad():
      outputs = model(input_ids.to(device),
                      token_type_ids=None,
                      attention_mask=attention_mask.to(device))

  # Convert the output logits to probabilities and return!
  logits = outputs[0]
  probabilities = torch.nn.functional.softmax(logits, dim=1).cpu().numpy().flatten()
  return probabilities

## **2.3. Building Our Web Interface**

How will we make our app interactive for users? Let's allow them to input example text, and we'll pass it to our model for classification. The code cell below contains an example of how you can ask the user for text and allow them to submit it.

Copy this code under <font color="green">`### Interface`</font> in your <font color="#0e86d4"><b>`app.py`</b></font>, and feel free to make any changes to the title and messages displayed to the user! Make sure you run your <font color="#0e86d4"><b>`app.py`</b></font> cell again to overwrite the file with these changes.

In [8]:
# Title of webpage
st.title('Finance Stock Predictor!!')

# Gets text from the user
text = st.text_area("Enter the text to analyze below!")

# Displays a button; we'll add some logic later for when the button is clicked
button = st.button("Submit")

2025-08-28 20:18:44.396 
  command:

    streamlit run /usr/local/lib/python3.12/dist-packages/colab_kernel_launcher.py [ARGUMENTS]
2025-08-28 20:18:44.403 Session state does not function when running a script without `streamlit run`


Now that we have something that will show up on the website, let's try it out! Remember, *the button won't do anything* since we don't have any code to handle that just yet.

Run the cell below to get your app's link, and stop the cell when you're done checking out your site!

In [None]:
launch_website()

Click this link to try your web app:
https://a2242e593bef.ngrok-free.app

### 2.3.1. Coding Exercise: Adding Button Logic

Let's add some logic so we can get a prediction from our model when the button's pressed! A basic template for what the logic will look like is below.

Every time a user interacts with the site, your entire <font color="#0e86d4"><b>`app.py`</b></font> file runs again from top to bottom. `st.button()` returns `True` only on the rerun triggered by clicking it, so the code inside that `if` statement runs only right after a click.

```python
if button: # (if button is pressed)
    if text: # (if there exists text)
        pass  # Replace this with code that does something with the text
    else:
        st.error("Please enter a text to analyze.") # Otherwise, show an error message
```
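To make the rerun behavior concrete, here is a toy simulation in plain Python. `run_app` is a hypothetical function, not part of Streamlit: each call stands for one full run of the script, and `clicked` stands for what `st.button()` would return on that run. A list of strings stands in for what appears on the page.

```python
def run_app(clicked: bool, text: str) -> list:
    """Toy model of one Streamlit rerun: the whole script runs top to bottom."""
    shown = []        # stands in for what appears on the page
    button = clicked  # plays the role of st.button(...)'s return value on this run
    if button:
        if text:
            shown.append(f"Analyzing: {text}")
        else:
            shown.append("Please enter a text to analyze.")
    return shown

print(run_app(clicked=False, text="Shares jumped 5%"))  # rerun from editing text: if-block skipped
print(run_app(clicked=True, text="Shares jumped 5%"))   # rerun from the click: if-block runs
print(run_app(clicked=True, text=""))                   # click with no text: error message
```

This is also why results disappear after the next interaction: on that rerun the button is no longer "clicked".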

Copy the template below under <font color="green">`### Button logic`</font> in your <font color="#0e86d4"><b>`app.py`</b></font>, and complete the code to use your predict function!

In [None]:
if button:
  if text:
    ### Get Model's Prediction

    # YOUR CODE HERE: Use your predict() function to get the model's prediction!
    probabilities = predict(text)
    ### END CODE HERE

    # Labels, in the same order as the model's output classes (0, 1, 2)
    labels = ['Negative', 'Neutral', 'Positive']

    # Print out the results on the Streamlit site
    st.write("Sentiment Probabilities:")
    for label, prob in zip(labels, probabilities):
      st.write(f"{label}: {prob:.4f}")
  else:
    st.error("Please enter a text to analyze.")

Now, rerun your <font color="#0e86d4"><b>`app.py`</b></font> file cell and test out your new website. What happens when you click the button now?

In [None]:
launch_website()


## (Optional) **2.4. Visualizations**

>Your website should be complete, but feel free to explore this section to add an example visualization!  

Our website will look much better with some visualizations of our data and our model's performance. We can create these using <font color="#0e86d4"><b>Plotly</b></font> - a library similar to Matplotlib and Seaborn that works very well with Streamlit! Let's import it:

In [None]:
import plotly.express as px

First, we need to set labels to our sentiments. Recall that:

* <font color=#FF474C>
Negative is 0
</font>
* <font color=#636363>
Neutral is 1
</font>
* <font color=#00ab41>
Positive is 2
</font>
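This 0/1/2 numbering is also why the `labels` list in `app.py` is ordered `['Negative', 'Neutral', 'Positive']`: the i-th probability the model returns belongs to class i. The sketch below picks the most likely sentiment from a probability list. It assumes your model was fine-tuned with these same label IDs (as in the dataset's `Label` column); `predicted_sentiment` is a hypothetical helper, and the probabilities are made up.

```python
sentiment_label_map = {0: 'Negative', 1: 'Neutral', 2: 'Positive'}

def predicted_sentiment(probabilities):
    """Return the label whose class index has the highest probability."""
    best_index = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return sentiment_label_map[best_index]

print(predicted_sentiment([0.05, 0.15, 0.80]))
print(predicted_sentiment([0.70, 0.20, 0.10]))
```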

Run the code cell below to add a column to the dataset with these labels rather than the hard-to-understand numbers.

In [None]:
# Define label dictionary
sentiment_label_map = {0: 'Negative', 1: 'Neutral', 2: 'Positive'}

# Load in data and add more descriptive column
df_train = get_finance_train()
df_train['Sentiment'] = df_train['Label'].map(sentiment_label_map)

### **Pie Chart**

One of the easiest visualizations we can do is a pie chart. Let's create a pie chart to visualize the distribution of our training data.

To do this, we can use the method `.value_counts()` on the `'Sentiment'` column of the dataset, as you can see below. Run the cell to see the results!

In [None]:
# Count occurrences of each sentiment
sentiment_counts = df_train['Sentiment'].value_counts()
sentiment_counts

We'll also need some colors for the pie chart! We've chosen a few colors for you below, but you're more than welcome to change these values. We suggest using [Google's color picker](https://g.co/kgs/HwqhmCd) to explore different colors; copy the code shown under HEX.

In [None]:
# Labels/colors - try changing these and looking up HEX values to customize the colors of your plot!
color_map = {'Negative' : '#FF0000',
             'Neutral'  : '#999999',
             'Positive' : '#00FF00'}
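If you're curious what a HEX code means: `#RRGGBB` packs the red, green, and blue amounts (each 0 to 255) into two hexadecimal digits apiece. Plotly accepts these strings directly, so the helper below is only for understanding the format. `hex_to_rgb` is a hypothetical function, not part of Plotly.

```python
def hex_to_rgb(hex_code: str) -> tuple:
    """Split a '#RRGGBB' code into its red, green, and blue values (0-255)."""
    digits = hex_code.lstrip('#')
    if len(digits) != 6:
        raise ValueError(f"Expected a 6-digit HEX code like '#FF0000', got {hex_code!r}")
    # Each pair of hex digits is one color channel, read in base 16
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))

color_map = {'Negative': '#FF0000', 'Neutral': '#999999', 'Positive': '#00FF00'}
for label, code in color_map.items():
    print(label, code, hex_to_rgb(code))
```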

### 2.4.1. Coding Exercise


Now, we can use these to pass parameters into our <font color="#0e86d4"><b>`px.pie()`</b></font> function to build our plot.
Fill in the <font color="#0e86d4"><b>`None`</b></font> values with what we've found so far!

Here's what each parameter of <font color="#0e86d4"><b>`px.pie()`</b></font> does. If you're curious, you can also take a look at the [documentation](https://plotly.com/python-api-reference/generated/plotly.express.pie) for more info.

* <font color="#0e86d4"><b>`names`</b></font>: the labels of the pie chart slices
* <font color="#0e86d4"><b>`values`</b></font>: the number values we want to base the pie chart off of, in the same order as <font color="#0e86d4"><b>`names`</b></font>
* <font color="#0e86d4"><b>`title`</b></font>: the title of our chart
* <font color="#0e86d4"><b>`color`</b></font>: the labels the chart uses to look up each slice's color in <font color="#0e86d4"><b>`color_discrete_map`</b></font>
* <font color="#0e86d4"><b>`color_discrete_map`</b></font>: the mapping of labels to colors we're using for our pie chart

*Hint:*
<details><summary>click to reveal!</summary>


>Remember when we got the value counts of the sentiments in the previous section? We can use that info for our pie chart! Here's how to retrieve it; make sure you place each piece in the correct field below:
* Use <font color="#0e86d4"><b>`sentiment_counts.index`</b></font> to get the index values from the first column (the one in bold)
* Use <font color="#0e86d4"><b>`sentiment_counts.values`</b></font> to get the values in the second column

</details>


In [None]:
fig = px.pie(
    names=None,
    values=None,
    title=None,
    color=None,
    color_discrete_map=None
)
fig.show()

### 2.4.2. Coding Exercise


Try adding a pie chart to your <font color="#0e86d4"><b>`app.py`</b></font> file now! Instead of displaying the information about the dataset, let's generate a pie chart that visualizes the probabilities that the model predicts so that it's easier to compare those visually.

The code should be very similar to what we've written above, except using the labels and probabilities we've coded in the file! We'll also need to replace `fig.show()` with `st.plotly_chart(fig)` so it displays on our Streamlit site rather than here in this coding notebook.

Once you have that code in your file, rerun it and run the cell below to test it out!

In [None]:
launch_website()

Click this link to try your web app:
https://0dcee31304a7.ngrok-free.app

---
---
# 🎁 **Wrapping Up**



You should have a nice web app to show off your project now! Feel free to continue experimenting with more visualizations from your project or more bells and whistles on the website. The [Streamlit cheat sheet](https://docs.streamlit.io/develop/quick-reference/cheat-sheet) is a great resource to get you started!

