(**Click the icon below to open this notebook in Colab**)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/xiangshiyin/machine-learning-for-actuarial-science/blob/main/2025-spring/week14/notebook/demo.ipynb)

# Model Serving

## `Pickle`
Serialize a Python object, such as a trained model, to a file so it can be reloaded and reused later.

### Example 1 - Save a Python object to local storage

In [61]:
import pickle

data = {
    'name': 'John Doe',
    'age': 30,
    'city': 'New York',
    'occupation': 'Software Engineer'
}

# Save this dictionary to a file
with open('data.pickle', 'wb') as f:
    pickle.dump(data, f)

print("Data saved to file data.pickle.")

Data saved to file data.pickle.


In [62]:
# Load the dictionary back from the pickle file
with open('data.pickle', 'rb') as f:
    data2 = pickle.load(f)
print("Dictionary loaded successfully!")

Dictionary loaded successfully!


In [63]:
data2

{'name': 'John Doe',
 'age': 30,
 'city': 'New York',
 'occupation': 'Software Engineer'}

In [64]:
data2 == data

True

In [65]:
id(data2), id(data)

(4748575488, 4742007936)
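
The two `id` values above differ because `pickle.load` builds a new object instead of returning the original one. A minimal sketch of the same round trip done in memory with `pickle.dumps` and `pickle.loads`, so no file is needed. The name `record` and the nested `skills` list are illustrative assumptions, not part of the notebook.

```python
import pickle

# Illustrative record (hypothetical); the nested list shows that copying is deep
record = {"name": "John Doe", "age": 30, "skills": ["Python", "SQL"]}

# Serialize to bytes and rebuild the object without touching the disk
payload = pickle.dumps(record)
restored = pickle.loads(payload)

print(restored == record)                      # True: same contents
print(restored is record)                      # False: a distinct object
print(restored["skills"] is record["skills"])  # False: nested objects are copied too

# Changing the restored copy leaves the original untouched
restored["skills"].append("R")
print(record["skills"])                        # ['Python', 'SQL']
```

Equality (`==`) compares contents, while `is` and `id` compare identity, which is why `data2 == data` is `True` even though the ids differ.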

### Example 2 - Save an ML model

In [66]:
import pickle
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

# Load the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print("Dataset loaded and split successfully!")

Dataset loaded and split successfully!


In [67]:
# Train a RandomForest classifier
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

print("Model trained successfully!")

Model trained successfully!


In [68]:
# Save the trained model to a file
with open("iris_model.pkl", "wb") as file:
    pickle.dump(model, file)

print("Model saved successfully as 'iris_model.pkl'!")

Model saved successfully as 'iris_model.pkl'!


In [69]:
# Predict with the in-memory model
from sklearn.metrics import accuracy_score

y_pred = model.predict(X_test)
# Calculate the accuracy
accuracy = accuracy_score(y_test, y_pred)
# Print the accuracy
print("Accuracy:", accuracy)


Accuracy: 1.0


In [70]:
y_pred[:20]

array([1, 0, 2, 1, 1, 0, 1, 2, 1, 1, 2, 0, 0, 0, 0, 1, 2, 1, 1, 2])

In [71]:
y_test[:20]

array([1, 0, 2, 1, 1, 0, 1, 2, 1, 1, 2, 0, 0, 0, 0, 1, 2, 1, 1, 2])

In [72]:
# Predict with the saved model
with open('iris_model.pkl', 'rb') as f:
    model2 = pickle.load(f)

y_pred2 = model2.predict(X_test)
accuracy2 = accuracy_score(y_test, y_pred2)
print("Accuracy:", accuracy2)

Accuracy: 1.0


In [73]:
y_pred2[:20]

array([1, 0, 2, 1, 1, 0, 1, 2, 1, 1, 2, 0, 0, 0, 0, 1, 2, 1, 1, 2])

In [74]:
y_test[:20]

array([1, 0, 2, 1, 1, 0, 1, 2, 1, 1, 2, 0, 0, 0, 0, 1, 2, 1, 1, 2])

In [75]:
# Single data point (sepal length, sepal width, petal length, petal width)
single_data_point = [[5.1, 3.5, 1.4, 0.2]]

# Predict class for this single data point
single_prediction = model.predict(single_data_point)

# Get class name from iris dataset
predicted_class = iris.target_names[single_prediction[0]]

print(f"Predicted Class for the Single Data Point: {predicted_class}")


Predicted Class for the Single Data Point: setosa


In [76]:
# Predict class for the same data point with the reloaded model
single_prediction2 = model2.predict(single_data_point)

# Get class name from iris dataset
predicted_class2 = iris.target_names[single_prediction2[0]]

print(f"Predicted Class for the Single Data Point: {predicted_class2}")

Predicted Class for the Single Data Point: setosa


In [77]:
single_prediction

array([0])

In [78]:
iris.target_names

array(['setosa', 'versicolor', 'virginica'], dtype='<U10')

## `FastAPI`

In [None]:
import pickle

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# Load trained model
with open("iris_model.pkl", "rb") as file:
    model = pickle.load(file)


# Define the expected request format
class FeaturesInput(BaseModel):
    features: list[float]


@app.post("/predict")
def predict(data: FeaturesInput):
    prediction = model.predict([data.features])
    return {"prediction": int(prediction[0])}


Save the code above as `fastapi_demo.py`, then start the server with either of these commands:
- `uvicorn fastapi_demo:app --host 0.0.0.0 --port 8000 --reload`
- `fastapi dev fastapi_demo.py --reload`

With the server running, use the `curl` command below to test the `/predict` endpoint:
```
curl -X POST "http://127.0.0.1:8000/predict" \
      -H "Content-Type: application/json" \
      -d '{"features": [5.1, 3.5, 1.4, 0.2]}'
```
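
The same request can also be sent from Python. The sketch below uses only the standard library and assumes the server is reachable at the address used in the `curl` command. The helper names `build_predict_request` and `send_predict_request` are hypothetical, introduced here for illustration.

```python
import json
import urllib.request

# Assumed local address, taken from the curl command above
BASE_URL = "http://127.0.0.1:8000"


def build_predict_request(features, base_url=BASE_URL):
    """Build (but do not send) a POST request for the /predict endpoint."""
    body = json.dumps({"features": features}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_predict_request(features, base_url=BASE_URL, timeout=5.0):
    """Send the request and return the decoded JSON response as a dict."""
    request = build_predict_request(features, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Inspect the request without needing a running server
request = build_predict_request([5.1, 3.5, 1.4, 0.2])
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Once the API is running, `send_predict_request([5.1, 3.5, 1.4, 0.2])` should return a dictionary with a `prediction` key holding the class index, matching the return value of the `predict` endpoint.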

## `Streamlit` example

In [None]:
import streamlit as st
import pickle
import numpy as np

# Load model
with open("iris_model.pkl", "rb") as file:
    model = pickle.load(file)

st.title("Iris Flower Classifier")

# User input
sepal_length = st.number_input("Sepal Length")
sepal_width = st.number_input("Sepal Width")
petal_length = st.number_input("Petal Length")
petal_width = st.number_input("Petal Width")

if st.button("Predict"):
    features = np.array([[sepal_length, sepal_width, petal_length, petal_width]])
    prediction = model.predict(features)[0]
    st.write(f"Predicted Class: {prediction}")


Save the code above as `streamlit_demo.py`, then launch the app with `streamlit run streamlit_demo.py`.
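
The app above displays the raw class index returned by `model.predict`. A small sketch of translating that index into a species name before showing it. The `CLASS_NAMES` list copies the order of `iris.target_names` printed earlier, and the helper `class_name_for` is a hypothetical name.

```python
# Order copied from iris.target_names; assumed to match the model's class indices
CLASS_NAMES = ["setosa", "versicolor", "virginica"]


def class_name_for(prediction):
    """Translate a predicted class index (int or NumPy integer) to a species name."""
    index = int(prediction)
    if not 0 <= index < len(CLASS_NAMES):
        raise ValueError(f"Unexpected class index: {index}")
    return CLASS_NAMES[index]


print(class_name_for(0))  # setosa
print(class_name_for(2))  # virginica
```

In the Streamlit script, the final line could then read `st.write(f"Predicted Class: {class_name_for(prediction)}")`.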

# Introduction to NLP

## Preprocessing

In [19]:
import nltk

nltk.download('stopwords')
nltk.download('punkt')
nltk.download('punkt_tab')

[nltk_data] Downloading package stopwords to
[nltk_data]     /Users/xiangshiyin/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package punkt to
[nltk_data]     /Users/xiangshiyin/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package punkt_tab to
[nltk_data]     /Users/xiangshiyin/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt_tab.zip.


True

In [None]:
from nltk.tokenize import sent_tokenize, word_tokenize
from nltk.corpus import stopwords


data = "This is a simple example to demonstrate removing stopwords using NLTK."
stopWords = set(stopwords.words('english'))

In [25]:
len(stopWords)

198

In [26]:
stopwords.words('english')[:10]

['a', 'about', 'above', 'after', 'again', 'against', 'ain', 'all', 'am', 'an']

In [31]:
tokenized_data = word_tokenize(data)

In [33]:
# Single quotes inside the f-string keep this valid on Python versions before 3.12
print(f"Original text: {data}")
print(f"Tokenized text: {'|'.join(tokenized_data)}")

Original text: This is a simple example to demonstrate removing stopwords using NLTK.
Tokenized text: This|is|a|simple|example|to|demonstrate|removing|stopwords|using|NLTK|.


In [35]:
filtered_tokenized_data = [
    w
    for w in tokenized_data
    if w not in stopWords
]
print(f"After removing stopwords: {filtered_tokenized_data}")

After removing stopwords: ['This', 'simple', 'example', 'demonstrate', 'removing', 'stopwords', 'using', 'NLTK', '.']


In [36]:
# Single quotes inside the f-strings keep this valid on Python versions before 3.12
print(f"Original text: {data}")
print(f"Tokenized text: {'|'.join(tokenized_data)}")
print(f"After removing stopwords: {'|'.join(filtered_tokenized_data)}")

Original text: This is a simple example to demonstrate removing stopwords using NLTK.
Tokenized text: This|is|a|simple|example|to|demonstrate|removing|stopwords|using|NLTK|.
After removing stopwords: This|simple|example|demonstrate|removing|stopwords|using|NLTK|.
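
In the output above, `This` and `.` survive the filter: the comparison is case-sensitive while NLTK's English stopword list is lowercase, and punctuation is not a stopword. A possible refinement is sketched below. It uses a small hand-picked stopword set (an assumption standing in for `stopwords.words('english')`) so that it runs without NLTK data, and the helper name `remove_stopwords` is hypothetical.

```python
# Tokens copied from the word_tokenize output above
tokens = ["This", "is", "a", "simple", "example", "to", "demonstrate",
          "removing", "stopwords", "using", "NLTK", "."]

# Stand-in stopword set (assumption); in the notebook this would be
# set(stopwords.words('english'))
stop_words = {"this", "is", "a", "to"}


def remove_stopwords(tokens, stop_words):
    """Drop stopwords (case-insensitively) and tokens with no letters or digits."""
    return [
        token
        for token in tokens
        if token.lower() not in stop_words
        and any(ch.isalnum() for ch in token)
    ]


filtered = remove_stopwords(tokens, stop_words)
print("|".join(filtered))
# simple|example|demonstrate|removing|stopwords|using|NLTK
```

Lowercasing only for the comparison keeps the original casing of the surviving tokens, such as `NLTK`.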
