# Phase 0
This notebook is designed to test the UX and developer experience of an ML platform using a fraud detection use case. The goal is not to create the best-performing model but to evaluate how easy it is to work with the platform.

How This Notebook Works

* We generate synthetic fraud data and train a simple machine learning model.
* We deploy the model as an API using FastAPI.
* We send requests to the API to test predictions.
* The notebook is designed to run in a cloud notebook environment (internal or external); external access requires some additional setup.

## Install Dependencies (if needed)
What This Cell Does:

* Installs the required tools for machine learning and API deployment.
* FastAPI & Uvicorn: Used to deploy and serve the model as an API.
* Scikit-learn: Used for training the fraud detection model.
* Joblib: Saves and loads the trained model.
* Nest-asyncio: Helps FastAPI run smoothly inside Jupyter notebooks.

⚠️ Skip this cell if all libraries are already installed.

In [None]:
# Install the required packages (skip or comment out this line if they are already installed)
!pip install fastapi uvicorn scikit-learn pandas numpy joblib requests nest-asyncio

## Generate Synthetic Data and Train a Model
What This Cell Does:

* Creates synthetic fraud data (since we don’t have real fraud cases).
* Splits the data into training and test sets (80% for training, 20% for testing).
* Trains a simple model to classify transactions as fraudulent or not.
* Evaluates the model on the test set and prints a classification report.

📌 Key Concept:
The model is very basic. It is only used to test how easy it is to use the platform.

In [2]:
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
import joblib

# Create synthetic fraud detection data:
# We'll generate 1000 samples with 20 features and make the classes imbalanced (roughly 95% legitimate, 5% fraud)
X, y = make_classification(n_samples=1000,
                           n_features=20,
                           n_informative=5,
                           n_redundant=2,
                           n_clusters_per_class=1,
                           weights=[0.95, 0.05],
                           random_state=42)

# Convert to a DataFrame for a more realistic data handling scenario.
df = pd.DataFrame(X, columns=[f'feature_{i}' for i in range(20)])
df['label'] = y

# Split the data
X_train, X_test, y_train, y_test = train_test_split(df.drop("label", axis=1), df["label"], test_size=0.2, random_state=42)

# Train a simple Logistic Regression model
model = LogisticRegression(solver="liblinear")
model.fit(X_train, y_train)

# Evaluate the model on the test set
y_pred = model.predict(X_test)
print("Classification Report:\n", classification_report(y_test, y_pred))

Classification Report:
               precision    recall  f1-score   support

           0       0.99      1.00      0.99       187
           1       1.00      0.85      0.92        13

    accuracy                           0.99       200
   macro avg       0.99      0.92      0.96       200
weighted avg       0.99      0.99      0.99       200
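
With only about 5% of samples labelled as fraud, a purely random 80/20 split can leave the test set with slightly more or fewer fraud cases than the overall rate would suggest (here, 13 of 200). The sketch below is an optional variation, not what the cell above does: it assumes the same synthetic data settings and passes `stratify` to `train_test_split` so that both splits keep roughly the same fraud rate.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Same synthetic data settings as the training cell.
X, y = make_classification(n_samples=1000, n_features=20, n_informative=5,
                           n_redundant=2, n_clusters_per_class=1,
                           weights=[0.95, 0.05], random_state=42)

# stratify=y keeps the class proportions nearly identical in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

print(f"Fraud rate overall:  {y.mean():.3f}")
print(f"Fraud rate in train: {y_train.mean():.3f}")
print(f"Fraud rate in test:  {y_test.mean():.3f}")
```

For a platform test this makes little practical difference, but it makes the reported recall for the fraud class less sensitive to the random seed.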



## Save the Trained Model Locally
What This Cell Does:
* Saves the trained model as a file (fraud_model.joblib).
* This file will be loaded later to serve predictions through the API.

In [3]:
# Save the model to a file for later loading in the API.
model_filename = "fraud_model.joblib"
joblib.dump(model, model_filename)
print(f"Model saved to {model_filename}")

Model saved to fraud_model.joblib
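
Before serving the file through the API, it can be worth confirming that the saved model loads and predicts the same way as the in-memory one. The following is a minimal, self-contained sketch of that round trip. It trains a small stand-in model and writes to a temporary file (the name `roundtrip_model.joblib` is hypothetical) rather than touching `fraud_model.joblib`.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Small stand-in model so the example does not depend on earlier cells.
X, y = make_classification(n_samples=200, n_features=20, n_informative=5,
                           n_redundant=2, random_state=0)
model = LogisticRegression(solver="liblinear").fit(X, y)

# Save to a temporary directory and load the model back.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "roundtrip_model.joblib")
    joblib.dump(model, path)
    restored = joblib.load(path)

# The restored model should reproduce the original predictions exactly.
same_predictions = np.array_equal(model.predict(X), restored.predict(X))
print("Round trip OK:", same_predictions)
```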


## Create a FastAPI Application
What This Cell Does:

* Creates a FastAPI web service (app.py) that:
    * Loads the trained model.
    * Defines an API endpoint (/predict) to classify transactions.
    * Accepts a list of 20 numbers (features) as input.
    * Returns a prediction and the model's probability for the predicted class.

📌 Key Concept:
The API allows other applications (or users) to send transaction data and get predictions in real time.

In [4]:
%%writefile app.py
from fastapi import FastAPI
from pydantic import BaseModel
import joblib
import numpy as np
import pandas as pd

app = FastAPI()

# Load the trained model
try:
    model = joblib.load("fraud_model.joblib")
except FileNotFoundError:
    model = None

feature_names = [f'feature_{i}' for i in range(20)]

class Transaction(BaseModel):
    features: list[float]

@app.post("/predict")
def predict(transaction: Transaction):
    if model is None:
        return {"error": "Model not loaded"}
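    # Turn the flat feature list into a single-row DataFrame with the training column names
    features = np.array(transaction.features).reshape(1, -1)
    features_df = pd.DataFrame(features, columns=feature_names)
    prediction = model.predict(features_df)
    # Probability of the predicted class (the larger of the two class probabilities)
    probability = float(model.predict_proba(features_df).max())
    return {"prediction": int(prediction[0]), "probability": probability}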

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="127.0.0.1", port=8000, log_level="info")

Writing app.py
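
As written, the endpoint accepts a `features` list of any length, and a list that does not contain exactly 20 values only fails later, when the DataFrame is built. One possible guard, sketched here as a plain Python helper (`check_feature_count` is a hypothetical name, not part of `app.py`), is to check the length explicitly before calling the model.

```python
def check_feature_count(features, expected=20):
    """Return the features as floats, or raise ValueError on a wrong length."""
    if len(features) != expected:
        raise ValueError(f"Expected {expected} features, got {len(features)}")
    return [float(value) for value in features]


# A correctly sized list passes through unchanged (as floats).
print(len(check_feature_count([0.0] * 20)))

# A list with the wrong length is rejected with a clear message.
try:
    check_feature_count([0.0] * 19)
except ValueError as exc:
    print("Rejected:", exc)
```

Inside the endpoint, the `ValueError` could be translated into a client error, for example by raising `fastapi.HTTPException(status_code=422, detail=str(exc))`, so callers get an explicit message instead of an internal server error.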


## Run the FastAPI Server Inside the Notebook
What This Cell Does:

* Starts the FastAPI server so it can receive requests.
* Runs in the background so the notebook can continue running.
* Exposes the service on 127.0.0.1:8000.

In [5]:
import nest_asyncio
import uvicorn
from threading import Thread

# Apply nest_asyncio to allow running FastAPI inside Jupyter
nest_asyncio.apply()

# Ensure Uvicorn does NOT use uvloop
def run_api():
    uvicorn.run("app:app", host="127.0.0.1", port=8000, log_level="info", loop="asyncio")

# Start FastAPI in a background thread
api_thread = Thread(target=run_api, daemon=True)
api_thread.start()

print("FastAPI server is running at http://127.0.0.1:8000")

FastAPI server is running at http://127.0.0.1:8000


INFO:     Started server process [9959]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)


INFO:     127.0.0.1:56492 - "POST /predict HTTP/1.1" 200 OK
INFO:     127.0.0.1:57168 - "POST /predict HTTP/1.1" 200 OK
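
Because the server starts in a background thread, the "server is running" message is printed before Uvicorn has finished starting, and a request sent immediately afterwards may be refused. A minimal sketch of one way to wait for the port to accept connections, using only the standard library (`wait_for_port` is a hypothetical helper, not part of Uvicorn or FastAPI):

```python
import socket
import time


def wait_for_port(host, port, timeout=10.0, interval=0.2):
    """Return True once a TCP connection to host:port succeeds, else False."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not ready yet (connection refused or timed out); retry shortly.
            time.sleep(interval)
    return False
```

In this notebook it could be called as `wait_for_port("127.0.0.1", 8000)` right after `api_thread.start()`, before sending the first request.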


## Test the FastAPI Endpoint

What This Cell Does:

* Generates a random transaction with 20 numbers (features).
* Sends it as a test request to the /predict API endpoint.
* Receives a prediction (fraud or not fraud) and its probability.

📌 Key Concept:
This simulates how a real app or website would use the model to check if a transaction is fraudulent.

In [7]:
import requests
import numpy as np

# Create a sample payload.
# Ensure the feature vector length matches the model's expectation (20 features).
sample_features = np.random.randn(20).tolist()
payload = {"features": sample_features}

# Send a POST request to the /predict endpoint (the timeout avoids hanging if the server is not reachable)
response = requests.post("http://127.0.0.1:8000/predict", json=payload, timeout=10)

# Print the response from the API
print("Response from FastAPI endpoint:")
print(response.json())

Response from FastAPI endpoint:
{'prediction': 0, 'probability': 0.9433665721613025}
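
Note that `probability` in the response is the model's probability for the predicted class, not the probability of fraud. In the output above, the model is about 94% confident that the transaction is not fraudulent. A small sketch of converting the response into a fraud probability, assuming a binary model in which label 1 means fraud (`fraud_probability` is a hypothetical helper):

```python
def fraud_probability(response_body):
    """Convert the API's predicted-class probability into P(fraud)."""
    probability = response_body["probability"]
    # Label 1 means fraud; for label 0 the reported value is P(not fraud).
    if response_body["prediction"] == 1:
        return probability
    return 1.0 - probability


# Response body copied from the output above.
sample = {"prediction": 0, "probability": 0.9433665721613025}
print(f"Estimated fraud probability: {fraud_probability(sample):.4f}")  # prints 0.0566
```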
