# 💰  Deploying an Electronics Price Prediction Model with Modal

This notebook deploys a fine-tuned **LLaMA 3.1 8B** model to Modal as a scalable, serverless GPU function.  
It loads the base model with 4-bit NF4 quantization (the scheme used for QLoRA fine-tuning), applies the fine-tuned adapter, and serves product **price predictions** from textual descriptions.

You will:
- Authenticate securely with Modal and Hugging Face
- Define and deploy a `price()` function using `modal.App`
- Load the fine-tuned model from Hugging Face
- Query the model remotely to estimate prices for product descriptions

> 💡 Perfect for creating lightweight, serverless ML-powered services!


#### 📦 Install the Modal SDK quietly (for deploying and calling cloud functions)

In [None]:
!pip install -q modal

In [None]:
import modal # 🔌 Import the core Modal package for interacting with cloud functions
from modal import App, Image # 🛠️ Import specific utilities to define apps and images
from google.colab import userdata # 🔐 Import Colab's secure secret store for accessing user tokens safely
import os

#### 🔐 Retrieve Modal credentials securely from Colab's secret storage


In [None]:
modal_token_id = userdata.get("MODAL_TOKEN_ID")
modal_token_secret = userdata.get("MODAL_TOKEN_SECRET")

# Set environment variables for Modal CLI to use
os.environ["MODAL_TOKEN_ID"] = modal_token_id
os.environ["MODAL_TOKEN_SECRET"] = modal_token_secret

### 🔐 Authenticate with Modal

The previous cell reads `MODAL_TOKEN_ID` and `MODAL_TOKEN_SECRET` from Colab's notebook secrets, so add them before running it (and before deploying or calling your Modal function).

> 💡 You can do this by clicking the 🔑 key icon on the left panel in Google Colab and adding:
> - `MODAL_TOKEN_ID`: your Modal token ID (starts with `ak-`)
> - `MODAL_TOKEN_SECRET`: your Modal token secret (starts with `as-`)
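A common cause of confusing authentication errors is a missing secret or the ID and secret being swapped. The sketch below is a hypothetical helper, `check_modal_credentials`, that inspects the environment variables set above without ever printing their values. It assumes Modal token IDs start with `ak-` and token secrets with `as-`, as in the format shown in Modal's documentation; adjust the prefixes if your tokens differ.

```python
import os


def check_modal_credentials(env=os.environ):
    """Return a list of problems with the Modal credentials in `env` (empty list means they look OK).

    Assumption: token IDs start with "ak-" and token secrets with "as-".
    """
    expected = {"MODAL_TOKEN_ID": "ak-", "MODAL_TOKEN_SECRET": "as-"}
    problems = []
    for name, prefix in expected.items():
        value = env.get(name)
        if not value:
            problems.append(f"{name} is missing or empty")
        elif not value.startswith(prefix):
            # Report only the expected prefix; never echo the secret value itself
            problems.append(f"{name} does not start with {prefix!r}")
    return problems
```

Run `print(check_modal_credentials() or "Modal credentials look OK")` right after the cell that sets the environment variables.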

Then, run the following command in a code cell to save these credentials to your Modal configuration (`~/.modal.toml`) so the `modal` CLI can use them:


In [None]:
!modal token set --token-id {modal_token_id} --token-secret {modal_token_secret}

### 📦 Define and Save the Modal Function: Price Prediction with a Fine-Tuned LLaMA 3.1

This cell creates a Python script (`pricer_electronics_modal_app.py`) that defines a **Modal app** for predicting product prices using a fine-tuned LLaMA 3.1 model.

#### ✅ What it does:
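- Creates a custom Modal container image and installs all required libraries.
- Downloads the base **Meta LLaMA 3.1 8B** model and the fine-tuned LoRA adapter from Hugging Face when the image is built, then loads both when a container starts.
- Loads the base model with **4-bit NF4 quantization** (as in QLoRA), which cuts the memory needed for the quantized weights roughly fourfold compared with 16-bit, so the 8B model fits comfortably on a 16 GB T4 GPU.
- Defines a `price()` method that accepts a product description and returns the predicted price as a float.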
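The memory saving from 4-bit quantization is simple arithmetic: weight memory ≈ parameters × bits per parameter ÷ 8 bytes. The sketch below assumes a round 8 × 10⁹ parameters for the 8B model and counts weights only. Real usage is somewhat higher, because some layers (such as the embeddings) are not quantized, the quantization constants take a little space, and generation needs room for activations and the KV cache. `weight_memory_gb` is a hypothetical helper, not part of any library.

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate memory for model weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


N_PARAMS = 8e9  # Assumption: round figure for an "8B" model

# Compare the weight footprint at common precisions
for label, bits in [("fp16/bf16", 16), ("8-bit", 8), ("4-bit NF4", 4)]:
    print(f"{label:>10}: ~{weight_memory_gb(N_PARAMS, bits):.0f} GB")
```

On a 16 GB T4, 16-bit weights alone would fill the card, while 4-bit weights leave headroom for everything else.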

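> ⚠️ **Before deploying**, make sure you've added a **Hugging Face token as a secret** to your Modal account. The token's account must have been granted access to the gated `meta-llama/Meta-Llama-3.1-8B` repository: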
>
> - **Secret name**: `hf-secret`  
> - **Key**: `HF_TOKEN`  
> - **Value**: *(your Hugging Face access token)*
>
> You can create and manage secrets at [https://modal.com/secrets](https://modal.com/secrets).
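Modal exposes each key of an attached secret as an environment variable inside the container, so `hf-secret` shows up as `HF_TOKEN`, which `huggingface_hub` reads automatically. If the secret is missing, the download of the gated base model fails with a less obvious error. A minimal sketch of a fail-fast check you could call at the start of the download step; `require_hf_token` is a hypothetical helper, not part of Modal or `huggingface_hub`:

```python
import os


def require_hf_token(env=os.environ) -> str:
    """Return the Hugging Face token injected by the `hf-secret` Modal secret.

    Raises a clear error if HF_TOKEN is absent or blank.
    """
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set: check that the 'hf-secret' Modal secret exists "
            "and contains an HF_TOKEN key"
        )
    return token
```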


In [None]:
%%writefile pricer_electronics_modal_app.py

# 📦 Import required Modal components
import modal
from modal import Image

# 🛠️ Define the Modal application
app = modal.App("pricer-electronics-service")  # Name of your Modal app

# 🔐 Load secrets from the Modal dashboard (specifically the Hugging Face token)
secrets = [modal.Secret.from_name("hf-secret")]

# 🔢 Define constants for model loading
GPU = "T4"  # Use an NVIDIA T4 GPU (16 GB) for inference
BASE_MODEL = "meta-llama/Meta-Llama-3.1-8B"
PROJECT_NAME = "pricer-electronics"
HF_USER = "vassilis19"
RUN_NAME = "2025-04-13_07.20.29"
PROJECT_RUN_NAME = f"{PROJECT_NAME}-{RUN_NAME}"
REVISION = "565999daf03888afae81cadf4c8ce8e0bde9d210"  # Commit hash for the fine-tuned model
FINETUNED_MODEL = f"{HF_USER}/{PROJECT_RUN_NAME}"

# 💾 Absolute paths inside the image where models are cached
MODEL_DIR = "/hf-cache/"
BASE_DIR = MODEL_DIR + BASE_MODEL
FINETUNED_DIR = MODEL_DIR + FINETUNED_MODEL

# 📋 Prompt formatting
QUESTION = "How much does this cost to the nearest dollar?"
PREFIX = "Price is $"


# 🛠️ Build-time step: download both base and fine-tuned models into the image
def download_models():
    import os
    from huggingface_hub import snapshot_download

    os.makedirs(MODEL_DIR, exist_ok=True)
    # Skip Meta's original checkpoint files (original/*); transformers does not need them
    snapshot_download(BASE_MODEL, local_dir=BASE_DIR, ignore_patterns=["original/*"])
    snapshot_download(FINETUNED_MODEL, revision=REVISION, local_dir=FINETUNED_DIR)


# 🔧 Define the container image: install packages, then bake the model weights in.
# The HF_TOKEN from `hf-secret` is needed here because the base model is gated.
image = (
    Image.debian_slim()
    .pip_install("huggingface_hub", "torch", "transformers", "bitsandbytes", "accelerate", "peft")
    .run_function(download_models, secrets=secrets)
)


# 🚀 Define the Modal class that will run the model
@app.cls(image=image, secrets=secrets, gpu=GPU, timeout=1800)
class Pricer:

    # ✅ Runs once on container startup to load the models into GPU memory
    @modal.enter()
    def setup(self):
        import torch
        from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
        from peft import PeftModel

        # ⚙️ 4-bit NF4 quantization config (the same scheme used for QLoRA training)
        quant_config = BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_use_double_quant=True,
            bnb_4bit_compute_dtype=torch.bfloat16,
            bnb_4bit_quant_type="nf4",
        )

        # 🔓 Load tokenizer
        self.tokenizer = AutoTokenizer.from_pretrained(BASE_DIR)
        self.tokenizer.pad_token = self.tokenizer.eos_token
        self.tokenizer.padding_side = "right"

        # 🧠 Load base model with quantization
        self.base_model = AutoModelForCausalLM.from_pretrained(
            BASE_DIR,
            quantization_config=quant_config,
            device_map="auto",
        )

        # 🔁 Apply the fine-tuned LoRA adapter (the local snapshot is already pinned to REVISION)
        self.fine_tuned_model = PeftModel.from_pretrained(self.base_model, FINETUNED_DIR)

    # 📈 Remotely callable method that predicts a product's price
    @modal.method()
    def price(self, description: str) -> float:
        import re
        import torch
        from transformers import set_seed

        set_seed(42)  # Makes any sampling in generate() reproducible
        prompt = f"{QUESTION}\n\n{description}\n\n{PREFIX}"  # Construct prompt
        inputs = self.tokenizer.encode(prompt, return_tensors="pt").to("cuda")
        attention_mask = torch.ones_like(inputs)

        # 🧠 Run inference (a price needs only a few new tokens)
        outputs = self.fine_tuned_model.generate(
            inputs,
            attention_mask=attention_mask,
            max_new_tokens=5,
            num_return_sequences=1,
        )

        # 💲 Decode only the newly generated tokens, then extract the numeric price
        completion = self.tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True)
        contents = completion.replace(",", "")
        match = re.search(r"[-+]?\d*\.\d+|\d+", contents)
        return float(match.group()) if match else 0.0

    # ⚙️ Lightweight method for waking up a container before real requests arrive
    @modal.method()
    def wake_up(self) -> str:
        return "ok"
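Because the prompt format and the number parsing are plain Python, you can check them locally without a GPU or a deployment. The sketch below copies `QUESTION` and `PREFIX` from the app file; `build_prompt` and `extract_price` are hypothetical helpers that mirror the logic inside `Pricer.price()`. `extract_price` keeps only the text after the last `Price is $` (or the whole string if the prefix is absent), so numbers in the description itself, such as `4GB/64GB`, are never mistaken for the price.

```python
import re

# Same constants as in pricer_electronics_modal_app.py
QUESTION = "How much does this cost to the nearest dollar?"
PREFIX = "Price is $"


def build_prompt(description: str) -> str:
    """Build the prompt in the same format Pricer.price() uses."""
    return f"{QUESTION}\n\n{description}\n\n{PREFIX}"


def extract_price(text: str) -> float:
    """Return the first number after the last PREFIX in `text`, or 0.0 if there is none."""
    contents = text.split(PREFIX)[-1].replace(",", "")  # Drop thousands separators
    match = re.search(r"[-+]?\d*\.\d+|\d+", contents)
    return float(match.group()) if match else 0.0
```

For example, appending a made-up completion of `179.00` to `build_prompt("Xiaomi Redmi Note 9S Dual SIM (4GB/64GB) Glacier White.")` and passing the result to `extract_price` returns `179.0`, not the `4` from the storage spec.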


#### 🚀 Deploy the Modal app to the cloud (the first deploy builds the image, including downloading the model weights, so it can take several minutes)


In [None]:
!modal deploy -m pricer_electronics_modal_app

#### 🚀 Load the Deployed Modal Class


In [None]:
# Get the deployed class
pricer_cls = modal.Cls.from_name("pricer-electronics-service", "Pricer")
# Instantiate it
pricer = pricer_cls()

#### 🚀 Call the deployed Modal function with a sample product description


In [None]:

# Call the `price` method remotely
result = pricer.price.remote("Xiaomi Redmi Note 9S Dual SIM (4GB/64GB) Glacier White.")

print(f"Predicted Price: ${result:.2f}")