# LLM Deployment with TensorFlow Serving and Flask API

## 1. Introduction to Model Serving

Model serving refers to making a trained machine learning model available for inference via an interface (usually HTTP or gRPC).

### 🔹 Why Serve Models?
- Enable real-time predictions
- Allow external applications to use the model
- Deploy on scalable infrastructure
- Automate inference as part of a system pipeline

### 🔹 Types of Serving
- **Batch serving**: Process data in bulk (e.g., nightly jobs)
- **Online (real-time) serving**: Respond to incoming prediction requests instantly

---

## 2. TensorFlow Serving Overview

TensorFlow Serving is a flexible, high-performance serving system for machine learning models, designed for production environments.

### 🔹 Key Features
- Designed for TensorFlow models (also supports others via custom servables)
- Supports model versioning
- Exposes REST and gRPC APIs

### 🔹 Core Concepts
- **SavedModel**: TensorFlow’s standard format for serializing models
- **Model versioning**: You can serve multiple versions of a model
- **Endpoints**:
  - REST (default port 8501): `http://<host>:8501/v1/models/<model_name>:predict`
  - gRPC (default port 8500): `<host>:8500`
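
The REST endpoint pattern can be sketched in Python. `tfserving_predict_request` is a hypothetical helper, not part of TensorFlow Serving, and it assumes the default REST port 8501. The optional `/versions/<n>` path segment pins a specific model version, and the request body uses the row-oriented `instances` format:

```python
import json
from typing import Optional


def tfserving_predict_request(host: str, model_name: str, instances: list,
                              version: Optional[int] = None, port: int = 8501):
    """Return (url, json_body) for a TensorFlow Serving REST predict call.

    Hypothetical helper for illustration only.
    """
    base = f"http://{host}:{port}/v1/models/{model_name}"
    if version is not None:
        # Pin a specific version instead of the one TF Serving currently serves
        base += f"/versions/{version}"
    url = f"{base}:predict"
    # Row ("instances") format: one entry per example in the batch
    body = json.dumps({"instances": instances})
    return url, body


url, body = tfserving_predict_request(
    "localhost", "sentiment_model",
    [{"input_ids": [101, 102], "attention_mask": [1, 1]}],
    version=1,
)
print(url)  # http://localhost:8501/v1/models/sentiment_model/versions/1:predict
```

Without a version segment, the request goes to the version TF Serving is currently serving (by default, the highest version number).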

---

## 3. Serving with Flask APIs

Flask is a lightweight Python web framework ideal for custom APIs.

### 🔹 Why Use Flask Instead of TF Serving?
- Custom preprocessing/postprocessing
- Easier debugging and flexibility
- Full control over request/response format

### 🔹 Basic Flask Serving Structure
```python
from flask import Flask, request, jsonify
app = Flask(__name__)

@app.route('/predict', methods=['POST'])
def predict():
    data = request.get_json()
    # Preprocess input
    # Run model prediction
    # Postprocess output
    return jsonify({'prediction': 'result'})

if __name__ == '__main__':
    app.run(debug=True)
```

---

## 4. Introduction to Docker for Model Packaging

Docker is a tool designed to make it easier to create, deploy, and run applications by using containers.

### 🔹 Why Use Docker?
- Portability: Run the same container anywhere
- Isolation: Avoid dependency conflicts
- Scalability: Easier deployment in cloud or clusters

### 🔹 Core Docker Concepts
- **Image**: A snapshot of your application and environment
- **Container**: A running instance of an image
- **Dockerfile**: Script to build a Docker image

### 🔹 Docker CLI Basics
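- `docker build -t my-image .`: build an image named `my-image` from the `Dockerfile` in the current directory
- `docker run -p 8501:8501 my-image`: start a container, mapping host port 8501 to container port 8501
- `docker exec -it <container_id> bash`: open an interactive shell inside a running container
- `docker logs <container_id>`: print a container's logs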
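
If you drive Docker from Python scripts (for example, in a test harness), a minimal sketch is to assemble the `docker run` arguments as a list, which keeps the `HOST:CONTAINER` order of `-p` and `-v` explicit. `docker_run_args` is a hypothetical helper:

```python
import shlex


def docker_run_args(image: str, ports: dict, volumes: dict = None, name: str = None) -> list:
    """Build a `docker run` argument list.

    ports maps host port -> container port (-p HOST:CONTAINER);
    volumes maps host path -> container path (-v HOST:CONTAINER).
    """
    args = ["docker", "run"]
    if name:
        args += ["--name", name]
    for host_port, container_port in ports.items():
        args += ["-p", f"{host_port}:{container_port}"]
    for host_path, container_path in (volumes or {}).items():
        args += ["-v", f"{host_path}:{container_path}"]
    args.append(image)
    return args


cmd = docker_run_args("my-image", {8501: 8501})
print(shlex.join(cmd))  # docker run -p 8501:8501 my-image
# To actually start the container: import subprocess; subprocess.run(cmd, check=True)
```

Passing a list to `subprocess.run` avoids shell-quoting problems with paths that contain spaces.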

---


# Lab: Create Docker Containers for LLM Deployment

---

## **Objective**

* Package a TensorFlow SavedModel (e.g., `sentiment_model`) into Docker containers.
* Serve the model with the official TensorFlow Serving image, bind mounting the model directory into the container at run time.
* Build a Flask API container that loads and serves the model.

---

## **Task 1: Create, Save, and Test a Model**

#### Load `distilbert-base-uncased-finetuned-sst-2-english` and save it in TensorFlow SavedModel format

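```python
from transformers import TFAutoModelForSequenceClassification, AutoTokenizer
import tensorflow as tf

model_name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = TFAutoModelForSequenceClassification.from_pretrained(model_name)

# Save model in SavedModel format (saved_model.pb, variables/, ...) for TF Serving
model.save("sentiment_model")

# Also write the Hugging Face config/weights and tokenizer files to the same
# directory so the Flask app can load it with from_pretrained("./sentiment_model")
model.save_pretrained("sentiment_model")
tokenizer.save_pretrained("sentiment_model")
```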

---

## **Task 2: Container for Flask API Serving the Model**

### Project Structure

```
llm-flask/
├── app.py                # Flask application code
├── sentiment_model/      # TensorFlow SavedModel directory (can be bind-mounted)
├── requirements.txt      # Python dependencies
└── Dockerfile            # Container build instructions
```

### Sample `app.py`

```python
# app.py
from flask import Flask, request, jsonify
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification
import tensorflow as tf

app = Flask(__name__)

# Path of the saved model, relative to the working directory (/app in the container)
MODEL_PATH = "./sentiment_model"
HUB_MODEL_NAME = "distilbert-base-uncased-finetuned-sst-2-english"

# Load the tokenizer and model once at startup to avoid reloading on each request.
# from_pretrained(MODEL_PATH) needs the config, weights, and tokenizer files written
# by save_pretrained(); a directory holding only a Keras SavedModel is not enough.
try:
    tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
    model = TFAutoModelForSequenceClassification.from_pretrained(MODEL_PATH)
    print("Model and tokenizer loaded successfully!")
except Exception as e:
    print(f"Error loading model or tokenizer from {MODEL_PATH}: {e}")
    # Fallback: download the same checkpoint from the Hugging Face Hub.
    # This requires internet access from inside the container.
    tokenizer = AutoTokenizer.from_pretrained(HUB_MODEL_NAME)
    model = TFAutoModelForSequenceClassification.from_pretrained(HUB_MODEL_NAME)
    print("Loaded model and tokenizer from the Hugging Face Hub as a fallback.")

# The SST-2 dataset has two labels: 0 for negative, 1 for positive
SENTIMENT_LABELS = {0: "Negative", 1: "Positive"}


@app.route('/')
def home():
    """
    Home route to confirm the API is running.
    """
    return "Sentiment Analysis API is running! Use /predict endpoint."


@app.route('/predict', methods=['POST'])
def predict_sentiment():
    """
    Predicts the sentiment of the provided text.
    Expects a JSON payload with a 'text' field.
    Example: {"text": "This movie was fantastic!"}
    """
    if not request.is_json:
        return jsonify({"error": "Request must be JSON"}), 400

    data = request.get_json()
    text = data.get('text') if isinstance(data, dict) else None

    if not isinstance(text, str) or not text.strip():
        return jsonify({"error": "Request JSON must contain a non-empty 'text' string"}), 400

    try:
        # Tokenize the input text
        inputs = tokenizer(text, return_tensors="tf", truncation=True, padding=True)

        # Perform inference and convert logits to probabilities
        logits = model(inputs).logits
        probabilities = tf.nn.softmax(logits, axis=-1)

        # Convert to plain Python types so jsonify can serialize them
        predicted_class_id = int(tf.argmax(probabilities, axis=-1).numpy()[0])
        confidence = float(probabilities[0, predicted_class_id].numpy())
        predicted_sentiment = SENTIMENT_LABELS.get(predicted_class_id, "Unknown")

        return jsonify({
            "text": text,
            "sentiment": predicted_sentiment,
            "confidence": f"{confidence:.4f}"
        })

    except Exception as e:
        print(f"Prediction error: {e}")
        return jsonify({"error": str(e)}), 500


if __name__ == '__main__':
    # Local development only; inside the container, gunicorn serves the app.
    # Use 0.0.0.0 to make the Flask app accessible from outside the container
    app.run(host='0.0.0.0', port=5000)
```

---

### `requirements.txt`

```
Flask==2.3.2
transformers==4.30.2
tensorflow-cpu==2.13.0
gunicorn
```

---

### Dockerfile for Flask API

```dockerfile
# Dockerfile

# Use a lightweight Python base image
FROM python:3.9-slim

# Set the working directory in the container
WORKDIR /app

# Install dependencies first so this layer is cached between code changes.
# requirements.txt pins tensorflow-cpu (smaller than the full tensorflow package,
# unless GPU support is specifically needed) and includes gunicorn.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the Flask application and the saved model directory
COPY app.py .
COPY sentiment_model ./sentiment_model

# Document the port that the Flask app listens on
EXPOSE 5000

# Run the Flask application with gunicorn, a production-ready WSGI server:
# -b 0.0.0.0:5000 binds to all network interfaces on port 5000
# app:app refers to the 'app' variable in 'app.py'
CMD ["gunicorn", "-b", "0.0.0.0:5000", "app:app"]
```

---

- Organize your files:

```
llm-flask/
├── app.py
├── Dockerfile
├── requirements.txt
└── sentiment_model/
    ├── assets/
    ├── keras_metadata.pb
    ├── saved_model.pb
    └── variables/
        ├── variables.data-00000-of-00001
        └── variables.index
```


### Build and Run Flask Container

1. Build:

```bash
docker build -t sentiment-api .
```

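2. Run the container. The model is already copied into the image; to use a different model directory without rebuilding, bind mount it over `/app/sentiment_model`:

```bash
docker run -p 5000:5000 sentiment-api

# Optional: mount a local model directory instead of the copy baked into the image
docker run -p 5000:5000 -v "$(pwd)/sentiment_model:/app/sentiment_model" sentiment-api
```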

3. Test your deployment:

```bash
curl -X POST -H "Content-Type: application/json" -d "{\"text\": \"I absolutely loved this movie, it was fantastic!\"}" http://localhost:5000/predict
```

4. Expected Output:
```json
{
  "confidence": "0.9998",
  "sentiment": "Positive",
  "text": "I absolutely loved this movie, it was fantastic!"
}
```
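
The same request can be sent from Python using only the standard library. A minimal sketch, assuming the container from step 2 is reachable at `http://localhost:5000`; `build_predict_request` and `predict` are illustrative names:

```python
import json
import urllib.request

# Assumption: the container from step 2 is published on localhost:5000
API_URL = "http://localhost:5000/predict"


def build_predict_request(text, url=API_URL):
    """Build a POST request with a JSON body, equivalent to the curl command above."""
    data = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}, method="POST"
    )


def predict(text, url=API_URL):
    """Send the request and decode the JSON response (the API must be running)."""
    with urllib.request.urlopen(build_predict_request(text, url), timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Usage, with the container running:
# print(predict("I absolutely loved this movie, it was fantastic!"))
```

Note that `urlopen` raises `urllib.error.HTTPError` for the API's 400 and 500 responses, so wrap the call in `try`/`except` if you want to inspect the error JSON.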
---
* For the TensorFlow Serving container, the model directory must contain a versioned subfolder (e.g., `/models/sentiment_model/1/`) with the SavedModel files.
* For Flask serving, the model is loaded from a local folder; make sure `MODEL_PATH` in `app.py` matches where the model is copied or mounted.
* Test your REST API endpoints with curl or Postman:

  * TensorFlow Serving example prediction URL: `http://localhost:8501/v1/models/sentiment_model:predict`
  * Flask API example prediction URL: `http://localhost:5000/predict`

---

## **Task 3: Container for TensorFlow Serving with SavedModel**


### Prepare Your Model for TensorFlow Serving

- TensorFlow Serving expects models to be organized as `<model_name>/<version_number>/`; by default it serves the highest version number it finds.
- Create a new directory for your TensorFlow Serving setup called `tfserving-sentiment`.
- Inside `tfserving-sentiment`, create a directory named `sentiment_model` (this is the model name).
- Inside `sentiment_model`, create a version directory, typically `1` (or any other integer representing the version).
- Move the contents of the `sentiment_model` directory saved in Task 1 into this version directory.

```
tfserving-sentiment/   <-- NEW DIRECTORY
└── sentiment_model/   <-- Model name for TF Serving
    └── 1/             <-- Model version
        ├── assets/
        ├── keras_metadata.pb
        ├── saved_model.pb
        └── variables/
            ├── variables.data-00000-of-00001
            └── variables.index
```
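
The directory steps above can be scripted. This sketch assumes the Task 1 export lives in `./sentiment_model` and uses a hypothetical helper name. It *copies* rather than moves the files, so the Flask image can still be built from the original directory, and it checks for core SavedModel files before copying:

```python
import shutil
from pathlib import Path

# Core files a SavedModel version directory contains (for a model with variables)
REQUIRED_FILES = ["saved_model.pb", "variables/variables.index"]


def prepare_tfserving_layout(saved_model_dir, serving_root,
                             model_name="sentiment_model", version=1):
    """Copy a SavedModel into <serving_root>/<model_name>/<version>/.

    Returns the version directory. Raises FileNotFoundError if the source is
    missing required SavedModel files, and FileExistsError if that version
    directory already exists (so an existing version is never overwritten).
    """
    src = Path(saved_model_dir)
    missing = [f for f in REQUIRED_FILES if not (src / f).is_file()]
    if missing:
        raise FileNotFoundError(f"{src} does not look like a SavedModel; missing {missing}")

    version_dir = Path(serving_root) / model_name / str(version)
    # copytree creates the parent directories as needed
    shutil.copytree(src, version_dir)
    return version_dir


# Example (run from the directory that contains the Task 1 export):
# prepare_tfserving_layout("sentiment_model", "tfserving-sentiment")
```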


### Create the Dockerfile.tfserving
We'll use an official TensorFlow Serving Docker image:


```dockerfile
# Dockerfile.tfserving (Simplified for Bind Mount)

# Use the official TensorFlow Serving base image.
# Its entrypoint script already starts tensorflow_model_server with
#   --port=8500 --rest_api_port=8501 --model_name=${MODEL_NAME}
#   --model_base_path=${MODEL_BASE_PATH}/${MODEL_NAME}
# where MODEL_BASE_PATH defaults to /models.
FROM tensorflow/serving

# Serve the model named sentiment_model from /models/sentiment_model,
# the location *inside* the container where we bind mount the model directory.
ENV MODEL_NAME=sentiment_model

# Document the default gRPC (8500) and REST (8501) ports
EXPOSE 8500
EXPOSE 8501
```

- **Place this `Dockerfile.tfserving` file inside your tfserving-sentiment directory.**



### Create a Python Script for Prediction (REST API)

- This script will show you how to send a request to the TensorFlow Serving REST API:

```python
# predict_tfserving.py
import numpy as np
import requests
from transformers import AutoTokenizer

# TensorFlow Serving REST API endpoint
# Replace localhost with your server IP if running remotely
TF_SERVING_REST_URL = "http://localhost:8501/v1/models/sentiment_model:predict"

# Tokenization runs on the client; TF Serving only receives token IDs.
# Load the tokenizer once at import time rather than on every call.
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")

# The SST-2 labels: 0 for negative, 1 for positive
SENTIMENT_LABELS = {0: "Negative", 1: "Positive"}


def predict_sentiment_tfserving(text_input):
    """
    Sends a text input to the TensorFlow Serving REST API for sentiment prediction.
    Returns (sentiment, confidence), or (None, None) on error.
    """
    # NumPy arrays are enough for JSON serialization; the client does not need TensorFlow
    inputs = tokenizer(text_input, return_tensors="np", truncation=True,
                       padding="max_length", max_length=128)

    # Construct the payload for TensorFlow Serving.
    # Each element in 'instances' is a single example; its keys must match
    # the input names of the model's serving signature.
    payload = {
        "instances": [
            {
                "input_ids": inputs["input_ids"][0].tolist(),  # first (and only) batch item
                "attention_mask": inputs["attention_mask"][0].tolist(),
            }
        ]
    }

    try:
        # json= serializes the payload and sets the Content-Type header
        response = requests.post(TF_SERVING_REST_URL, json=payload, timeout=30)
        response.raise_for_status()  # Raise an exception for HTTP errors (4xx or 5xx)
        prediction_result = response.json()
    except requests.exceptions.ConnectionError:
        print(f"Error: Could not connect to TensorFlow Serving at {TF_SERVING_REST_URL}.")
        print("Please ensure the Docker container is running and accessible.")
        return None, None
    except requests.exceptions.HTTPError as http_err:
        print(f"HTTP error occurred: {http_err} - {response.text}")
        return None, None
    except requests.exceptions.RequestException as e:
        print(f"Request failed: {e}")
        return None, None

    # TF Serving returns raw logits, typically {"predictions": [[logit_0, logit_1]]},
    # so apply a numerically stable softmax (subtract the max logit first).
    logits = np.array(prediction_result["predictions"][0], dtype=np.float64)
    exp_logits = np.exp(logits - logits.max())
    probabilities = exp_logits / exp_logits.sum()
    predicted_class_id = int(np.argmax(probabilities))

    predicted_sentiment = SENTIMENT_LABELS.get(predicted_class_id, "Unknown")
    confidence = float(probabilities[predicted_class_id])
    return predicted_sentiment, confidence


if __name__ == "__main__":
    test_text_positive = "This movie was absolutely brilliant and I loved every minute of it!"
    test_text_negative = "I hated this product, it was a complete waste of money."

    print(f"Testing positive sentiment: '{test_text_positive}'")
    sentiment, confidence = predict_sentiment_tfserving(test_text_positive)
    if sentiment:
        print(f"Sentiment: {sentiment}, Confidence: {confidence:.4f}\n")

    print(f"Testing negative sentiment: '{test_text_negative}'")
    sentiment, confidence = predict_sentiment_tfserving(test_text_negative)
    if sentiment:
        print(f"Sentiment: {sentiment}, Confidence: {confidence:.4f}\n")
```
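
If you want to post-process the response without NumPy, or your exported signature returns several named outputs, the parsing step can be sketched as below. In the REST row format, a single-output signature yields a plain value per instance, while multiple named outputs yield one dict per instance. The output name `logits` is an assumption about this export; you can check it with `saved_model_cli show --dir <version_dir> --all`. The helper names are illustrative:

```python
import math

SENTIMENT_LABELS = {0: "Negative", 1: "Positive"}  # SST-2 label order


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def parse_prediction(response_json, output_name="logits"):
    """Turn a TF Serving REST predict response into (label, confidence).

    Handles both row-format shapes: a bare list of logits per instance
    (single output) or a dict keyed by output name (multiple outputs).
    """
    first = response_json["predictions"][0]
    logits = first[output_name] if isinstance(first, dict) else first
    probs = softmax(logits)
    class_id = max(range(len(probs)), key=probs.__getitem__)
    return SENTIMENT_LABELS.get(class_id, "Unknown"), probs[class_id]


print(parse_prediction({"predictions": [[-2.0, 2.0]]}))
```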


- **Place this `predict_tfserving.py` file inside your tfserving-sentiment directory.**

### Step-by-Step Deployment and Testing

- **Navigate to project Dir:**

```bash
cd tfserving-sentiment
```

- **Build the TensorFlow Serving Docker Image:**

```bash
docker build -f Dockerfile.tfserving -t sentiment-tfserving .
```
`-f Dockerfile.tfserving`: Specifies that we are using Dockerfile.tfserving instead of the default Dockerfile.

`-t sentiment-tfserving`: Tags the image with the name sentiment-tfserving.

`.`: Specifies the build context (the current directory). The Dockerfile does not copy the model into the image; the `sentiment_model` directory (containing the `1` version folder) is bind mounted when the container starts.



- **Run TF Serving container:**

```bash
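docker run -p 8501:8501 -p 8500:8500 -v "$(pwd)/sentiment_model:/models/sentiment_model" --name tf-sentiment-server sentiment-tfserving
```

`-p 8501:8501`: Maps the container's REST API port to your host's port 8501.

`-p 8500:8500`: Maps the container's gRPC port to your host's port 8500 (optional for this example, but good practice).

`-v "$(pwd)/sentiment_model:/models/sentiment_model"`: Bind mounts your local model directory (with its `1` version folder) at `/models/sentiment_model`, where TF Serving looks for the model. In Windows PowerShell, use `${PWD}` instead of `$(pwd)`.

`--name tf-sentiment-server`: Gives your running container a friendly name.

`sentiment-tfserving`: The name of the Docker image you just built.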

> You should see TensorFlow Serving startup logs in your terminal, indicating that it's loading the sentiment_model at version 1.
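
Before running the prediction script, you can confirm that the model finished loading by querying the model status endpoint (`GET /v1/models/sentiment_model`), which reports each version's state (for example `LOADING` or `AVAILABLE`). A minimal polling sketch using the standard library, assuming the default REST port; the helper names are illustrative:

```python
import json
import time
import urllib.request

STATUS_URL = "http://localhost:8501/v1/models/sentiment_model"  # model status endpoint


def is_available(status: dict) -> bool:
    """True if any model version reports state AVAILABLE.

    The status endpoint returns JSON like
    {"model_version_status": [{"version": "1", "state": "AVAILABLE", ...}]}.
    """
    return any(v.get("state") == "AVAILABLE" for v in status.get("model_version_status", []))


def wait_until_available(url=STATUS_URL, timeout_s=120.0, interval_s=2.0) -> bool:
    """Poll the status endpoint until the model is AVAILABLE or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if is_available(json.load(resp)):
                    return True
        except (OSError, ValueError):
            pass  # server not up yet, HTTP error while loading, or non-JSON body
        time.sleep(interval_s)
    return False


# Usage, with the container running:
# print("ready" if wait_until_available() else "model did not become available")
```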

### Run prediction script

```bash
python predict_tfserving.py
```


# Lab: Deploying LLMs with TensorFlow Serving and a Flask API on AWS ECS
-----

### Pre-deployment Checklist 

Before you touch the AWS Console, you need to prepare your Docker images and your model locally:

1.  **Build Docker Images Locally:**

      * Ensure you have your `sentiment-api` Docker image (for Flask).
      * Ensure you have your `sentiment-tfserving` Docker image (for TensorFlow Serving).
          * Remember, for `sentiment-tfserving`, your `Dockerfile.tfserving` should be the **simplified version** (without `COPY sentiment_model`) because we'll be loading the model from S3.

2.  **Prepare Your Model for S3:**

      * Make sure your saved sentiment model is organized locally in the `sentiment_model/1/` structure (e.g., `tfserving-sentiment/sentiment_model/1/`).

-----

### Step-by-Step AWS Management Console Deployment

-----

### Part 1: Common AWS Setup (ECR, IAM, Networking)

These steps are done once for both deployments.

#### 1\. Push Docker Images to Amazon ECR

1.  **Go to ECR:**

      * Open your web browser and navigate to the [AWS Management Console](https://aws.amazon.com/console/).
      * Search for "ECR" in the search bar and click on "Elastic Container Registry".

2.  **Create Repositories:**

      * Click the **"Create repository"** button.
      * **For Flask App:**
          * **Visibility settings:** Keep "Private".
          * **Repository name:** Enter `sentiment-flask-app`.
          * Leave other options as default.
          * Click **"Create repository"**.
      * **For TF Serving:**
          * Click **"Create repository"** again.
          * **Repository name:** Enter `sentiment-tfserving-model`.
          * Click **"Create repository"**.

3.  **Get Push Commands (and run them locally):**

      * On the ECR Repositories page, click on `sentiment-flask-app`.
      * Click the **"View push commands"** button.
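      * **Copy the commands and run them in your local terminal.** The first command authenticates Docker with ECR; the others build, tag, and push an image. Because you already built `sentiment-api` locally, you can skip the build command and tag the existing image instead (for example, `docker tag sentiment-api:latest <account-id>.dkr.ecr.<region>.amazonaws.com/sentiment-flask-app:latest`), then run the push command.
      * Repeat this process for the `sentiment-tfserving-model` repository, tagging and pushing your `sentiment-tfserving:latest` image.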
      * **Wait for both pushes to complete.**

#### 2\. Create IAM Role for ECS Task Execution

This role allows Fargate tasks to pull images and send logs.

1.  **Go to IAM:**

      * Search for "IAM" in the console search bar and click on "IAM".
      * In the left navigation pane, click **"Roles"**.

2.  **Create Role:**

      * Click the **"Create role"** button.
      * **Trusted entity type:** Select "AWS service".
      * **Use case:** Select "Elastic Container Service" from the list, then choose "Elastic Container Service Task".
      * Click **"Next"**.
      * **Permissions policies:** Search for `AmazonECSTaskExecutionRolePolicy`. Check the box next to it.
      * Click **"Next"**.
      * **Role details:**
          * **Role name:** Enter `ecsTaskExecutionRole` (if you don't have one already).
          * Review and click **"Create role"**.

#### 3\. Identify VPC and Public Subnets

We'll use your default VPC for simplicity.

1.  **Go to VPC:**
      * Search for "VPC" in the console search bar and click on "VPC".
      * In the left navigation pane, click **"Your VPCs"**. Note down the **VPC ID** of your "Default" VPC.
      * In the left navigation pane, click **"Subnets"**.
      * Filter the subnets by your Default VPC ID.
      * Identify at least **two subnets** that have "Yes" under "Auto-assign public IPv4 address" or whose route table has a route to an Internet Gateway. Note down their **Subnet IDs**. These are your public subnets.

#### 4\. Create Security Groups

These control network traffic to your ALB and Fargate tasks.

1.  **Go to EC2:**

      * Search for "EC2" in the console search bar and click on "EC2".
      * In the left navigation pane, scroll down to "Network & Security" and click **"Security Groups"**.

2.  **Create ALB Security Group:**

      * Click **"Create security group"**.
      * **Basic details:**
          * **Security group name:** `sentiment-alb-sg-demo`
          * **Description:** `Security group for sentiment demo ALB`
          * **VPC:** Select your Default VPC ID.
      * **Inbound rules:**
          * Click **"Add rule"**.
          * **Type:** "HTTP", **Source:** "Anywhere-IPv4" (`0.0.0.0/0`).
          * Click **"Add rule"** again.
          * **Type:** "HTTPS", **Source:** "Anywhere-IPv4" (`0.0.0.0/0`) (Good practice, even if not using HTTPS yet).
      * Leave outbound rules as default.
      * Click **"Create security group"**. Note its **Security Group ID**.

3.  **Create Task Security Group:**

      * Click **"Create security group"** again.
      * **Basic details:**
          * **Security group name:** `sentiment-task-sg-demo`
          * **Description:** `Security group for sentiment demo Fargate tasks`
          * **VPC:** Select your Default VPC ID.
      * **Inbound rules:**
          * Click **"Add rule"**.
          * **Type:** "Custom TCP".
          * **Port range:** `5000` (for Flask)
          * **Source:** Select "Custom" and type the **Security Group ID of `sentiment-alb-sg-demo`** (it should auto-complete as you type). This allows traffic only from the ALB.
          * Click **"Add rule"** again.
          * **Type:** "Custom TCP".
          * **Port range:** `8501` (for TF Serving)
          * **Source:** Select "Custom" and type the **Security Group ID of `sentiment-alb-sg-demo`**.
      * Leave outbound rules as default.
      * Click **"Create security group"**. Note its **Security Group ID**.

#### 5\. Create ECS Cluster

1.  **Go to ECS:**

      * Search for "ECS" in the console search bar and click on "Elastic Container Service".
      * In the left navigation pane, click **"Clusters"**.

2.  **Create Cluster:**

      * Click **"Create cluster"**.
      * **Cluster name:** Enter `sentiment-cluster-demo`.
      * **Infrastructure:** Select "AWS Fargate (serverless)".
      * Click **"Create"**.

#### 6\. Create Application Load Balancer (ALB) and Target Groups

We'll create one ALB and two target groups (one for Flask, one for TF Serving).

1.  **Go to EC2:**

      * Search for "EC2" and click on "EC2".
      * In the left navigation pane, scroll down to "Load Balancing" and click **"Load Balancers"**.

2.  **Create ALB:**

      * Click **"Create Load Balancer"**.
      * Select **"Application Load Balancer"** and click "Create".
      * **Basic configuration:**
          * **Load balancer name:** `sentiment-demo-alb`.
          * **Scheme:** "Internet-facing".
          * **IP address type:** "IPv4".
      * **Network mapping:**
          * **VPC:** Select your Default VPC.
          * **Mappings:** Select the two public Subnet IDs you identified earlier.
      * **Security groups:** Select your `sentiment-alb-sg-demo`.
      * **Listeners and routing:**
          * **Protocol:** "HTTP", **Port:** `80`.
          * **Default action:** We will update this later. For now, select "Create new target group".
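              * **Target type:** "IP addresses" (required for Fargate tasks, which use the `awsvpc` network mode).
              * **Target group name:** `sentiment-flask-tg-demo`.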
              * **Protocol:** "HTTP", **Port:** `5000`.
              * **VPC:** Select your Default VPC.
              * **Health checks:** Keep default (HTTP, path `/`).
              * Click **"Create target group"**.
          * Go back to the ALB creation tab. For the HTTP:80 listener, under "Default action", select **"Forward to target groups"** and choose `sentiment-flask-tg-demo`.
      * Click **"Create load balancer"**.
      * **Wait for the ALB to provision (it will say "active"). Note down its DNS name.**
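
If you prefer to script this step, the same target group can be described as keyword arguments for the boto3 `elbv2` client's `create_target_group` call. A sketch under the assumption that boto3 is installed and credentials are configured for the commented-out call; `flask_target_group_params` is a hypothetical helper and the VPC ID is a placeholder:

```python
def flask_target_group_params(vpc_id: str) -> dict:
    """Keyword arguments for elbv2 create_target_group, mirroring the console steps."""
    return {
        "Name": "sentiment-flask-tg-demo",
        "Protocol": "HTTP",
        "Port": 5000,
        "VpcId": vpc_id,
        # Fargate tasks use awsvpc networking, so targets are registered by IP
        "TargetType": "ip",
        "HealthCheckProtocol": "HTTP",
        "HealthCheckPath": "/",
    }


# Usage (requires boto3 and AWS credentials; the VPC ID is a placeholder):
# import boto3
# boto3.client("elbv2").create_target_group(**flask_target_group_params("vpc-0123456789abcdef0"))
```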

-----

### Part 2: Deploy Flask App on Fargate

#### 1\. Create CloudWatch Log Group for Flask

1.  **Go to CloudWatch:**
      * Search for "CloudWatch" and click on "CloudWatch".
      * In the left navigation pane, click **"Log groups"** (under "Logs").
      * Click **"Create log group"**.
      * **Log group name:** `/ecs/sentiment-flask-task-demo`.
      * Click **"Create"**.

#### 2\. Create ECS Task Definition for Flask

1.  **Go to ECS:**

      * Search for "ECS" and click on "Elastic Container Service".
      * In the left navigation pane, click **"Task Definitions"**.

2.  **Create New Task Definition:**

      * Click **"Create new task definition"**.
      * **Launch type compatibility:** Select **"Fargate"**.
      * Click **"Next step"**.
      * **Task Definition name:** `sentiment-flask-task-demo`.
      * **Task role:** (Leave as `None` for this demo).
      * **Task execution role:** Select `ecsTaskExecutionRole`.
      * **Task size:**
          * **Task CPU (vCPU):** `0.5 vCPU (512)`
          * **Task memory (GB):** `1 GB (1024)`
      * **Container Definitions:** Click **"Add container"**.
          * **Container name:** `sentiment-flask-container`.
          * **Image:** Paste the ECR URI for your `sentiment-flask-app` image (e.g., `your-aws-account-id.dkr.ecr.your-aws-region.amazonaws.com/sentiment-flask-app:latest`).
          * **Port mappings:**
              * **Host port:** `5000` (with Fargate's `awsvpc` network mode this must equal the container port; the console fills it in)
              * **Container port:** `5000`
              * **Protocol:** `tcp`
          * **Essential:** Keep checked.
          * **Log configuration:**
              * **Log driver:** Select `awslogs`.
              * **Options:**
                  * `awslogs-group`: `/ecs/sentiment-flask-task-demo`
                  * `awslogs-region`: Your AWS region (e.g., `us-east-1`)
                  * `awslogs-stream-prefix`: `ecs`
      * Click **"Add"**.
      * Review and click **"Create"**.
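
The console form above corresponds to a task definition JSON document. As a sketch (the helper `fargate_task_definition`, account ID, region, and role ARN are placeholders), the following builds that document; you could save the printed JSON to a file and register it with `aws ecs register-task-definition --cli-input-json file://<file>.json`:

```python
import json


def fargate_task_definition(family, container_name, image, port, log_group, region,
                            execution_role_arn, cpu="512", memory="1024", task_role_arn=None):
    """Build a Fargate task definition matching the console steps above."""
    task_def = {
        "family": family,
        "requiresCompatibilities": ["FARGATE"],
        "networkMode": "awsvpc",  # required for Fargate
        "cpu": cpu,
        "memory": memory,
        "executionRoleArn": execution_role_arn,
        "containerDefinitions": [
            {
                "name": container_name,
                "image": image,
                "essential": True,
                # With awsvpc, hostPort must equal containerPort (or be omitted)
                "portMappings": [{"containerPort": port, "hostPort": port, "protocol": "tcp"}],
                "logConfiguration": {
                    "logDriver": "awslogs",
                    "options": {
                        "awslogs-group": log_group,
                        "awslogs-region": region,
                        "awslogs-stream-prefix": "ecs",
                    },
                },
            }
        ],
    }
    if task_role_arn:
        task_def["taskRoleArn"] = task_role_arn
    return task_def


# Placeholder account ID, region, and role ARN; replace with your own values.
flask_td = fargate_task_definition(
    family="sentiment-flask-task-demo",
    container_name="sentiment-flask-container",
    image="123456789012.dkr.ecr.us-east-1.amazonaws.com/sentiment-flask-app:latest",
    port=5000,
    log_group="/ecs/sentiment-flask-task-demo",
    region="us-east-1",
    execution_role_arn="arn:aws:iam::123456789012:role/ecsTaskExecutionRole",
)
print(json.dumps(flask_td, indent=2))
```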

#### 3\. Create ECS Service for Flask

1.  **Go to ECS:**

      * In the left navigation pane, click **"Clusters"**.
      * Click on your `sentiment-cluster-demo`.
      * Go to the **"Services"** tab.

2.  **Create Service:**

      * Click **"Create"**.
      * **Configure service:**
          * **Launch type:** "Fargate".
          * **Task Definition:** Select `sentiment-flask-task-demo` (and its latest revision).
          * **Service name:** `sentiment-flask-service-demo`.
          * **Desired tasks:** `1`.
      * **Networking:**
          * **VPC:** Select your Default VPC.
          * **Subnets:** Select the two public Subnet IDs you identified.
          * **Security groups:** Select your `sentiment-task-sg-demo`.
          * **Auto-assign public IP:** Select **"ENABLED"**.
      * **Load balancing:**
          * **Load balancer type:** Select "Application Load Balancer".
          * **Load balancer name:** Select `sentiment-demo-alb`.
          * **Container to load balance:** Click "Add container to load balancer" and select `sentiment-flask-container:5000`.
          * **Target group name:** Select `sentiment-flask-tg-demo`.
      * Review and click **"Create Service"**.
      * **Wait until the service's task is RUNNING and its target is healthy in `sentiment-flask-tg-demo`** (this can take a few minutes as tasks provision).

#### 4\. Test Flask App Deployment

1.  **Get ALB DNS Name:**
      * Go to EC2 -\> Load Balancers.
      * Select `sentiment-demo-alb`.
      * Copy its **DNS name**.
2.  **Test with `curl` (from your local terminal):**
    ```bash
    curl -X POST -H "Content-Type: application/json" \
         -d "{\"text\": \"This movie was absolutely brilliant!\"}" \
         http://<ALB-DNS-NAME>/predict
    ```
    (Replace `<ALB-DNS-NAME>` with the actual DNS name). You should get a JSON response.

-----

### Part 3: Deploy TensorFlow Serving on Fargate

#### 1\. Upload Model to S3

1.  **Go to S3:**

      * Search for "S3" and click on "S3".

2.  **Create Bucket:**

      * Click **"Create bucket"**.
      * **Bucket name:** Enter a **globally unique** name (e.g., `your-aws-account-id-sentiment-model-demo`).
      * **AWS Region:** Select your AWS region.
      * Leave other options as default.
      * Click **"Create bucket"**.

3.  **Upload Model Files:**

      * Click on your newly created bucket.
      * Click **"Upload"**.
      * Click **"Add folder"**.
      * Select your local `tfserving-sentiment/sentiment_model/` folder (the one containing the `1` version folder).
      * Click **"Upload"**. Confirm the upload. Ensure the structure `sentiment_model/1/` is preserved in S3.
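
As a scripted alternative to the console upload, this sketch maps each local file to an S3 key that preserves the `sentiment_model/1/...` layout and uploads it with boto3's `upload_file`. S3 keys always use forward slashes, which matters when uploading from Windows. It assumes boto3 is installed and AWS credentials are configured; the helper names are hypothetical:

```python
from pathlib import Path


def s3_keys_for_model(local_model_dir, prefix="sentiment_model"):
    """Return (local_path, s3_key) pairs that keep the <prefix>/<version>/... layout."""
    root = Path(local_model_dir)
    pairs = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # as_posix() gives forward slashes even on Windows
            rel = path.relative_to(root).as_posix()
            pairs.append((str(path), f"{prefix}/{rel}"))
    return pairs


def upload_model(local_model_dir, bucket, prefix="sentiment_model"):
    """Upload the model directory to S3 (assumes boto3 and configured credentials)."""
    import boto3  # imported here so the key-mapping helper works without boto3

    s3 = boto3.client("s3")
    for filename, key in s3_keys_for_model(local_model_dir, prefix):
        s3.upload_file(filename, bucket, key)


# Usage (bucket name from the step above):
# upload_model("tfserving-sentiment/sentiment_model", "your-aws-account-id-sentiment-model-demo")
```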

#### 2\. Update IAM Role for S3 Access

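The TF Serving process inside the container reads the model from S3 using the credentials of the task's **task role**; the task execution role is only used by ECS to pull the image and write logs. For this demo, attach S3 read access to `ecsTaskExecutionRole` (it already trusts ECS tasks) and select it as both the task role and the task execution role when you create the TF Serving task definition. In production, use a separate, least-privilege task role.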

1.  **Go to IAM:**

      * Search for "IAM" and click on "IAM".
      * In the left navigation pane, click **"Roles"**.
      * Find and click on your `ecsTaskExecutionRole`.

2.  **Attach S3 Policy:**

      * On the "Permissions" tab, click **"Add permissions"** -\> **"Attach policies"**.
      * Search for `AmazonS3ReadOnlyAccess`.
      * Check the box next to it.
      * Click **"Add permissions"**.
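
`AmazonS3ReadOnlyAccess` grants read access to every bucket in the account. A narrower inline policy can be sketched as below, assuming TF Serving only needs to list the bucket and read objects under the model prefix; if model loading then fails with an access-denied error, go back to the managed policy. `s3_read_policy` is a hypothetical helper and the bucket name is the example from above:

```python
import json


def s3_read_policy(bucket, prefix="sentiment_model"):
    """Inline IAM policy: list one bucket and read objects under one prefix."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ListModelBucket",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                "Sid": "ReadModelObjects",
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            },
        ],
    }


# Paste the printed JSON into IAM > Roles > (your role) > Add permissions > Create inline policy (JSON)
print(json.dumps(s3_read_policy("your-aws-account-id-sentiment-model-demo"), indent=2))
```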

#### 3\. Create CloudWatch Log Group for TF Serving

1.  **Go to CloudWatch:**
      * Search for "CloudWatch" and click on "CloudWatch".
      * In the left navigation pane, click **"Log groups"** (under "Logs").
      * Click **"Create log group"**.
      * **Log group name:** `/ecs/sentiment-tfserving-task-demo`.
      * Click **"Create"**.

#### 4\. Add New Listener Rule to Existing ALB

This routes TF Serving REST API requests (paths starting with `/v1/models/`) to your TF Serving container. The ALB forwards the request path unchanged, so the rule must match TF Serving's own paths; a custom prefix such as `/tfserve` would reach the container as-is and TF Serving would not recognize it.

1.  **Go to EC2:**

      * Search for "EC2" and click on "EC2".
      * In the left navigation pane, scroll down to "Load Balancing" and click **"Load Balancers"**.
      * Select your `sentiment-demo-alb`.
      * Go to the **"Listeners"** tab.
      * Select the "HTTP : 80" listener and click **"View/edit rules"**.
      * Click the **"+"** icon (to insert a rule) or **"Insert Rule"**.

2.  **Configure Rule:**

      * Click **"Add condition"** -\> **"Path"**.
      * **Path pattern:** Enter `/v1/models/*`.
      * Click **"Add action"** -\> **"Forward to"**.
      * Select **"Create new target group"**.
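          * **Target type:** "IP addresses" (required for Fargate tasks, which use the `awsvpc` network mode).
          * **Target group name:** `sentiment-tfserving-tg-demo`.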
          * **Protocol:** "HTTP", **Port:** `8501`.
          * **VPC:** Select your Default VPC.
          * **Health checks:** Path `/v1/models/sentiment_model`.
          * Click **"Create target group"**.
      * Go back to the rule configuration. For the "Forward to" action, select your newly created `sentiment-tfserving-tg-demo`.
      * Click the **"Save"** button (top right).
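
The ALB forwards the original request path to the target unchanged, so it is worth checking which request paths a rule's pattern will actually catch before saving it. ALB path patterns are case-sensitive, with `*` matching zero or more characters and `?` matching exactly one. The sketch below approximates this with Python's `fnmatch.fnmatchcase`, which additionally treats `[...]` as a character class (ALB does not); `alb_path_matches` is a hypothetical helper:

```python
from fnmatch import fnmatchcase


def alb_path_matches(pattern, path):
    """Approximate ALB path-pattern matching (case-sensitive, '*' and '?' wildcards)."""
    return fnmatchcase(path, pattern)


# Request paths used in this lab, checked against two candidate patterns
paths = ["/", "/predict", "/v1/models/sentiment_model", "/v1/models/sentiment_model:predict"]
for pattern in ["/tfserve*", "/v1/models/*"]:
    for path in paths:
        print(f"{pattern:14} {path:38} {alb_path_matches(pattern, path)}")
```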

#### 5\. Create ECS Task Definition for TF Serving

1.  **Go to ECS:**

      * Search for "ECS" and click on "Elastic Container Service".
      * In the left navigation pane, click **"Task Definitions"**.

2.  **Create New Task Definition:**

      * Click **"Create new task definition"**.
      * **Launch type compatibility:** Select **"Fargate"**.
      * Click **"Next step"**.
      * **Task Definition name:** `sentiment-tfserving-task-demo`.
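      * **Task role:** Select `ecsTaskExecutionRole` (it now has S3 read access; TF Serving uses the task role's credentials to download the model).
      * **Task execution role:** Select `ecsTaskExecutionRole`.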
      * **Task size:**
          * **Task CPU (vCPU):** `0.5 vCPU (512)`
          * **Task memory (GB):** `1 GB (1024)`
      * **Container Definitions:** Click **"Add container"**.
          * **Container name:** `sentiment-tfserving-container`.
          * **Image:** Paste the ECR URI for your `sentiment-tfserving-model` image.
          * **Port mappings:**
              * **Host port:** Leave blank (with Fargate's `awsvpc` network mode, the host port always matches the container port, `8501`).
              * **Container port:** `8501`
              * **Protocol:** `tcp`
          * **Essential:** Keep checked.
          * **Command:** This is critical. Enter the command as a JSON array of strings, separated by commas (no spaces between elements in the array):
            ```
            ["/usr/bin/tensorflow_model_server","--rest_api_port=8501","--model_name=sentiment_model","--model_base_path=s3://your-aws-account-id-sentiment-model-demo/sentiment_model"]
            ```
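            (Replace `your-aws-account-id-sentiment-model-demo` with your actual S3 bucket name.) This field overrides the image's `CMD`, not its `ENTRYPOINT`. If your image's entrypoint already launches `tensorflow_model_server` (the official `tensorflow/serving` image's entrypoint script does), set **Entry point** to `/usr/bin/tensorflow_model_server` and keep only the `--rest_api_port`, `--model_name`, and `--model_base_path` flags in **Command**, so the server is not started with its own path as an extra argument.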
          * **Log configuration:**
              * **Log driver:** Select `awslogs`.
              * **Options:**
                  * `awslogs-group`: `/ecs/sentiment-tfserving-task-demo`
                  * `awslogs-region`: Your AWS region
                  * `awslogs-stream-prefix`: `ecs`
      * Click **"Add"**.
      * Review and click **"Create"**.
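
The same task definition can be expressed as the keyword arguments to boto3's `ecs.register_task_definition`, which makes the split between the task role and the execution role, and the port settings, explicit. This is a sketch: the image URI, region, bucket name, and role ARNs are inputs you supply, and it assumes the image's `ENTRYPOINT` does not already start `tensorflow_model_server`.

```python
def tfserving_task_definition(image_uri: str, region: str, bucket: str,
                              execution_role_arn: str, task_role_arn: str) -> dict:
    """Keyword arguments for ecs.register_task_definition(), mirroring step 5."""
    return {
        "family": "sentiment-tfserving-task-demo",
        "requiresCompatibilities": ["FARGATE"],
        "networkMode": "awsvpc",  # Fargate requires awsvpc
        "cpu": "512",             # 0.5 vCPU
        "memory": "1024",         # 1 GB
        # Used by ECS itself: pull the image from ECR, ship logs to CloudWatch.
        "executionRoleArn": execution_role_arn,
        # Used by the process in the container, e.g. TF Serving reading from S3.
        "taskRoleArn": task_role_arn,
        "containerDefinitions": [{
            "name": "sentiment-tfserving-container",
            "image": image_uri,
            "essential": True,
            # With awsvpc, hostPort must be omitted or equal to containerPort.
            "portMappings": [{"containerPort": 8501, "protocol": "tcp"}],
            # Assumes the image's ENTRYPOINT does not already start the server.
            "command": [
                "/usr/bin/tensorflow_model_server",
                "--rest_api_port=8501",
                "--model_name=sentiment_model",
                f"--model_base_path=s3://{bucket}/sentiment_model",
            ],
            "logConfiguration": {
                "logDriver": "awslogs",
                "options": {
                    "awslogs-group": "/ecs/sentiment-tfserving-task-demo",
                    "awslogs-region": region,
                    "awslogs-stream-prefix": "ecs",
                },
            },
        }],
    }


def register(task_definition: dict) -> str:
    """Register the task definition and return its ARN (requires boto3)."""
    import boto3

    ecs = boto3.client("ecs")
    response = ecs.register_task_definition(**task_definition)
    return response["taskDefinition"]["taskDefinitionArn"]
```

Each call to `register(...)` creates a new revision of the `sentiment-tfserving-task-demo` family and returns that revision's ARN.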

#### 6\. Create ECS Service for TF Serving

1.  **Go to ECS:**

      * In the left navigation pane, click **"Clusters"**.
      * Click on your `sentiment-cluster-demo`.
      * Go to the **"Services"** tab.

2.  **Create Service:**

      * Click **"Create"**.
      * **Configure service:**
          * **Launch type:** "Fargate".
          * **Task Definition:** Select `sentiment-tfserving-task-demo` (and its latest revision).
          * **Service name:** `sentiment-tfserving-service-demo`.
          * **Desired tasks:** `1`.
      * **Networking:**
          * **VPC:** Select your Default VPC.
          * **Subnets:** Select the two public Subnet IDs you identified.
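          * **Security groups:** Select your `sentiment-task-sg-demo`. Make sure it allows inbound TCP port `8501` from the ALB's security group (`sentiment-alb-sg-demo`); otherwise the ALB cannot reach the container and its health checks fail.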
          * **Auto-assign public IP:** Select **"ENABLED"**.
      * **Load balancing:**
          * **Load balancer type:** Select "Application Load Balancer".
          * **Load balancer name:** Select `sentiment-demo-alb`.
          * **Container to load balance:** Click "Add container to load balancer" and select `sentiment-tfserving-container:8501`.
          * **Target group name:** Select `sentiment-tfserving-tg-demo`.
      * Review and click **"Create Service"**.
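      * **Wait until the task is running and the target is healthy.** The service itself shows "ACTIVE" almost immediately, so also check that its task reaches **RUNNING** (Tasks tab) and that the target in `sentiment-tfserving-tg-demo` reports **healthy** (EC2 -\> Target Groups -\> Targets tab) before testing.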
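
A scripted equivalent of this step, as a sketch: `create_service` with the same network and load-balancer settings, followed by boto3's `services_stable` waiter. The subnet IDs, security group ID, and target group ARN are placeholders, and `all_targets_healthy` is a hypothetical helper that interprets the response of `elbv2.describe_target_health`.

```python
def tfserving_service_params(subnet_ids: list, security_group_id: str,
                             target_group_arn: str) -> dict:
    """Keyword arguments for ecs.create_service(), mirroring step 6."""
    return {
        "cluster": "sentiment-cluster-demo",
        "serviceName": "sentiment-tfserving-service-demo",
        # A family name without ":revision" uses the latest ACTIVE revision.
        "taskDefinition": "sentiment-tfserving-task-demo",
        "desiredCount": 1,
        "launchType": "FARGATE",
        "networkConfiguration": {
            "awsvpcConfiguration": {
                "subnets": list(subnet_ids),
                "securityGroups": [security_group_id],
                # In a public subnet, a public IP lets the task reach ECR
                # without a NAT gateway.
                "assignPublicIp": "ENABLED",
            }
        },
        "loadBalancers": [{
            "targetGroupArn": target_group_arn,
            "containerName": "sentiment-tfserving-container",
            "containerPort": 8501,
        }],
    }


def all_targets_healthy(health_response: dict) -> bool:
    """Interpret elbv2.describe_target_health(): any targets, all 'healthy'."""
    states = [d["TargetHealth"]["State"]
              for d in health_response.get("TargetHealthDescriptions", [])]
    return bool(states) and all(state == "healthy" for state in states)


def create_and_wait(params: dict) -> None:
    """Create the service, then block until ECS reports it stable (needs boto3)."""
    import boto3

    ecs = boto3.client("ecs")
    ecs.create_service(**params)
    # Polls until the deployment settles and running count == desired count.
    ecs.get_waiter("services_stable").wait(
        cluster=params["cluster"], services=[params["serviceName"]])
```

After `create_and_wait(...)` returns, passing the output of `elbv2.describe_target_health(TargetGroupArn="<target-group-arn>")` to `all_targets_healthy` confirms whether the ALB sees the container as healthy.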

#### 7\. Test TF Serving Deployment

1.  **Get ALB DNS Name:**
      * Go to EC2 -\> Load Balancers.
      * Select `sentiment-demo-alb`.
      * Copy its **DNS name**.
2.  **Update `predict_tfserving.py` (locally):**
      * Open your `predict_tfserving.py` script.
      * Change the `TF_SERVING_REST_URL` to:
        ```python
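        TF_SERVING_REST_URL = "http://<ALB-DNS-NAME>/v1/models/sentiment_model:predict"
        ```
        (Replace `<ALB-DNS-NAME>` with the actual DNS name.) The URL uses TF Serving's own path with no `/tfserve` prefix: by default the ALB forwards the path unchanged, and TF Serving does not serve `/tfserve/v1/...`. Make sure your listener rule's path pattern matches this path (for example, `/v1/models/*`).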
3.  **Run `predict_tfserving.py` (from your local terminal):**
    ```bash
    python predict_tfserving.py
    ```
    You should see the sentiment predictions.
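
The contents of `predict_tfserving.py` are not shown in this section, so the following is a hypothetical minimal client that uses only the standard library. It assumes the model's serving signature accepts raw strings (if your model expects tokenized or padded input, apply that preprocessing first) and follows TF Serving's REST conventions: a `{"instances": [...]}` request body, a `{"predictions": [...]}` response, and `GET /v1/models/<name>` for model status. Pass it the same URL you set in `TF_SERVING_REST_URL`.

```python
import json
import urllib.request


def build_predict_body(texts: list) -> bytes:
    """TF Serving REST 'row' format: one entry per example under 'instances'."""
    return json.dumps({"instances": list(texts)}).encode("utf-8")


def parse_predictions(body: bytes) -> list:
    """Row-format responses carry one result per instance under 'predictions'."""
    return json.loads(body)["predictions"]


def model_is_available(status_body: bytes) -> bool:
    """Interpret GET /v1/models/<name>: True if any version is AVAILABLE."""
    versions = json.loads(status_body).get("model_version_status", [])
    return any(v.get("state") == "AVAILABLE" for v in versions)


def predict(url: str, texts: list, timeout: float = 10.0) -> list:
    """POST texts to a TF Serving :predict URL and return the predictions."""
    request = urllib.request.Request(
        url,
        data=build_predict_body(texts),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_predictions(response.read())
```

For example, `predict(TF_SERVING_REST_URL, ["I loved this movie"])` returns one prediction per input; the shape of each prediction depends on your model's output signature.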

-----

### Cleanup (Important to Avoid Charges\!)

When you're done with the lab:

1.  **Delete ECS Services:**
      * Go to ECS -\> Clusters -\> `sentiment-cluster-demo`.
      * Go to the "Services" tab.
      * Select `sentiment-flask-service-demo` and click "Delete". Confirm.
      * Select `sentiment-tfserving-service-demo` and click "Delete". Confirm.
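2.  **Deregister ECS Task Definitions:**
      * Go to ECS -\> Task Definitions.
      * Select the revisions of `sentiment-flask-task-demo` and `sentiment-tfserving-task-demo` that you created.
      * Click "Actions" -\> "Deregister task definition". Task definitions don't incur charges, so this step only keeps your account tidy.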
3.  **Delete Load Balancer & Target Groups:**
      * Go to EC2 -\> Load Balancers.
      * Select `sentiment-demo-alb` and click "Actions" -\> "Delete load balancer". Confirm.
      * Go to EC2 -\> Target Groups.
      * Select `sentiment-flask-tg-demo` and `sentiment-tfserving-tg-demo`. Click "Actions" -\> "Delete". Confirm.
4.  **Delete ECS Cluster:**
      * Go to ECS -\> Clusters.
      * Select `sentiment-cluster-demo` and click "Delete Cluster". Confirm.
5.  **Delete Security Groups:**
      * Go to EC2 -\> Security Groups.
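      * Select `sentiment-alb-sg-demo` and `sentiment-task-sg-demo`. Click "Actions" -\> "Delete security groups". Confirm. If deletion fails with a dependency error, wait a few minutes for the load balancer's and tasks' network interfaces to be released, then retry; if `sentiment-task-sg-demo` has an inbound rule that references `sentiment-alb-sg-demo`, delete the task security group first.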
6.  **Empty and Delete S3 Bucket:**
      * Go to S3.
      * Select `your-aws-account-id-sentiment-model-demo`.
      * Click "Empty" (follow instructions to confirm).
      * Then click "Delete" (follow instructions to confirm).
7.  **Delete ECR Repositories:**
      * Go to ECR.
      * Select `sentiment-flask-app` and `sentiment-tfserving-model`.
      * Click "Delete". Confirm.
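8.  **Delete CloudWatch Log Groups:**
      * Go to CloudWatch -\> Log groups.
      * Select `/ecs/sentiment-tfserving-task-demo` (and the log group you created for the Flask task, if any). Click "Actions" -\> "Delete log group(s)". Confirm.
9.  **Detach IAM Policies (Optional):** If you attached `AmazonS3ReadOnlyAccess` to `ecsTaskExecutionRole`, detach it unless other services need it.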
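
To script the teardown instead, the console steps above map onto boto3 calls in roughly the same order. This is a sketch under stated assumptions: the IDs in `ids` (the ALB ARN, target group ARNs, security group IDs, and task definitions as `family:revision`) are values you look up yourself, the S3 bucket is handled separately because emptying it uses boto3's resource API, and every step is retried a few times because deletions such as security groups can fail until dependent network interfaces are released.

```python
import time

CLUSTER = "sentiment-cluster-demo"
SERVICES = ["sentiment-flask-service-demo", "sentiment-tfserving-service-demo"]
REPOSITORIES = ["sentiment-flask-app", "sentiment-tfserving-model"]


def cleanup_plan(ids: dict) -> list:
    """Ordered (client, operation, kwargs) steps; "wait:<name>" means a waiter."""
    # force=True deletes each service without first scaling it to zero.
    plan = [("ecs", "delete_service", {"cluster": CLUSTER, "service": s, "force": True})
            for s in SERVICES]
    plan.append(("ecs", "wait:services_inactive", {"cluster": CLUSTER, "services": SERVICES}))
    plan += [("ecs", "deregister_task_definition", {"taskDefinition": td})
             for td in ids["task_definitions"]]  # e.g. "sentiment-flask-task-demo:1"
    plan.append(("elbv2", "delete_load_balancer", {"LoadBalancerArn": ids["alb_arn"]}))
    plan.append(("elbv2", "wait:load_balancers_deleted", {"LoadBalancerArns": [ids["alb_arn"]]}))
    plan += [("elbv2", "delete_target_group", {"TargetGroupArn": arn})
             for arn in ids["target_group_arns"]]
    plan.append(("ecs", "delete_cluster", {"cluster": CLUSTER}))
    # List the task SG before the ALB SG if its inbound rule references the ALB SG.
    plan += [("ec2", "delete_security_group", {"GroupId": sg})
             for sg in ids["security_group_ids"]]
    # force=True also deletes the images stored in each repository.
    plan += [("ecr", "delete_repository", {"repositoryName": r, "force": True})
             for r in REPOSITORIES]
    plan.append(("logs", "delete_log_group",
                 {"logGroupName": "/ecs/sentiment-tfserving-task-demo"}))
    return plan


def run_plan(plan: list, session=None, attempts: int = 5, delay: float = 30.0) -> None:
    """Execute each step in order, retrying failures up to `attempts` times."""
    if session is None:
        import boto3
        session = boto3.Session()
    clients = {}
    for service, operation, kwargs in plan:
        if service not in clients:
            clients[service] = session.client(service)
        client = clients[service]
        for attempt in range(attempts):
            try:
                if operation.startswith("wait:"):
                    client.get_waiter(operation[len("wait:"):]).wait(**kwargs)
                else:
                    getattr(client, operation)(**kwargs)
                break
            except Exception:
                if attempt == attempts - 1:
                    raise
                time.sleep(delay)


def empty_and_delete_bucket(bucket_name: str) -> None:
    """Delete every object version, then the bucket itself (requires boto3)."""
    import boto3

    bucket = boto3.resource("s3").Bucket(bucket_name)
    bucket.object_versions.all().delete()
    bucket.delete()
```

A typical run would be `run_plan(cleanup_plan(ids))` followed by `empty_and_delete_bucket("<your-bucket-name>")`. Detaching the IAM policy is left to the console step above.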
