# End-to-End Chatbot MLOps

## Overview

## Architecture


### Repository Structure

```
/chatbot-mlops
├── .github/workflows/   # GitHub Actions for CI/CD
├── docs/                # Project documentation (README, model cards, etc.)
├── infra/               # Infrastructure configs (Docker, K8s, Helm)
│   ├── mlflow/
│   ├── monitoring/
│   └── postgres/
├── labeling/            # Label Studio configurations and guidelines
├── ml/                  # All ML-related code
│   ├── data/            # Data artifacts (managed by DVC)
│   ├── models/          # Model definitions and architectures
│   ├── notebooks/       # Exploratory notebooks
│   ├── training/        # Training and evaluation scripts
│   └── utils/           # Helper functions for ML tasks
├── services/            # Application microservices
│   ├── api/             # FastAPI backend service
│   ├── worker/          # Prefect worker and flows
│   └── agent-ui/        # UI for human escalation
├── tests/               # Unit and integration tests
├── .env                 # Environment variables (not committed to git)
├── .gitignore           # Files and directories to ignore
├── dvc.yaml             # DVC pipeline definition
├── docker-compose.yml   # Local infrastructure stack
├── environment.yml      # Conda environment for development
└── requirements.txt     # Pip dependencies for production
```

### System Data Flow
<img src="assets/system_data_flow.png" alt="System Data Flow" width="700"/>



1.	User Chat UI → API Gateway
    - The user types a message (like “Where’s my order?”).
    - That message goes into the API Gateway (the entry door).

2.	API Gateway → Bot Manager
    - The API Gateway hands the message to the Bot Manager.
    - The Bot Manager is the brain of the system: it decides what to do with the message.

3.	Bot Manager (Different Paths): depending on the type of message, the Bot Manager can take one or more of the following paths:
    - Detect Intent: Understands what the user wants (e.g., track order, ask about a product, FAQs).
    - Extract Information: Pulls key details like order ID, product name, or location.

    - Search Knowledge (Vector + Keywords):
        - Looks up answers from the Product Catalog or FAQs.
        - Uses two techniques: keyword search (Elasticsearch) and meaning-based search (FAISS).

    - Generate Answer (LLM + RAG):
        - If the answer needs to be written out more naturally, the Bot Manager asks a language model.
        - The model is given the retrieved context so the answer stays grounded in it, which reduces the risk of hallucination.

    - Human Agent (Fallback): If the system isn’t confident, it sends the chat to a real person.

4.	Final Response → User Chat UI
    - Whatever action is chosen (search, database lookup, LLM, or human agent), the Bot Manager puts together a final response.
    - That response goes back through the API → and shows up in the user’s chat window.

In short:
```
User asks → API receives → Bot Manager thinks → chooses (search, database, LLM, or human) → creates final answer → sends back to user.
```
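The routing decision described above can be sketched in a few lines of Python. This is only an illustration, not the project's actual Bot Manager: the intent names, the keyword rules in `detect_intent`, the handler functions, and the `CONFIDENCE_THRESHOLD` value are all hypothetical stand-ins for the trained models and real services.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Assumption: in practice this cut-off would be tuned on validation data.
CONFIDENCE_THRESHOLD = 0.6


@dataclass
class Intent:
    name: str          # e.g. "track_order", "faq"
    confidence: float  # between 0.0 and 1.0


def detect_intent(message: str) -> Intent:
    """Toy keyword rules standing in for the trained intent model."""
    text = message.lower()
    if "order" in text:
        return Intent("track_order", 0.9)
    if "refund" in text or "shipping" in text:
        return Intent("faq", 0.8)
    if "recommend" in text or "compare" in text:
        return Intent("product_question", 0.7)
    return Intent("unknown", 0.2)


# Placeholder handlers: each one only reports which path was taken.
def lookup_order(message: str) -> str:
    return "database_lookup"


def search_knowledge(message: str) -> str:
    return "knowledge_search"


def generate_answer(message: str) -> str:
    return "llm_rag"


def escalate_to_human(message: str) -> str:
    return "human_agent"


# Hypothetical mapping from intent name to the path that handles it.
HANDLERS: Dict[str, Callable[[str], str]] = {
    "track_order": lookup_order,
    "faq": search_knowledge,
    "product_question": generate_answer,
}


def route(message: str) -> str:
    """Pick a path; fall back to a human when the intent is unknown or unsure."""
    intent = detect_intent(message)
    handler = HANDLERS.get(intent.name)
    if handler is None or intent.confidence < CONFIDENCE_THRESHOLD:
        return escalate_to_human(message)
    return handler(message)
```

Calling `route("Where's my order?")` takes the order-lookup path, while a message the toy rules cannot classify goes to the human agent. Keeping the fallback check in one place makes the confidence rule easy to change without touching the individual handlers.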

### Training Pipeline
<img src="assets/training_pipeline.png" alt="Training Pipeline" width="700"/>


1.	Collect Data
    - Gather raw data: chat logs, FAQs, product info, and past user interactions.
2.	Clean & Prepare Data
    - Remove noise, fix formatting, and organize the text so it’s ready for training.
3.	Add Labels
    - Tag the data with useful information:
        - e.g., mark what the intent is (“track order”),
        - highlight entities (like order_id or city).
4.	Create Features
    - Transform the raw text into a machine-readable format (numbers, vectors, embeddings).
5.	Train Different Models
    - Intent Model → Learns to recognize what the user wants.
    - Entity Extractor → Learns to pick out important details (order ID, product name).
    - Search Model → Learns to find the right info in the catalog or knowledge base.
    - Answer Generator → Learns to generate natural, human-like responses.
6.	Evaluate Models
    - Test each model to see how well it performs (accuracy, precision, recall).
    - Only good models move forward.
7.	Save Models (MLflow Registry)
    - Store the approved models in a central registry so they’re tracked and versioned.
8.	Build Package (Docker Image)
    - Bundle the model + code into a portable package (Docker).
    - This makes it easy to run anywhere.
9.	Deploy to Cluster (K8s via Helm)
    - Deploy the package to a Kubernetes cluster.
    - Now the model is live and can serve real user queries.

In short:
```
Data → Clean → Label → Features → Train → Evaluate → Save → Package → Deploy → Ready for Users.
```
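The "only good models move forward" step can be illustrated with a small evaluation gate. The sketch below is an assumption about how such a gate might look, not the project's evaluation script: it handles only binary labels for simplicity (the intent model would be multi-class), and the function names and threshold values are hypothetical.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, precision and recall for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    total = len(pairs)
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


def passes_gate(metrics: Dict[str, float], thresholds: Dict[str, float]) -> bool:
    """A model moves forward only if every metric meets its minimum."""
    return all(metrics.get(name, 0.0) >= minimum
               for name, minimum in thresholds.items())


# Hypothetical minimums; real values depend on the model and use case.
THRESHOLDS = {"accuracy": 0.7, "recall": 0.6}

metrics = binary_metrics(y_true=[1, 1, 0, 0], y_pred=[1, 0, 0, 0])
approved = passes_gate(metrics, THRESHOLDS)
```

In this toy example the model misses one of the two positive cases, so its recall falls below the assumed minimum and `approved` is `False`. A pipeline would register the model only when the gate returns `True`.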

## Automatic Pipeline Run Script

### [`Docker Compose`](docker-compose.yml): Orchestrates services/containers
The Docker Compose file starts the following services automatically. Each service reads its configuration from `.env`, which keeps credentials out of version control:
- MinIO
- PostgreSQL
- MLflow
- Prefect
- Grafana
- Prometheus
- Kafka
- Elasticsearch

**Quickstart:**

1. To build and start the stack in the background:
    ```bash
    docker-compose up -d --build
    ```

2. To check the status of all running containers:
    ```bash
    docker-compose ps
    ```

3. To shut down the stack:
    ```bash
    docker-compose down
    ```


### [`DVC`](dvc.yaml): Orchestrates the data and ML pipeline
The DVC pipeline runs the following stages automatically:
-  Step 1: download raw data.
-  Step 2: preprocess/clean data.
-  Step 3: feature engineering.
-  Step 4: train model.
-  Step 5: evaluate and push metrics.

**Quickstart:**
1. To run the DVC pipeline:
    ```bash
    dvc repro
    ```
2. To check the status of the DVC pipeline:
    ```bash
    dvc status
    ```

### [`PostgreSQL Initialization`](infra/postgres/init.sql)

The Docker Compose stack needs one more piece of configuration: a database for MLflow. `init.sql` creates the `mlflow` database in the `postgres` service and runs automatically the first time the stack starts.

### [`MLflow Custom Dockerfile`](infra/mlflow/Dockerfile)

Why does MLflow need a custom Dockerfile? Without one, MLflow would have to be set up manually every time it starts. Building the setup (Python, dependencies, and so on) into the image means that starting the container is all that is needed.


### Local Service Endpoints and Access

To make the services easier to find, the table below lists each local endpoint and its default credentials.

| Service           | Purpose                        | Local URL                    | Default User | Default Password |
| ----------------- | ------------------------------ | ---------------------------- | ------------ | ---------------- |
| **Chatbot API**   | Main application endpoint      | `http://localhost:8000/docs` | N/A          | N/A              |
| **MLflow**        | Experiment Tracking & Registry | `http://localhost:5001`      | N/A          | N/A              |
| **MinIO Console** | S3 Artifact Storage UI         | `http://localhost:9001`      | `minioadmin` | `minioadmin`     |
| **Grafana**       | Monitoring Dashboards          | `http://localhost:3000`      | `admin`      | `admin`          |
| **Prometheus**    | Metrics Server UI              | `http://localhost:9090`      | N/A          | N/A              |
| **Prefect UI**    | Workflow Orchestration         | `http://localhost:4200`      | N/A          | N/A              |
| **PostgreSQL**    | Database Connection            | `localhost:5432`             | `postgres`   | `postgres`       |
| **Elasticsearch** | Search API                     | `http://localhost:9200`      | N/A          | N/A              |
| **Kafka**         | Broker Connection              | `localhost:9092`             | N/A          | N/A              |
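After starting the stack, it can be handy to confirm that each port in the table accepts connections. The sketch below is one possible way to do that with only the Python standard library. The host and port pairs are copied from the table; the function names are hypothetical, and an open port only shows that something is listening, not that the service is healthy.

```python
import socket
from typing import Dict, Tuple

# Host/port pairs taken from the endpoint table above.
SERVICES: Dict[str, Tuple[str, int]] = {
    "Chatbot API": ("localhost", 8000),
    "MLflow": ("localhost", 5001),
    "MinIO Console": ("localhost", 9001),
    "Grafana": ("localhost", 3000),
    "Prometheus": ("localhost", 9090),
    "Prefect UI": ("localhost", 4200),
    "PostgreSQL": ("localhost", 5432),
    "Elasticsearch": ("localhost", 9200),
    "Kafka": ("localhost", 9092),
}


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(services: Dict[str, Tuple[str, int]] = SERVICES) -> Dict[str, bool]:
    """Map each service name to whether its port currently accepts connections."""
    return {name: is_port_open(host, port)
            for name, (host, port) in services.items()}
```

Calling `check_services()` returns a dictionary such as `{"MLflow": True, ...}`, which can be printed or used in a smoke test after `docker-compose up`.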


## Environment and Dependencies

