# 🔄 Fine-Tune and Serve LLMs with Union.ai: A Hands-On Tutorial

<a target="_blank" href="https://colab.research.google.com/github/unionai-oss/bert-llm-classification-pipeline/blob/main/tutorial.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

Welcome to this step-by-step tutorial on building a **Large Language Model (LLM) fine-tuning pipeline** using **Hugging Face Transformers** and **Union.ai’s AI workflow and inference platform**. In this tutorial, you’ll train a **BERT-based model for text classification**, serve it for inference, and track every step of your pipeline using **Union’s powerful MLOps capabilities**.  

This example might seem simple, but the **core concepts and tools** covered here apply to real-world **AI and machine learning (ML) projects** at any scale. By following along, you'll gain hands-on experience in:  

🔸 **Automating ML workflows** with Union.ai  
🔸 **Fine-tuning a transformer model** with Hugging Face  
🔸 **Deploying a model for inference** and tracking artifacts  
🔸 **Optimizing your pipeline** with caching and versioning  

## ✨ Why Use Union.ai?  

With just **a few lines of code** added to your Python functions, you can turn them into **scalable, reproducible AI workflows**. Here’s what you get:  

- **🛠 Reproducible AI Workflows** – Ensure your pipeline runs in the same environment every time.  
- **📌 Versioning & Tracking** – Automatically track **code, models, and artifacts**.  
- **⚡ Faster Iterations with Data Caching** – Reuse previous results to speed up experiments.  
- **🖥 Declarative Infrastructure** – Define ML infrastructure **in code** without manual setup.  
- **📂 Artifact Management** – Keep track of model checkpoints and datasets seamlessly.  
- **📦 Containerized Execution** – Deploy models in a consistent environment with automatic **image building**.  
- **🧑‍💻 Local & Cloud Development** – Test locally before scaling up.  
- **🎭 Actors for Long-Running Stateful Containers** – Run **efficient batch inference** with persistent containers.  
- **…and much more!** 

## 📝 What You'll Build  

By the end of this tutorial, you'll have a **fully functional AI pipeline** that:  

1. **Downloads and processes a dataset** 📥  
2. **Fine-tunes a BERT model for classification** 🏋️‍♂️  
3. **Saves and versions the trained model** 💾  
4. **Deploys the model for real-time inference** 🚀  
5. **Tracks all artifacts and experiments** using Union.ai 📊  

Let’s dive in! Here's a sneak peek at how simple it is to define a **Union-powered ML pipeline**: 

```python
import pandas as pd
import torch
from union import Resources, task, workflow

# `image` is the container image the tasks run in; it is defined elsewhere in the project.

@task(
    cache=True,
    cache_version="4",
    container_image=image,
    requests=Resources(cpu="2", mem="2Gi"),
)
def download_data() -> pd.DataFrame:
    ...

@task(
    container_image=image,
    requests=Resources(cpu="2", mem="20Gi", gpu="1"),
)
def train_model(data: pd.DataFrame) -> torch.nn.Module:
    ...

@workflow()
def pipeline_workflow():
    data = download_data()
    train_model(data=data)
    ...
```


## 🧰 Setup 


To get started, sign up for a **Union Serverless** account at [Union.ai](https://union.ai) by clicking the **"Get Started"** button. No credit card is required, and you'll receive **$30 in free credits** to begin experimenting. The signup process takes just a few minutes.  

Alternatively, if you have access to a **[Union BYOC Enterprise](https://www.union.ai/pricing)** account, you can use that account instead.  

### 📦 Install Python Packages & Clone Repo

To clone the repo, run the following command in your environment: `git clone https://github.com/unionai-oss/bert-llm-classification-pipeline`

Then, from the repo root, install the packages listed in [requirements.txt](requirements.txt) with your preferred package manager. For example: `pip install -r requirements.txt`.

If you're running this notebook in a Google Colab environment, you can install the packages and clone the GitHub repo directly in the notebook by running the following cell:


In [None]:
try:
    import google.colab
    IN_COLAB = True
except ImportError:
    IN_COLAB = False

if IN_COLAB:
    !git clone https://github.com/unionai-oss/bert-llm-classification-pipeline
    %cd bert-llm-classification-pipeline
    !pip install -r requirements.txt

### 🔐 Authenticate
To use **Union.ai**, you'll need to authenticate your account. Follow the appropriate step based on your setup:  

##### 🔸 **Using Union BYOC Enterprise**  

If you're using a **[Union BYOC Enterprise](https://www.union.ai/pricing)** account, log in with the following command:  
```bash
union create login --host <union-host-url>
```

Replace `<union-host-url>` with your organization's Union instance URL.

##### 🔸 **Using Union Serverless**

If you're using [Union Serverless](https://www.union.ai/), authenticate by running the command below. If you don't have an account yet, create one for free at [Union.ai](https://union.ai) first.
 

In [None]:
!union create login --serverless --auth device-flow

## 🔀 BERT Fine-Tuning Pipeline  

In this section, we'll run the **tasks and workflows** that are defined as Python code in the repo.  

📂 Navigate to the `tasks` and `workflows` folders to explore the code. If you're following along in a **hosted Jupyter Notebook**, you can view the files by clicking the **folder icon** (usually on the left side of the screen).  

### 🛠 Workflow Overview  

We’ll create an **end-to-end machine learning pipeline** to train a **BERT model for text classification** on a **sentiment-labeled text dataset**. The workflow consists of the following steps:  

1. **Download & Preprocess Dataset** 📥  
2. **Download Pretrained BERT Model** 🤖  
3. **Fine-tune BERT Model** 🏋️‍♂️  
4. **Evaluate Model Performance** 📊  
5. **Save Model as an Artifact** 💾 *(We’ll serve the model in the next section)*  
6. **Run a Prediction on New Test Data** 🔍  

> **💡 Note:**  
> In more complex ML workflows, **data pipelines** are often separate from **model training pipelines**.  
> For simplicity, we'll combine them into a single workflow in this example.  
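
The data flow between these steps can be sketched as a chain of plain Python functions, where each step's output feeds the next. Everything here is a stand-in: the function names are hypothetical, the "model" is a toy word counter rather than BERT, and the artifact-saving step is left out. The actual tasks live in the repo's `tasks` folder.

```python
from collections import Counter


def download_dataset():
    # Stand-in for step 1: a tiny labeled text dataset (1 = positive, 0 = negative).
    train = [("great film", 1), ("loved it", 1), ("terrible film", 0), ("hated it", 0)]
    test = [("great acting", 1), ("terrible plot", 0)]
    return train, test


def download_model():
    # Stand-in for step 2: an "untrained model" holding per-label word counts.
    return {0: Counter(), 1: Counter()}


def train_model(model, train):
    # Stand-in for step 3: count how often each word appears under each label.
    for text, label in train:
        model[label].update(text.split())
    return model


def predict(model, text):
    # Step 6: score each label by how many of the text's words it saw in training.
    scores = {label: sum(counts[w] for w in text.split()) for label, counts in model.items()}
    return max(scores, key=scores.get)


def evaluate_model(model, test):
    # Stand-in for step 4: plain accuracy on held-out examples.
    correct = sum(predict(model, text) == label for text, label in test)
    return correct / len(test)


def train_pipeline():
    train, test = download_dataset()
    model = train_model(download_model(), train)
    accuracy = evaluate_model(model, test)
    return model, accuracy


model, accuracy = train_pipeline()
```

Writing the pipeline as functions with explicit inputs and outputs is what lets a workflow engine infer the execution order from the data dependencies.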

### 🔎 Explore the Code  

To view the workflow, navigate to the [`workflows/train_pipeline.py`](workflows/train_pipeline.py) file.  

- Look for the **`train_pipeline()`** function—this defines the full workflow.  
- The workflow **calls tasks** from the [`tasks`](tasks/) folder.  
- It also **builds a container image** using [`containers.py`](containers.py).  

Once you understand the structure, **run the workflow** and track your pipeline execution with **Union.ai**! 🚀  


In [None]:
!union run --remote workflows/train_pipeline.py train_pipeline

## 🚀 Serving the Fine-Tuned BERT Model

### Live App Serving (Beta)

Union.ai provides a **simple way to serve your models as a live app**, making it easy to interact with your trained model.  

In this example, we'll deploy the model using **Streamlit**, which provides a **simple web interface** for running predictions.  


📂 Check out the following files for the model-serving code:  
- [`app.py`](app.py) – Defines the **Streamlit-based UI** for interacting with the model.  
- [`main.py`](main.py) – Handles **loading the model** and serving it via Union.ai.  

Deploy the model by running the following command:

In [None]:
!union deploy apps app.py bert-sentiment-analysis

Check the Union platform `Apps` tab to see the status of all apps!

Once the app is live, experiment with different inputs and see how your fine-tuned BERT model performs! 🚀

### Batch Serving

Union.ai also provides a way to serve your models in batch mode. This is useful when you have a large number of predictions to make and you want to do them all at once.

In [None]:
!union register workflows/batch_inference.py

In [None]:
from union.remote import UnionRemote
# Create a remote connection
remote = UnionRemote()

In [25]:
def predict_with_container(data):

    inputs = {"texts": data}

    workflow = remote.fetch_workflow(name="workflows.batch_inference.batch_inference_workflow")
    execution = remote.execute(workflow, inputs=inputs, wait=True) # wait=True will block until the execution is complete

    # print(execution.outputs)

    return execution.outputs["o0"]

In [None]:
print(predict_with_container(["I love this movie",
                               "I hate this movie"]
                               ))
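
When the list of texts is large, you may prefer to send it in fixed-size batches rather than in one call. The helper below is a minimal sketch of that idea. `chunked`, `predict_in_batches`, and the batch size are hypothetical names and values, and the sketch assumes the prediction function returns one result per input text. `fake_predict` is a local stand-in so the example runs without a Union connection; in the notebook you would pass `predict_with_container` instead.

```python
def chunked(texts, batch_size):
    """Yield consecutive slices of `texts`, each at most `batch_size` long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(texts), batch_size):
        yield texts[start:start + batch_size]


def predict_in_batches(texts, predict_fn, batch_size=2):
    """Call `predict_fn` once per batch and concatenate the results in order."""
    results = []
    for batch in chunked(texts, batch_size):
        results.extend(predict_fn(batch))
    return results


# Local stand-in so the sketch runs without a Union connection.
def fake_predict(batch):
    return ["positive" if "love" in text else "negative" for text in batch]


reviews = ["I love this movie", "I hate this movie", "I love the cast"]
labels = predict_in_batches(reviews, fake_predict, batch_size=2)
```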

### ⚡ Faster batch serving with Union Actors

Union [Actors](https://docs.union.ai/serverless/user-guide/core-concepts/actors/#actors) reduce the cost of cold starts by keeping long-running, stateful environments ready for use until a defined time-to-live (TTL) expires. Because the container persists between calls, redundant initialization is avoided. This is especially useful for AI pipelines that depend on large container images or serve models.
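
The benefit of a persistent environment can be illustrated without any Union code. The sketch below compares a "load on every call" pattern with a "load once, reuse" pattern in a single Python process. It is only an analogy for what a warm actor avoids, not the Actors API: `load_model`, `predict_cold`, and `predict_warm` are hypothetical names, and the model is a trivial stand-in.

```python
import functools

load_count = 0  # how many times the "expensive" load actually ran


def load_model():
    """Stand-in for an expensive step such as loading model weights."""
    global load_count
    load_count += 1
    return lambda text: "positive" if "love" in text else "negative"


def predict_cold(texts):
    # Fresh-container pattern: every call pays the load cost again.
    model = load_model()
    return [model(t) for t in texts]


@functools.lru_cache(maxsize=1)
def get_warm_model():
    # Persistent-process pattern: the first call loads, later calls reuse.
    return load_model()


def predict_warm(texts):
    model = get_warm_model()
    return [model(t) for t in texts]


for _ in range(3):
    predict_cold(["I love this movie"])
cold_loads = load_count

load_count = 0
for _ in range(3):
    predict_warm(["I love this movie"])
warm_loads = load_count
```

Three cold calls trigger three loads, while three warm calls trigger one.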

In [21]:
def predict_with_actor(data):

    inputs = {"texts": data}

    workflow = remote.fetch_workflow(name="workflows.batch_inference.actor_batch_inference_workflow")
    execution = remote.execute(workflow, inputs=inputs, wait=True) # wait=True will block until the execution is complete

    # print(execution.outputs)

    return execution.outputs['o0']

In [None]:
print(predict_with_actor(["I love this movie",
                               "I hate this movie"]
                               ))

In [None]:
print(predict_with_actor(["I love this movie",
                               "I hate this movie"]
                               ))

In [None]:
print(predict_with_actor(["I love this movie",
                               "I hate this movie"]
                               ))
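
Running the same prediction several times is a simple way to observe the actor's behavior: if the actor is still within its TTL, later calls should not need to start a new container. A small timing helper makes that comparison explicit. `timed_call` is a hypothetical helper, and `fake_predict` is a local stand-in so the sketch runs anywhere; in the notebook you would pass `predict_with_actor` instead.

```python
import time


def timed_call(fn, *args, **kwargs):
    """Run fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


# Local stand-in; in the notebook you would pass predict_with_actor instead.
def fake_predict(texts):
    time.sleep(0.01)
    return [len(t) for t in texts]


timings = []
for run in range(3):
    outputs, seconds = timed_call(fake_predict, ["I love this movie", "I hate this movie"])
    timings.append(seconds)
    print(f"run {run + 1}: {seconds:.3f}s")
```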