# Hosting a Boltz Serving App

[Boltz](https://github.com/jwohlwend/boltz) is an open-source biomolecular structure prediction model with performance reported to be on par with AlphaFold3. This notebook walks through a simple Boltz prediction task and shows how to serve a prediction endpoint using FastAPI and Union Serving.

## Overview
- Define a [remote object](https://docs.union.ai/byoc/user-guide/development-cycle/union-remote/) to interact with the Union cluster
- Materialize an Artifact representing the [model on Hugging Face](https://huggingface.co/boltz-community/boltz-1)
- Create an [ImageSpec](https://docs.union.ai/byoc/user-guide/development-cycle/image-spec#imagespec) definition for use throughout
- Create a simple prediction workflow using [Actors](https://docs.union.ai/byoc/user-guide/core-concepts/actors/#actors)
- Define a FastAPI serving endpoint
- Deploy the app via Union [Serving](https://docs.union.ai/byoc/user-guide/core-concepts/serving/#serving)

## Setup
- Install the `union` package
- Create a config file via `union create login` and point the `UNION_CONFIG` environment variable (set in the cell below) at it

### UnionRemote

The following cell will refer to your config file and create a UnionRemote object that's used throughout the rest of the notebook. This object allows you to register entities, trigger executions, and retrieve outputs in a programmatic way.

In [None]:
import os
import subprocess
from union import UnionRemote, ImageSpec, ActorEnvironment, FlyteFile, FlyteDirectory, workflow, Resources, Artifact
from flytekit.configuration import Config

os.environ["UNION_CONFIG"] = "~/.union/config_serving.yaml"

remote = UnionRemote(config=Config.auto(config_file=os.path.expanduser(os.environ["UNION_CONFIG"])))

### Cache Model from Hugging Face

UnionRemote has a convenience function for caching models from Hugging Face as Union Artifacts. You'll need to create an API token on Hugging Face and then upload it via `union create secret --name HF_TOKEN`. You'll also have to create an admin key via `union create api-key admin --name UNION_API_KEY`.

This will run a workflow that fetches the model and emits an Artifact. You can view the execution in the console, as well as the model in the Artifacts tab once the workflow has completed.

In [None]:
from union.remote import HuggingFaceModelInfo
info = HuggingFaceModelInfo(repo="boltz-community/boltz-1")

cache_exec = remote._create_model_from_hf(
    info=info, 
    hf_token_key="HF_TOKEN", 
    union_api_key="UNION_API_KEY",
)

cache_exec = cache_exec.wait(poll_interval=2)
cache_exec.outputs

### Create an ImageSpec Definition

ImageSpec is an easy and flexible way of defining the images you'll be using throughout your workflow and in your apps. A number of options are built in for PyPI packages, conda packages, etc. 

We define a number of PyPI packages as well as the `build-essential` APT bundle for Boltz. Finally, we install Boltz via an arbitrary RUN command.

Of note here is the use of the `union` builder. This will ship the ImageSpec definition off to a hosted builder in your Union cluster. This unburdens your local machine from having to build and push an image yourself, resulting in faster iteration cycles. Moreover, the remote builder uses performance enhancements like layer caching and PyPI proxying to speed up builds even more.

In [None]:
image = ImageSpec(
    name="boltz",
    packages=[
        "union",
        "flytekit==1.15",
        "union-runtime==0.1.11",
        "fastapi==0.115.11",
        "pydantic==2.10.6",
        "uvicorn==0.34.0",
        "python-multipart==0.0.20",
    ],
    apt_packages=["build-essential"],
    builder="union",
    commands=["pip install boltz==0.4.1"]
)

### Actor Workflow

Actors are a powerful primitive that offers substantial performance improvements and unlocks capabilities not possible with regular tasks. By using a warm pod capable of accepting multiple task submissions, the overhead of pod scheduling and cleanup is removed. This results in faster iteration between tasks and substantial speedups during large parallel executions.

We first define an ActorEnvironment using many of the same parameters we're accustomed to for regular tasks. Additionally, we define a replica count and a time-to-live to control parallelism capacity as well as how long to persist between task submissions. Once defined, the actor environment can be used in exactly the same way as the usual `@task` decorator.

The workflow itself requires no special treatment regarding actor tasks vs regular tasks. Finally, we call the workflow using `remote.execute`, pass in the input, and await a response. Once the workflow is submitted, head over to the console to watch the actor environment get provisioned and process the prediction!
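
For reference, the input passed to the workflow is a Boltz input file. The following is a minimal sketch of how such a file could be generated, assuming the YAML schema described in the Boltz repository (a top-level `sequences` list with `protein` entries). The helper name `write_boltz_input`, the chain ID, and the short amino-acid sequence are placeholders for illustration only. No `msa` field is written because the task below passes `--use_msa_server`.

```python
from pathlib import Path


def write_boltz_input(path: str, sequence: str, chain_id: str = "A") -> Path:
    """Write a single-chain protein input file (hypothetical helper).

    Assumes the Boltz YAML schema: a top-level `sequences` list whose
    entries are keyed by molecule type.
    """
    content = (
        "version: 1\n"
        "sequences:\n"
        "  - protein:\n"
        f"      id: {chain_id}\n"
        f"      sequence: {sequence}\n"
    )
    out = Path(path)
    out.write_text(content)
    return out


# Placeholder sequence for illustration only, not a real prediction target
input_path = write_boltz_input("prot_no_msa.yaml", "MKTAYIAKQRQISFVKSHFSRQ")
print(input_path.read_text())
```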

Once the execution succeeds, the actor pod will remain active for the specified 10 minutes. Try changing something in the task itself and run the cell again. Everything will execute and return much faster.

In [None]:
actor = ActorEnvironment(
    name="boltz-actor",
    replica_count=1,
    ttl_seconds=600,
    requests=Resources(
        cpu="2",
        mem="10Gi",
        gpu="1",
    ),
    container_image=image,
)

@actor.task
def simple_predict(input: FlyteFile) -> FlyteDirectory:
    input.download()
    out = "/tmp/boltz_out"
    os.makedirs(out, exist_ok=True)
    # check=True raises if the Boltz CLI exits with a non-zero status
    subprocess.run(["boltz", "predict", input.path, "--out_dir", out, "--use_msa_server"], check=True)
    return FlyteDirectory(path=out)

@workflow
def act_wf(input: FlyteFile) -> FlyteDirectory:
    return simple_predict(input=input)

execution = remote.execute(
    entity=act_wf, 
    inputs={"input": "prot_no_msa.yaml"}, 
    wait=True
)
output = execution.outputs
print(output)

### FastAPI App

Here, we initialize our FastAPI application, which will serve as the foundation for our API endpoints.

First, we implement two helper functions: `package_outputs` and the asynchronous generator `generate_response`. The former packs the output directory into a gzipped tarball. The latter waits for the Boltz process to finish, yielding empty bytes every 10 seconds while it runs to keep the connection open. Because it is an asynchronous generator, the web server remains responsive during potentially long-running Boltz computations.
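
The keep-alive idea can be tried in isolation with a short-lived child process. This is only a sketch under stated assumptions: `stream_with_heartbeat` and `collect` are hypothetical names, the child process is a stand-in for Boltz, and the sketch polls with `asyncio.wait` and a timeout instead of the `asyncio.wait_for` call used in the app, so the pending `communicate()` call is never cancelled.

```python
import asyncio
import sys


async def stream_with_heartbeat(command, interval=0.1):
    """Yield b"" every `interval` seconds until the process exits, then its stdout."""
    process = await asyncio.create_subprocess_exec(
        *command,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    # Start communicate() once; asyncio.wait() times out without cancelling it
    task = asyncio.ensure_future(process.communicate())
    while True:
        done, _ = await asyncio.wait({task}, timeout=interval)
        if done:
            break
        yield b""  # heartbeat chunk while the process is still running
    stdout, stderr = task.result()
    if process.returncode != 0:
        raise RuntimeError(stderr.decode())
    yield stdout


async def collect(command, interval=0.1):
    """Gather every chunk the generator produces into a list."""
    return [chunk async for chunk in stream_with_heartbeat(command, interval)]


# Stand-in for a long-running prediction: sleep briefly, then print a result
slow_command = [sys.executable, "-c", "import time; time.sleep(0.5); print('done')"]
chunks = asyncio.run(collect(slow_command))
print(f"{len(chunks) - 1} heartbeat(s), final chunk: {chunks[-1]!r}")
```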

The heart of our implementation is the `/predict/` endpoint, which we define using FastAPI's decorator pattern. This endpoint accepts a YAML input file, an optional MSA (Multiple Sequence Alignment) file, and optional additional CLI options.

Next, we construct and execute the Boltz command with appropriate parameters, including any custom options provided by the client. We've implemented flexibility here - if an MSA file is provided, we use it directly; otherwise, we instruct Boltz to use the `mmseqs2` MSA server for sequence alignments.
We're careful to implement robust error handling throughout our application, capturing and returning meaningful error messages if something goes wrong.
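
The command construction described above can be sketched as a small pure function, which makes the MSA fallback easy to check without running Boltz. The function name `build_boltz_command` and its parameters are hypothetical; the flags `--out_dir`, `--cache`, and `--use_msa_server` are the ones used in the cell below, and `recycling_steps` is only an example option assumed to be accepted by the Boltz CLI.

```python
from typing import Dict, List, Optional


def build_boltz_command(
    yaml_path: str,
    out_dir: str,
    cache_dir: str = "/tmp/.boltz_cache",
    options: Optional[Dict[str, str]] = None,
    msa_provided: bool = False,
) -> List[str]:
    """Assemble the argument list for `boltz predict` (hypothetical helper)."""
    command = ["boltz", "predict", yaml_path, "--out_dir", out_dir, "--cache", cache_dir]
    # Client-supplied options become --key=value arguments
    command += [f"--{key}={value}" for key, value in (options or {}).items()]
    # Without an uploaded MSA, fall back to the MSA server
    if not msa_provided:
        command.append("--use_msa_server")
    return command


# Example option assumed to be a valid Boltz flag
print(build_boltz_command("/tmp/prot.yaml", "/tmp/out", options={"recycling_steps": "3"}))
```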

Finally, the results are streamed back to the client using FastAPI's StreamingResponse, which efficiently delivers the compressed output while setting appropriate headers to prompt the client to handle it as a downloadable file.

In [None]:
%%writefile boltz_fastapi.py
import os
import io
import shutil
import asyncio
import tempfile
import traceback
import subprocess
from pathlib import Path
from typing import Optional, Dict, Any
from click.testing import CliRunner
from boltz.main import predict
from fastapi import FastAPI, File, UploadFile, Form, BackgroundTasks
from fastapi.responses import JSONResponse, StreamingResponse

app = FastAPI()

def package_outputs(output_dir: str) -> bytes:
    import io
    import tarfile

    tar_buffer = io.BytesIO()
    parent_dir = Path(output_dir).parent

    cur_dir = os.getcwd()
    with tarfile.open(fileobj=tar_buffer, mode="w:gz") as tar:
        os.chdir(parent_dir)
        try: 
            tar.add(Path(output_dir).name, arcname=Path(output_dir).name)
        finally: 
            os.chdir(cur_dir)

    return tar_buffer.getvalue()

async def generate_response(process, out_dir, yaml_path):
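    try:
        # Run communicate() once as a task; shield() keeps a timeout from cancelling it
        communicate = asyncio.ensure_future(process.communicate())
        while True:
            try:
                stdout, stderr = await asyncio.wait_for(asyncio.shield(communicate), timeout=10.0)
                break
            except asyncio.TimeoutError:
                yield b""  # Keep-alive: yield an empty chunk while Boltz is still running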

        if process.returncode != 0:
            raise Exception(stderr.decode())

        print(stdout.decode())

        # Package the output directory
        tar_data = package_outputs(f"{out_dir}/boltz_results_{Path(yaml_path).with_suffix('').name}")
        yield tar_data

    except Exception as e:
        traceback.print_exc()
        yield JSONResponse(status_code=500, content={"error": str(e)}).body

@app.post("/predict/")
async def predict_endpoint(
    yaml_file: UploadFile = File(...),
    msa_file: Optional[UploadFile] = File(None),
    options: Optional[Dict[str, str]] = Form(None)
):
    # Save the uploaded YAML, keeping only the base name of the client-supplied filename
    yaml_path = f"/tmp/{Path(yaml_file.filename).name}"
    with open(yaml_path, "wb") as buffer:
        shutil.copyfileobj(yaml_file.file, buffer)

    # The MSA upload is optional, so only save it when one was provided
    msa_path = None
    if msa_file is not None:
        msa_path = f"/tmp/{Path(msa_file.filename).name}"
        with open(msa_path, "wb") as buffer:
            shutil.copyfileobj(msa_file.file, buffer)

    # Create the output directory with mkdtemp so it outlives this function;
    # a TemporaryDirectory context would be removed before the response streams
    out_dir = tempfile.mkdtemp()
    cleanup = BackgroundTasks()
    cleanup.add_task(shutil.rmtree, out_dir, ignore_errors=True)

    # Call boltz predict as a CLI tool
    try:
        print(f"Running predictions with options: {options} into directory: {out_dir}")
        # Convert options dictionary to key-value pairs
        options_list = [f"--{key}={value}" for key, value in (options or {}).items()]
        if msa_path and os.path.exists(msa_path):
            print(f"MSA file included at {msa_path}")
        else:
            options_list.append("--use_msa_server")
        command = ["boltz", "predict", yaml_path, "--out_dir", out_dir, "--cache", "/tmp/.boltz_cache"] + options_list
        print(f"Running command: {' '.join(command)}")
        process = await asyncio.create_subprocess_exec(
            *command,
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.PIPE
        )
        # The background task deletes the output directory once the response has been sent
        return StreamingResponse(
            generate_response(process, out_dir, yaml_path),
            media_type="application/gzip",
            headers={"Content-Disposition": "attachment; filename=boltz_results.tar.gz"},
            background=cleanup,
        )

    except Exception as e:
        traceback.print_exc()
        shutil.rmtree(out_dir, ignore_errors=True)
        return JSONResponse(status_code=500, content={"error": str(e)})


### Serving

Here we define our Union serving configuration, which specifies how our Boltz FastAPI service will be deployed and managed in your cluster.

First, we define our model artifact by creating a reference to the Boltz model that we materialized previously. This artifact definition includes important metadata such as the project, domain, name, and version, as well as partition information that describes the model's characteristics.

Following that, we define our FastAPI application deployment. The `App` class encapsulates all the specifications needed to run our service in a containerized environment:

- We give our application a name (`boltz-fastapi-notebook`) for identification within the Union system.
- We specify the same ImageSpec we've used throughout.
- We define an input via the Artifact's `query()` method, downloading it and mounting it at the specified path (`/tmp/.boltz_cache`).
- We establish resource limits for our application, including CPU, memory, GPU, and storage requirements.
- We set the port our application will listen on and specify the above file to include in the deployment.
- We define the command line arguments needed to start our FastAPI server using Uvicorn.
- We configure environment variables, such as enabling PyTorch MPS fallback for better compatibility.
- We set up auto-scaling parameters, including the minimum and maximum number of replicas, the scale-down timing, and the metric that triggers scaling (in this case, request rate).
- Finally, we specify the GPU accelerator type we need, which is an NVIDIA L40S in this implementation.

After configuring our application, we prepare for deployment by creating an `AppRemote` instance using the same `remote` object we've been using.

In the final step, we deploy our Boltz FastAPI application to the Union platform by calling the `deploy` method on our `AppRemote` instance. This initiates the deployment process, which will provision the necessary infrastructure, deploy our container, and make our Boltz service available according to the specifications we've defined. This will create a publicly accessible URL that leverages the same auth and RBAC as the rest of your cluster.

This deployment approach allows our Boltz service to scale automatically with the request rate, use GPU resources efficiently, and stay available by keeping at least one replica running at all times.

In [None]:
boltz_model = remote.get_artifact(cache_exec.outputs["artifact"].model_uri)
boltz_model

In [None]:
from datetime import timedelta

from union import Resources, ImageSpec
from union.app import App, ScalingMetric, Input
from flytekit.extras.accelerators import GPUAccelerator

boltz_fastapi = App(
    name="boltz-fastapi-notebook",
    container_image=image,
    inputs=[
        Input(
            name="boltz_model", value=boltz_model.query(), download=True, mount="/tmp/.boltz_cache"
        ),
    ],
    limits=Resources(cpu="2", mem="10Gi", gpu="1", ephemeral_storage="50Gi"),
    port=8080,
    include=["./boltz_fastapi.py"],
    args=["uvicorn", "boltz_fastapi:app", "--host", "0.0.0.0", "--port", "8080"],
    env={
        "PYTORCH_ENABLE_MPS_FALLBACK": "1",
    },
    min_replicas=1,
    max_replicas=3,
    scaledown_after=timedelta(minutes=10),
    scaling_metric=ScalingMetric.RequestRate(1),
    accelerator=GPUAccelerator("nvidia-l40s"),
)

from union.remote._app_remote import AppRemote

app_remote = AppRemote(default_project="default", default_domain="development", union_remote=remote)

app_remote.deploy(boltz_fastapi)

### Trying It Out

Your app should now be fully provisioned and available at an automatically generated endpoint. Since we're using FastAPI, we get Swagger docs for free. Head over to that endpoint and append `/docs` to the URL to pull up the `/predict/` endpoint specification. You can then try it out by passing in `prot.yaml` and `seq.a3m`. Once the prediction has run, you'll have the option to download a tarfile containing the results. After unarchiving it, look in the predictions folder to find a `.cif` file. This Crystallographic Information File contains the predicted structure of the input sequence. It can then be uploaded to [Molstar](https://molstar.org/viewer/) to view and interact with the structure.

## Final Thoughts

We've covered a lot in this compact example. We've captured the model itself in a convenient Artifact so that we can reference it across workflows and view important metadata. We ran a more traditional workflow in an accelerated way via Actors. We also stood up a persistent app for serving predictions in a flexible and cost-effective way using Union Serving.

All of this was accomplished programmatically via UnionRemote from a Jupyter Notebook! There are many directions to go from here, but this represents an approachable and efficient way of prototyping fairly complex use cases.