# Workflow to train an adapter for the Apple on-device foundation model

While the pre-trained model is powerful, fine-tuning can improve its performance on specific tasks and domains.

This notebook covers all the steps required to train and prepare your adapters for deployment. Each step calls into the corresponding example script. Note that these scripts are for educational purposes and should be tailored to your production environment.

## Step 0: Setup
Before running this notebook, make sure your environment is set up by running the commands below in bash.

Create a new environment via Conda.
```bash
conda create -n env-adapter-training python=3.11
conda activate env-adapter-training
```

Install dependencies.
```bash
pip install -r requirements.txt
```

From the bundle's root directory, launch the notebook.
```bash
jupyter notebook examples/end_to_end_example.ipynb
```

Make sure Python searches the current working directory (and its parent) for modules so the example scripts can be found, and check that all dependencies and assets are available.

In [None]:
from pathlib import Path
import sys
import os
import logging
sys.path.append(os.path.abspath(""))
sys.path.append(os.path.abspath(".."))

if not logging.getLogger().hasHandlers():
    logging.basicConfig(level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s")

# Check all dependent packages are installed
import importlib.util
dependent_packages = ["coremltools", "torch", "sentencepiece", "tamm", "examples", "export"]
missing_packages = []
for package_name in dependent_packages:
    spec = importlib.util.find_spec(package_name)
    if spec is None:
        print(f"❌ Package '{package_name}' is NOT installed or cannot be found.")
        missing_packages.append(package_name)

# Check assets
dependent_assets = {
    "weights_template.bin", "checkpoint_spec.yaml", "draft.mil",
    "base-model-config.json", "base-model.pt", "draft-model-config.json", "draft-model.pt",
    "tokenizer-config.json", "tokenizer.model",
}
missing_assets = dependent_assets - set([path.name for path in Path("../assets").rglob("*")])
for asset_name in missing_assets:
    print(f"❌ Asset '{asset_name}' cannot be found.")

if not missing_packages and not missing_assets:
    print("✅ All set")

## Step 1: Play script writing experiment

To demonstrate the end-to-end workflow, this notebook walks through training an adapter to write play scenes in a specific format.

Suppose you are an app developer building an app that generates short play scripts for kids.

The dataset consists of formatted theatrical scripts, which the adapter learns from during training.

Before training, let’s first inspect the output produced by the model.

In [None]:
from examples.generate import generate_content, GenerationConfiguration
from examples.messages import Message

output = generate_content(
    [[
        Message.from_system("A conversation between a user and a helpful assistant. Taking the role as a play writer assistant for a kids' play."),
        Message.from_user("Write a script about penguins.")
    ]],
    GenerationConfiguration(temperature=0.0, max_new_tokens=128)
)

output[0].response

In [None]:
from IPython.display import Markdown, display
display(Markdown(output[0].response))

While the model understands the general concept of writing a play script, it may not consistently follow the specific conventions you want without more explicit guidance.

Perhaps you would like the model to use tags such as `<center></center>`, `<stage></stage>`, and `<dialog></dialog>` so that act and scene headings are centered, character names are centered and capitalized, stage directions are marked as such, and each line of dialogue sits inside dialog tags. You also want the outputs to be more concise, since they will appear in your kids' playwriting app.

For instance, you may want the model to produce output in a format like this:
```text
<center>Act One</center>\n\n<center>Scene 1</center>\n\n<stage>A moonlit forest clearing. Night time. An ancient oak tree dominates the center.</stage>\n\n<center>PROFESSOR FINCH</center>\n\n<dialog>I've been searching these woods for hours. The rare night owl must be somewhere!</dialog>...
```
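To make "follows the format" concrete (handy later when you evaluate the adapter), here is a minimal heuristic checker. The function names and the exact rules are hypothetical, derived only from the example above; they are not part of the bundle's scripts.

```python
import re

# Hypothetical helper: the tag names come from the example format above;
# the rules checked here are illustrative assumptions, not an official spec.
TAG_PATTERN = re.compile(r"<(/?)(center|stage|dialog)>")

def tags_balanced(text: str) -> bool:
    """Return True if every known tag is closed in order and none are nested."""
    stack = []
    for match in TAG_PATTERN.finditer(text):
        closing, name = match.group(1) == "/", match.group(2)
        if not closing:
            if stack:  # nesting is not expected in this format
                return False
            stack.append(name)
        elif not stack or stack.pop() != name:
            return False
    return not stack

def follows_script_format(text: str) -> bool:
    """Heuristic check: balanced tags, at least one centered line and one
    dialog line, and centered character names written in upper case."""
    if not tags_balanced(text):
        return False
    centered = re.findall(r"<center>(.*?)</center>", text, flags=re.DOTALL)
    if not centered or "<dialog>" not in text:
        return False
    for item in centered:
        if re.match(r"(Act|Scene)\b", item):
            continue  # act and scene headings keep their original casing
        if item != item.upper():
            return False  # character names must be capitalized
    return True

sample = (
    "<center>Act One</center>\n\n<center>Scene 1</center>\n\n"
    "<stage>A moonlit forest clearing.</stage>\n\n"
    "<center>PROFESSOR FINCH</center>\n\n"
    "<dialog>I've been searching these woods for hours.</dialog>"
)
print(follows_script_format(sample))
```

A regex check like this only verifies structure, not writing quality, so it complements rather than replaces human review.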

## Step 2: Train

If you have limited data or need a quick baseline, prompt engineering or few-shot prompts are low-cost and straightforward. However, if you have at least dozens of labeled examples and need the model to excel in a specific domain or format, training may be more suitable. Supervised fine-tuning is ideal when:

* You need the model to become a subject-matter expert or adhere to a specific style or format consistently.
* Prompt engineering isn’t achieving the required accuracy or consistency for your task.
* You want lower latency at inference. A fine-tuned model can use minimal prompting, whereas prompt-engineered solutions might require lengthy instructions or examples every time.

On the other hand, if your task involves rapidly changing data or open-ended knowledge, fine-tuning alone won’t be effective. In such cases, techniques like Retrieval-Augmented Generation (RAG) or tool calling may be more suitable.

### Step 2a. Prepare your data

Before you can begin training, you need a dataset to guide the model. For better results, the dataset should contain high-quality, diverse, and representative examples of real inputs and outputs.

The dataset should be stored in the JSON Lines (`.jsonl`) format. Each line in the file is a single sample: a list of messages containing a prompt and its response. You can optionally include a system prompt to give the model task-specific instructions or shared context, for example:

```json
[{"role": "system", "content": "<INSTRUCTION>"}, {"role": "user", "content": "<USER PROMPT>"}, {"role": "assistant", "content": "<RESPONSE>"}]
```

If the system message is omitted, a default system message is added:
```json
[{"role": "user", "content": "<USER PROMPT>"}, {"role": "assistant", "content": "<RESPONSE>"}]
```

**Note:** Each sample can also be encapsulated in a dictionary, for example:
```json
{"messages": [{"role": "user", "content": "<USER PROMPT>"}, {"role": "assistant", "content": "<RESPONSE>"}]}
```
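Before training, it can help to validate every line against the shapes described above. The sketch below is a hypothetical pre-flight check, not part of the bundle: it accepts both the bare-list and the `{"messages": [...]}` forms, allows an optional leading system message, and assumes user/assistant turns that end with the assistant. It also rejects typographic quotes, which are not valid JSON.

```python
import json

VALID_ROLES = {"system", "user", "assistant"}

def normalize_sample(line: str) -> list:
    """Parse one JSONL line in either accepted shape and return its messages.

    Hypothetical helper; raises ValueError (json.JSONDecodeError is a
    subclass) when the line is not valid JSON or the structure looks wrong.
    """
    record = json.loads(line)
    # Accept both the bare-list shape and the {"messages": [...]} wrapper.
    messages = record.get("messages") if isinstance(record, dict) else record
    if not isinstance(messages, list) or not messages:
        raise ValueError("sample must contain a non-empty list of messages")
    for message in messages:
        if (not isinstance(message, dict)
                or message.get("role") not in VALID_ROLES
                or not isinstance(message.get("content"), str)):
            raise ValueError(f"malformed message: {message!r}")
    roles = [message["role"] for message in messages]
    if roles[0] == "system":
        roles = roles[1:]  # the system prompt is optional
    # Assumption: alternating user/assistant turns that end on the assistant.
    expected = ["user", "assistant"] * (len(roles) // 2)
    if not roles or roles != expected:
        raise ValueError(f"unexpected role order: {roles}")
    return messages
```

You could run `normalize_sample` over every line of your training and validation files and fix any line that raises.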

It’s also recommended to split the data into training and validation sets. The training set is used to update the model parameters, while the validation set is used to monitor performance and identify any overfitting.
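The toy dataset used in this notebook already comes split into `playwriting_train.jsonl` and `playwriting_valid.jsonl`. If your data lives in a single file, a reproducible random split might look like the sketch below; the function name, the 10% default, and the seed are illustrative choices, not part of the bundle.

```python
import random
from pathlib import Path

def split_jsonl(source, train_path, valid_path, valid_fraction=0.1, seed=0):
    """Shuffle the non-empty lines of a JSONL file and write train/validation files.

    Hypothetical helper. Returns (num_train, num_valid).
    """
    # Iterating over the file splits only on newlines, so JSON strings that
    # contain other Unicode line separators stay intact.
    with open(source, encoding="utf-8") as f:
        lines = [line.rstrip("\n") for line in f if line.strip()]
    random.Random(seed).shuffle(lines)  # fixed seed makes the split reproducible
    # Keep at least one validation sample whenever there is more than one sample.
    n_valid = max(1, int(len(lines) * valid_fraction)) if len(lines) > 1 else 0
    valid, train = lines[:n_valid], lines[n_valid:]
    for path, rows in ((train_path, train), (valid_path, valid)):
        Path(path).write_text("".join(row + "\n" for row in rows), encoding="utf-8")
    return len(train), len(valid)
```

Shuffling before splitting matters when the source file is ordered (for example, grouped by theme); otherwise the validation set may not be representative.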

To train the model to produce properly formatted theatrical scripts, we provide a set of examples, each consisting of a custom system prompt, a user message describing a theme for the play, and the expected response, as illustrated below.

```jsonl
[{"role": "system", "content": "You are a helpful playwriting assistant."}, {"role": "user", "content": "generate a script about The Wise Owl"}, {"role": "assistant", "content": "<center>Act One</center>\n\n<center>Scene 1</center>\n\n<stage>A moonlit forest clearing. Night time. An ancient oak tree dominates the center.</stage>\n\n<center>PROFESSOR FINCH</center>\n\n<dialog>I've been searching these woods for hours. The rare night owl must be somewhere!</dialog>..."}]
```
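Writing samples by hand makes it easy to break the JSONL rules, for example by leaving a real line break inside a script. A minimal sketch of generating one line with Python's standard `json` module follows; `make_sample` is a hypothetical helper, and its system prompt and user-prompt template are copied from the example above.

```python
import json

def make_sample(theme: str, script: str,
                system_prompt: str = "You are a helpful playwriting assistant.") -> str:
    """Build one JSONL line in the system/user/assistant layout shown above."""
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": f"generate a script about {theme}"},
        {"role": "assistant", "content": script},
    ]
    # json.dumps escapes newline characters inside strings as \n and, with no
    # indent, emits no line breaks, so each sample stays on a single line.
    return json.dumps(messages)

script = (
    "<center>Act One</center>\n\n<center>Scene 1</center>\n\n"
    "<stage>A moonlit forest clearing. Night time.</stage>"
)
line = make_sample("The Wise Owl", script)
print(line)
```

Appending `line + "\n"` to a file for each example yields a valid `.jsonl` dataset.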

To learn more about how to prepare your adapter training data for guided generation with the Foundation Models framework, refer to the [schema specification document](../docs/schema.md).

### Step 2b. Train the adapter

Training (or fine-tuning) large models can be resource-intensive if all parameters need to be updated. To address this, the on-device foundation model employs a parameter-efficient fine-tuning (PEFT) approach known as LoRA (Low-Rank Adaptation). In LoRA, the original model weights are frozen, and small trainable weight matrices called “adapters” are trained for a set of layers. During training, only these adapter weights are updated, significantly reducing the number of parameters to train, enabling faster and less memory-intensive fine-tuning, while also allowing the base model to be shared across multiple tasks.
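The core LoRA computation can be sketched in a few lines of plain Python. The update rule `h = W x + (alpha / rank) * B (A x)` and the zero initialization of `B` follow the original LoRA paper (Hu et al., 2021); the matrix sizes and rank below are illustrative only and are not the on-device model's actual configuration.

```python
import random

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def lora_forward(x, W, A, B, alpha, rank):
    """h = W x + (alpha / rank) * B (A x).

    W (d_out x d_in) is the frozen base weight; only A (rank x d_in) and
    B (d_out x rank) would receive gradient updates during training.
    """
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / rank
    return [b + scale * u for b, u in zip(base, update)]

def trainable_params(d_out, d_in, rank):
    """Parameter counts for updating W directly vs. training A and B instead."""
    return d_out * d_in, rank * (d_in + d_out)

random.seed(0)
d_in, d_out, rank = 4, 3, 2
W = [[random.uniform(-1, 1) for _ in range(d_in)] for _ in range(d_out)]
A = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]
B = [[0.0] * rank for _ in range(d_out)]  # zero init: the adapter starts as a no-op
x = [1.0, 2.0, 3.0, 4.0]
print(lora_forward(x, W, A, B, alpha=16, rank=rank) == matvec(W, x))  # True

full, lora = trainable_params(2048, 2048, rank=16)
print(full, lora, f"{lora / full:.2%}")
```

Because `B` starts at zero, training begins exactly from the base model's behavior, and for a square 2048 x 2048 layer at rank 16 the adapter trains `16 * (2048 + 2048)` parameters instead of `2048 * 2048`.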

Check out [Apple Intelligence Foundation Language Models](https://machinelearning.apple.com/research/apple-intelligence-foundation-language-models) to learn more about the on-device model and adapter architecture.

In the following code snippet, we call the [train_adapter](train_adapter.py#338) function, passing in the path to the training set (`train_data`), optionally the evaluation set (`eval_data`), a training configuration that sets the number of full passes over the training set (`epochs`) and the learning rate (`learning_rate`), and the directory where checkpoints are saved (`checkpoint_dir`).

In [None]:
from examples.train_adapter import AdapterTrainingConfiguration, train_adapter

train_adapter(
    train_data="toy_dataset/playwriting_train.jsonl",
    eval_data="toy_dataset/playwriting_valid.jsonl",
    config=AdapterTrainingConfiguration(epochs=2, learning_rate=1e-4),
    checkpoint_dir="./"
)
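To get a feel for what `epochs=2` means in practice, you can estimate how many parameter updates a run performs. This is a rough sketch under stated assumptions: one optimizer step per batch, no gradient accumulation, and a partial final batch that is kept. The `batch_size` value is a stand-in; check `AdapterTrainingConfiguration` for the actual setting and its default.

```python
import math
from pathlib import Path

def count_samples(jsonl_path) -> int:
    """Count non-empty lines (one sample per line) in a JSONL file."""
    with open(jsonl_path, encoding="utf-8") as f:
        return sum(1 for line in f if line.strip())

def optimizer_steps(num_samples: int, batch_size: int, epochs: int) -> int:
    """Total updates when each epoch is one full pass over the training set."""
    return epochs * math.ceil(num_samples / batch_size)

train_file = Path("toy_dataset/playwriting_train.jsonl")
if train_file.exists():
    n = count_samples(train_file)
    # batch_size=4 is an assumed value for illustration only.
    print(f"{n} samples -> {optimizer_steps(n, batch_size=4, epochs=2)} steps")
```

With a small dataset, few steps are taken per epoch, which is one reason the learning rate and epoch count are worth tuning against the validation loss.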

### Step 2c. Train the draft model to improve inference speed (Optional)

Speculative decoding is a technique for making large language model (LLM) inference more efficient. It works by making educated guesses about upcoming tokens, resulting in faster responses without sacrificing output quality.

At its core, speculative decoding uses a smaller, faster "draft" model to generate preliminary token predictions. The draft model quickly proposes a sequence of candidate tokens, which the primary, more accurate LLM then verifies in a single pass. Because the larger model only has to accept or correct the speculated tokens rather than generate each one sequentially, overall decoding time drops, and the verification step ensures the final output matches (or, when sampling, follows the same distribution as) what the larger model alone would produce. For more details on the optimization, refer to [Leviathan et al., 2022](https://arxiv.org/abs/2211.17192) and [Chen et al., 2023](https://arxiv.org/abs/2302.01318).
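The verify-and-correct loop can be illustrated with a toy sketch of greedy (temperature 0) speculative decoding, the setting used by the generation calls in this notebook. The "models" are stand-in functions that map a token list to the next token; this is not how the bundle's `generate_content` is implemented, and sampling-based variants use the rejection scheme from the papers above instead of exact matching.

```python
from typing import Callable, List, Tuple

NextToken = Callable[[List[int]], int]

def greedy_decode(model: NextToken, prompt: List[int], n: int) -> List[int]:
    """Reference: generate n tokens one at a time with the target model."""
    out = list(prompt)
    for _ in range(n):
        out.append(model(out))
    return out[len(prompt):]

def speculative_decode(target: NextToken, draft: NextToken, prompt: List[int],
                       n: int, k: int = 4) -> Tuple[List[int], int]:
    """Greedy speculative decoding. Returns (tokens, number of target passes)."""
    out, target_passes = list(prompt), 0
    while len(out) - len(prompt) < n:
        # 1. The cheap draft model proposes k tokens autoregressively.
        ctx, proposal = list(out), []
        for _ in range(k):
            proposal.append(draft(ctx))
            ctx.append(proposal[-1])
        # 2. The target checks all k positions; in a real system this is one
        #    batched forward pass, simulated here with a loop.
        target_passes += 1
        for guess in proposal:
            correct = target(out)
            out.append(correct)       # always keep the target's token
            if correct != guess:      # first mismatch: discard remaining guesses
                break
        else:
            out.append(target(out))   # every guess accepted: one bonus token
    return out[len(prompt):len(prompt) + n], target_passes

# Toy stand-in models over a 10-token vocabulary.
def target(ctx: List[int]) -> int:
    return (3 * sum(ctx) + len(ctx)) % 10

def draft(ctx: List[int]) -> int:
    # Reuses target only to control agreement: wrong on every fourth position.
    guess = target(ctx)
    return guess if len(ctx) % 4 else (guess + 1) % 10

tokens, passes = speculative_decode(target, draft, [1, 2], n=20)
print(tokens == greedy_decode(target, [1, 2], 20), passes)
```

Every emitted token comes from the target, so the output is identical to plain greedy decoding; the draft only determines how many tokens each target pass yields.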

The more closely the draft model matches the larger model, the more of its guesses are accepted. To improve this, the draft model can be trained to mimic the output of the larger model, which is what we'll do next using the adapter checkpoint trained earlier.

In [None]:
from examples.train_draft_model import DraftModelTrainingConfiguration, train_draft_model

train_draft_model(
    checkpoint="adapter-final.pt",
    train_data="toy_dataset/playwriting_train.jsonl",
    eval_data="toy_dataset/playwriting_valid.jsonl",
    config=DraftModelTrainingConfiguration(epochs=2, learning_rate=1e-4),
    checkpoint_dir="./",
    checkpoint_frequency=1,
)
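Why does training the draft model to mimic the adapted model pay off? In speculative sampling (Leviathan et al., 2022), the probability that a draft token is accepted at a given position is the sum over the vocabulary of `min(p(x), q(x))`, where `p` is the target distribution and `q` the draft distribution; this equals one minus their total variation distance. The sketch below uses made-up three-token distributions purely for illustration; it does not describe the objective that `train_draft_model` optimizes.

```python
from typing import Sequence

def acceptance_probability(p: Sequence[float], q: Sequence[float]) -> float:
    """Chance that a token sampled from draft q is accepted against target p."""
    if len(p) != len(q):
        raise ValueError("distributions must share a vocabulary")
    return sum(min(pi, qi) for pi, qi in zip(p, q))

def total_variation(p: Sequence[float], q: Sequence[float]) -> float:
    """Total variation distance between two distributions."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

# Illustrative numbers only.
target_dist = [0.7, 0.2, 0.1]
untrained_draft = [1 / 3, 1 / 3, 1 / 3]
trained_draft = [0.6, 0.3, 0.1]

for name, q in [("untrained", untrained_draft), ("trained", trained_draft)]:
    print(name, round(acceptance_probability(target_dist, q), 3))
```

A higher acceptance probability means more tokens are produced per pass of the larger model, which is where the inference speedup comes from.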

## Step 3: Evaluate

Training is only valuable if it improves results, so it's important to evaluate the model using both quantitative and qualitative metrics.

Listed below are some commonly used metrics. Which metrics you use depends heavily on your task, and a full discussion of evaluation is out of scope for this notebook.

Quantitative (metrics)

* Accuracy: Ideal for classification-style tasks.
* BLEU / ROUGE: Common for text generation tasks that have reference outputs, such as translation and summarization.
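As a concrete example of an overlap metric, here is a simplified ROUGE-1 F1 score (clipped unigram overlap between a generated output and a reference). It uses lowercasing and whitespace tokenization only; established packages such as Google's `rouge-score` apply their own tokenization and optional stemming, so their scores will differ.

```python
from collections import Counter

def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: clipped unigram overlap, whitespace tokens."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # & keeps the minimum count per token
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

print(rouge1_f1("the penguins waddle to the stage", "the penguins waddle on stage"))
```

Note that word-overlap metrics say little about whether the tags and layout are correct, so for a formatting task like this one they work best alongside a structural check or human review.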

Qualitative (human or AI in-the-loop)

* Review 10-20 random outputs to ensure accuracy, conciseness, and proper formatting.
* Seek feedback from domain experts.
* Evaluate using a larger model.

For this example, we'll simply rerun inference on the initial prompt to check that the trained adapter has learned the desired playwriting format.

In [None]:
from examples.generate import generate_content, GenerationConfiguration
from examples.messages import Message

output = generate_content(
    [[
        Message.from_system("A conversation between a user and a helpful assistant. Taking the role as a play writer assistant for a kids' play."),
        Message.from_user("Write a script about penguins.")
    ]],
    GenerationConfiguration(temperature=0.0, max_new_tokens=128),
    checkpoint="adapter-final.pt",
    draft_checkpoint="draft-model-final.pt"
)

output[0].response

In [None]:
display(Markdown(output[0].response))

Nice! The model appears to have learned the format from the custom data.

## Step 4: Export

Once you have a trained artifact, you can export it to the `.fmadapter` format for on-device deployment via the Foundation Models framework. To export, pass in the checkpoint of the trained adapter, optionally the draft model checkpoint, and the output directory where the `.fmadapter` will be saved.

In [None]:
from export.export_fmadapter import Metadata, export_fmadapter

metadata = Metadata(
    author="3P developer",
    description="An adapter that writes play scripts.",
)

export_fmadapter(
    output_dir="./",
    adapter_name="myPlaywritingAdapter",
    metadata=metadata,
    checkpoint="adapter-final.pt",
    draft_checkpoint="draft-model-final.pt",
)

## Step 5: Produce asset pack (optional)

If you prefer not to bundle the `.fmadapter` in your app, you can use Managed Background Assets to distribute your adapter assets.

Managed Background Assets offers a set of new OS features that download, manage, and update asset packs on users’ devices. It works with both Apple hosting and third-party hosting. To find out details about supported platforms, download policies, installation event types, and more, check out the documentation page at: https://developer.apple.com/documentation/backgroundassets 

Use the code snippet below to produce an asset pack from your Python environment. To run it, you need Xcode 26 installed on your Mac; its toolchain includes the required asset packaging tool.

In [None]:
from export.produce_asset_pack import AssetPackBuilder, produce_asset_pack

produce_asset_pack(
    fmadapter_path="myPlaywritingAdapter.fmadapter",
    output_path="myPlaywritingAdapter.aar",
    platforms=[AssetPackBuilder.Platforms.iOS, AssetPackBuilder.Platforms.macOS],
    download_policy=AssetPackBuilder.DownloadPolicy.PREFETCH,
    installation_event_type=AssetPackBuilder.InstallationEventType.FIRST_INSTALLTION
)