<a href="https://colab.research.google.com/github/withpi/cookbook-withpi/blob/main/colabs/SFT.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<a href="https://withpi.ai"><img src="https://play.withpi.ai/logo/logoFullBlack.svg" width="240"></a>

<a href="https://code.withpi.ai"><font size="4">Documentation</font></a>

<a href="https://play.withpi.ai"><font size="4">Technique Catalog</font></a>

# Supervised Fine-tuning (SFT) with Standard Gradient Descent

This notebook is the companion to the SFT playground.

Description: Train models to more deeply learn patterns from your data.

## Install and initialize SDK

Connect to a regular CPU Python 3 runtime.  You won't need GPUs for this notebook.

You'll need a WITHPI_API_KEY from https://play.withpi.ai.  Add it to your notebook secrets (the key icon in the left sidebar).

Run the cell below to install packages and load the SDK.

In [1]:
%%capture

%pip install withpi withpi-utils datasets tqdm litellm

import os
from google.colab import userdata
from withpi import PiClient

# Load the notebook secret into the environment so the Pi Client can access it.
os.environ["WITHPI_API_KEY"] = userdata.get('WITHPI_API_KEY')

client = PiClient()
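
Outside Colab, `google.colab.userdata` is not importable, so the cell above fails. A minimal sketch of a fallback, assuming the key has been exported as an environment variable; `resolve_api_key` is a hypothetical helper for illustration and is not part of the Pi SDK:

```python
import os


def resolve_api_key(name="WITHPI_API_KEY", environ=None):
    """Return the key from Colab secrets if available, else from the environment.

    Hypothetical helper for illustration; not part of the withpi SDK.
    """
    environ = os.environ if environ is None else environ
    try:
        # Only importable inside a Colab runtime.
        from google.colab import userdata

        key = userdata.get(name)
    except Exception:
        # Not on Colab, or the secret is missing / not shared with this notebook.
        key = None
    key = key or environ.get(name)
    if not key:
        raise RuntimeError(f"{name} is not set; add it to Colab secrets or export it.")
    return key
```

With this in place, `os.environ["WITHPI_API_KEY"] = resolve_api_key()` could replace the direct `userdata.get` call.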

## Load a scoring spec and dataset

We provide a pre-built TL;DR scoring spec and a dataset you can play with. The cells below load the spec and the first 200 rows of the `withpi/tldr` training split.


In [2]:
# @title Load Scoring Spec
from withpi_utils.colab import load_scoring_spec_from_web, display_scoring_spec

tldr_scoring_spec = load_scoring_spec_from_web(
    "https://raw.githubusercontent.com/withpi/cookbook-withpi/refs/heads/main/scoring_specs/tldr.json"
)

display_scoring_spec(tldr_scoring_spec)

In [3]:
# @title Load dataset
from datasets import load_dataset

tldr_dataset = load_dataset("withpi/tldr", split="train").select(range(200))

print(tldr_dataset)

Dataset({
    features: ['prompt', 'completion'],
    num_rows: 200
})


## Kick off the job

The SFT job performs a 90/10 train-test split internally, which is why the loader above does not split the input data.
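
For intuition about what happens to the 200 rows, a 90/10 split can be sketched locally. This is an illustration only: how the service actually splits (shuffling, seed, rounding) is not described here, and `split_90_10` is a hypothetical helper.

```python
import random


def split_90_10(examples, seed=0):
    """Shuffle and split examples into 90% train / 10% held out (illustrative only)."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 9 // 10  # integer arithmetic avoids float rounding
    return shuffled[:cut], shuffled[cut:]


train, held_out = split_90_10(range(200))
print(len(train), len(held_out))  # 180 20
```

Under this assumption, 180 rows would be used for training and 20 for evaluation.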

This process takes a while: a cloud GPU is acquired, fine-tuning is performed, and a result is returned. Please be patient.

⏰ **Note:** This cell can take 30+ minutes to run.

In [4]:
status = client.training.sft.start_job(
    scoring_spec=tldr_scoring_spec,
    examples=[
        {"llm_input": row["prompt"], "llm_output": row["completion"]}
        for row in tldr_dataset
    ],
    base_sft_model="LLAMA_3.2_3B",
    lora_config={"lora_rank": "R_16"},
    system_prompt=tldr_scoring_spec.description,
    num_train_epochs=5,
)
print(status)

SftStatus(detailed_status=['LAUNCHING'], job_id='sft_jobs:09ac227c912792130a876568ea872593308c0d4b3d7c896ec7991f041cbeedd8:c8f73b43-c883-410d-a177-4b4ae6305516', state='QUEUED', trained_models=[])
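
The `examples` argument in the cell above is a list of dicts with `llm_input` and `llm_output` keys. A minimal sketch of building that list while skipping rows with a blank prompt or completion; `to_sft_examples` is a hypothetical helper, and whether the service itself rejects blank rows is not stated here.

```python
def to_sft_examples(rows):
    """Map {'prompt', 'completion'} rows to {'llm_input', 'llm_output'} dicts.

    Rows with a missing or blank field are skipped. Hypothetical helper,
    not part of the withpi SDK.
    """
    examples = []
    for row in rows:
        prompt = row.get("prompt") or ""
        completion = row.get("completion") or ""
        if prompt.strip() and completion.strip():
            examples.append({"llm_input": prompt, "llm_output": completion})
    return examples


rows = [
    {"prompt": "POST: ... TL;DR:", "completion": "A short summary."},
    {"prompt": "POST: ... TL;DR:", "completion": "   "},  # blank, skipped
]
print(len(to_sft_examples(rows)))  # 1
```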


## Monitor for completion

Now run the following cell to see how training is progressing.

In [21]:
from withpi_utils.colab import stream_training_response

response = stream_training_response(
    status.job_id,
    client.training.sft,
    additional_columns={"Pi_Score": "contract_score"},
)

print("Result training state: {}".format(response.state))
if response.state == "ERROR":
    print("The job failed due to:\n{}".format("\n".join(response.detailed_status[-5:])))
elif response.state == "DONE":
    print(
        "SFT model = {}".format(response.trained_models[0].model_dump_json(indent=2))
    )

Training Status for sft_jobs:09ac227c912792130a876568ea872593308c0d4b3d7c896ec7991f041cbeedd8:c8f73b43-c883-410d-a177-4b4ae6305516


| Step | Epoch | Learning_Rate | Training_Loss | Eval_Loss | Pi_Score |
|---|---|---|---|---|---|
| 0 | 0.0 | X | X | 2.635514 | 0.645159 |
| 5 | 0.444444 | 0.000192 | 2.8762 | X | X |
| 10 | 0.888889 | 0.000173 | 2.2354 | 1.851034 | 0.551857 |
| 15 | 1.266667 | 0.000154 | 2.0202 | X | X |
| 20 | 1.711111 | 0.000135 | 1.962 | 1.765495 | 0.633598 |
| 25 | 2.088889 | 0.000115 | 1.8028 | X | X |
| 30 | 2.533333 | 0.000096 | 1.7243 | 1.757175 | 0.605529 |
| 35 | 2.977778 | 0.000077 | 1.7791 | X | X |
| 40 | 3.355556 | 0.000058 | 1.6148 | 1.781069 | 0.698514 |
| 45 | 3.8 | 0.000038 | 1.6068 | X | X |


Result training state: DONE
SFT model = {
  "contract_score": 0.6985144353297448,
  "epoch": 3.3555555555555556,
  "eval_loss": 1.7810693979263306,
  "serving_id": 0,
  "serving_state": "SERVING",
  "step": 40
}
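
In this run the returned model is the step-40 checkpoint, which has the highest Pi_Score among the evaluated steps even though step 30 has a slightly lower Eval_Loss. A small sketch that reproduces the comparison from the values in the table above; that the service selects checkpoints by Pi_Score is an inference from this single run, not a documented guarantee.

```python
# (step, eval_loss, pi_score) for the evaluated checkpoints in the table above.
checkpoints = [
    (0, 2.635514, 0.645159),
    (10, 1.851034, 0.551857),
    (20, 1.765495, 0.633598),
    (30, 1.757175, 0.605529),
    (40, 1.781069, 0.698514),
]

best_by_score = max(checkpoints, key=lambda c: c[2])
best_by_loss = min(checkpoints, key=lambda c: c[1])

print("Highest Pi_Score:", best_by_score)  # the step-40 row
print("Lowest Eval_Loss:", best_by_loss)   # the step-30 row
```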


## Query the model!

Trained models are hosted on [Fireworks](https://fireworks.ai) and can be queried with your WITHPI_API_KEY through your favorite frontend.  The cell below uses LiteLLM:

In [None]:
# @title Generate a response using the SFT-trained model
import litellm
import time

SFT_JOB_ID = "sft_jobs:09ac227c912792130a876568ea872593308c0d4b3d7c896ec7991f041cbeedd8:c8f73b43-c883-410d-a177-4b4ae6305516"

# Load the SFT Model
client.training.sft.load(SFT_JOB_ID)

# Wait for the model to be loaded
while not (
    client.training.sft.retrieve(SFT_JOB_ID).trained_models[0].serving_state
    == "SERVING"
):
    time.sleep(3)

prompt = """SUBREDDIT: r/relationships TITLE: I (f/22) have to figure out if I want to still know these girls or not and would hate to sound insulting POST: Not sure if this belongs here but it's worth a try. Backstory: When I (f/22) went through my first real breakup 2 years ago because he needed space after a year of dating roand it effected me more than I thought. It was a horrible time in my life due to living with my mother and finally having the chance to cut her out of my life. I can admit because of it was an emotional wreck and this guy was stable and didn't know how to deal with me. We ended by him avoiding for a month or so after going to a festival with my friends. When I think back I wish he just ended. So after he ended it added my depression I suffered but my friends helped me through it and I got rid of everything from him along with cutting contact. Now: Its been almost 3 years now and I've gotten better after counselling and mild anti depressants. My mother has been out of my life since then so there's been alot of progress. Being stronger after learning some lessons there been more insight about that time of my life but when I see him or a picture everything comes back. The emotions and memories bring me back down. His friends (both girls) are on my facebook because we get along well which is hard to find and I know they'll always have his back. But seeing him in a picture or talking to him at a convention having a conversation is tough. Crying confront of my current boyfriend is something I want to avoid. So I've been thinking that I have to cut contact with these girls because it's time to move on because it's healthier. It's best to avoid him as well. But will they be insulted? Will they accept it? Is there going to be awkwardness? I'm not sure if it's the right to do and could use some outside opinions. TL;DR:"""

response = litellm.completion(
    messages=[
        {"role": "system", "content": tldr_scoring_spec.description},
        {"role": "user", "content": prompt},
    ],
    model="fireworks_ai/0",
    api_base=f"https://api.withpi.ai/v1/training/sft/{SFT_JOB_ID}",
    api_key=os.environ["WITHPI_API_KEY"],
    max_tokens=2048,
)

print("Raw Completion response:\n")
print(response.choices[0].message.content)

Raw Completion response:

I want to cut contact with these girls because seeing him in pictures brings back past emotions. Is it unkind to do so? Would they accept it or be offended?
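
The wait loop in the cell above polls indefinitely. A hedged variant with a timeout, written against a generic callable so that it assumes nothing about the SDK beyond what the cell already uses; `wait_for_state` is a hypothetical helper, and the default timeout is an arbitrary choice.

```python
import time


def wait_for_state(get_state, target="SERVING", timeout_s=600.0, poll_s=3.0,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll get_state() until it returns target; raise TimeoutError otherwise."""
    deadline = clock() + timeout_s
    while True:
        state = get_state()
        if state == target:
            return state
        if clock() >= deadline:
            raise TimeoutError(
                f"state is {state!r} after {timeout_s}s, expected {target!r}"
            )
        sleep(poll_s)


# Usage with the objects from the cell above (not executed here):
# wait_for_state(
#     lambda: client.training.sft.retrieve(SFT_JOB_ID).trained_models[0].serving_state
# )
```

Passing `sleep` and `clock` as parameters keeps the helper easy to test without real delays.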


In [None]:
# @title Manually inspect the score
from withpi_utils.colab import pretty_print_responses

response_text = response.choices[0].message.content

score = client.scoring_system.score(
    llm_input=prompt, llm_output=response_text, scoring_spec=tldr_scoring_spec
)

pretty_print_responses(
    response1=response_text,
    header="##### " + prompt,
    left_label="SFT Trained Model",
    scores_left=score,
)

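| Dimension | Sub-dimension | Score |
|---|---|---|
| Length | | 1.0 |
| | Length Compliance | 1.0 |
| Structure | | 0.752 |
| | Length Compliance | 0.965 |
| | Conciseness | 0.711 |
| | No Redundancy | 0.785 |
| | No Repetition | 0.668 |
| | No Incomplete Sentences | 0.629 |
| Content Accuracy | | 0.754 |
| | Important Points | 0.508 |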
