In [14]:
! pip3 install transformers

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)




In [15]:
import torch
from transformers import pipeline


In [16]:
device = "mps" if torch.backends.mps.is_available() else ("cuda:0" if torch.cuda.is_available() else "cpu")
dtype = torch.float16 if device == "mps" else torch.float32

In [17]:
ask_llm = pipeline(
  task="text-generation",
  model="Qwen/Qwen2.5-3B-Instruct",
  device=device,
  torch_dtype=dtype
)

print(ask_llm("Who is Scott Lai?")[0]["generated_text"])

Loading checkpoint shards: 100%|██████████| 2/2 [00:05<00:00,  2.80s/it]
Device set to use mps


Who is Scott Lai? He is a Chinese-American actor, singer, and dancer who was born on January 29, 1995, in Los Angeles, California. He began his acting career at the age of six, starring in Disney Channel's "The Suite Life of Zack & Cody" and later in the animated series "Kim Possible." In addition to acting, Scott has also been involved in singing and dancing, performing in various music videos and commercials.

Scott Lai has been featured in several popular TV shows and movies, including "The Flash," "Riverdale," "Gossip Girl," and "The Secret Life of the American Teenager." He has also released several singles and EPs as a solo artist under various record labels.

In recent years, Scott has gained popularity for his performances in films such as "The Last Witch Hunter" (2015), "Crimson Peak" (2015), and "Kingsman: The Secret Service" (2014). He has also appeared in the TV series "Black Lightning" and "Younger."

Scott Lai continues to be active in the entertainment industry, both as 

As the response above shows, the model has no idea who I am.

Let's cook it!

First, let's teach the model who I am. Here you can use your own personal data to generate the exact format you will need for fine-tuning. You can use ChatGPT for this: just ask it to convert your resume into a trainable JSON format with "prompt" and "completion" fields.
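As a rough illustration of that format, the sketch below writes two prompt/completion records (copied from this notebook's own dataset output) to a JSON file and checks that each record has exactly those two fields. The file name `example_train.json` and the helper `validate_records` are hypothetical, chosen only for this sketch.

```python
import json

# Two records in the prompt/completion layout used for fine-tuning.
records = [
    {"prompt": "What is Scott Lai’s profession?",
     "completion": "AI Engineer and Data Scientist."},
    {"prompt": "How many years of experience does Scott Lai have in generative AI and LLM solutions?",
     "completion": "Over 5 years."},
]


def validate_records(rows):
    """Return True if every row has exactly 'prompt' and 'completion' as non-empty strings."""
    for row in rows:
        if set(row) != {"prompt", "completion"}:
            return False
        if not all(isinstance(v, str) and v.strip() for v in row.values()):
            return False
    return True


# Hypothetical file name; the notebook itself loads scott_lai_resume_train.json.
with open("example_train.json", "w", encoding="utf-8") as f:
    json.dump(records, f, ensure_ascii=False, indent=2)

# Read the file back to confirm it round-trips and passes validation.
with open("example_train.json", encoding="utf-8") as f:
    loaded = json.load(f)

print(validate_records(loaded))
```

Running a check like this before training is a cheap way to catch records where ChatGPT dropped a field or returned an empty answer.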

In [18]:
# load data 
from datasets import load_dataset

raw_data = load_dataset('json', data_files = "scott_lai_resume_train.json")
raw_data

DatasetDict({
    train: Dataset({
        features: ['prompt', 'completion'],
        num_rows: 122
    })
})

In [19]:
raw_data["train"][0]

{'prompt': 'What is Scott Lai’s profession?',
 'completion': 'AI Engineer and Data Scientist.'}

As you can see, each record comes back as plain text, but for fine-tuning the data needs to be in small, precise chunks. To get there, we apply tokenization, which takes the text and splits it into smaller chunks. Each chunk is called a token, and it is the smallest unit of meaning that LLMs work with.
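To make the idea of tokens concrete, here is a toy sketch that splits on whitespace, maps each word to an integer id, then truncates and pads to a fixed length. It is not how the Qwen tokenizer works (real LLM tokenizers split text into subword pieces); the function `toy_tokenize`, the tiny vocabulary, and the pad/unknown ids are all made up for illustration. It only shows the shape of the output that `max_length`, `truncation`, and `padding` produce.

```python
def toy_tokenize(text, vocab, max_length, pad_id=0, unk_id=1):
    """Toy whitespace tokenizer with truncation and padding (illustrative only)."""
    # Map each lowercase word to its id, or to unk_id if it is not in the vocabulary.
    ids = [vocab.get(word, unk_id) for word in text.lower().split()]
    ids = ids[:max_length]            # truncation: drop anything past max_length
    mask = [1] * len(ids)             # 1 marks a real token
    n_pad = max_length - len(ids)
    ids = ids + [pad_id] * n_pad      # padding: fill up to max_length
    mask = mask + [0] * n_pad         # 0 marks a padding position
    return {"input_ids": ids, "attention_mask": mask}


# Hypothetical vocabulary for this sketch.
vocab = {"who": 2, "is": 3, "scott": 4, "lai?": 5}
encoded = toy_tokenize("Who is Scott Lai?", vocab, max_length=6)
print(encoded)
# {'input_ids': [2, 3, 4, 5, 0, 0], 'attention_mask': [1, 1, 1, 1, 0, 0]}
```

Every example ends up with the same length, which is what lets the trainer stack them into a batch.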

In [20]:
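from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "Qwen/Qwen2.5-3B-Instruct"
)


def preprocess(sample):
    # Join the question and its answer into one training string
    text = sample["prompt"] + "\n" + sample["completion"]
    print(text)
    tokenized = tokenizer(
        text,
        max_length=128,        # cap every example at 128 tokens
        truncation=True,       # cut off anything longer than max_length
        padding="max_length",  # pad shorter examples up to max_length
    )

    # Causal LM training: the labels are the input tokens themselves
    tokenized["labels"] = tokenized["input_ids"].copy()
    return tokenized


data = raw_data.map(preprocess)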


Map: 100%|██████████| 122/122 [00:00<00:00, 6332.04 examples/s]

What is Scott Lai’s profession?
AI Engineer and Data Scientist.
How many years of experience does Scott Lai have in generative AI and LLM solutions?
Over 5 years.
What infrastructures is Scott Lai skilled in designing and optimizing?
Scalable ML infrastructures using PyTorch, Hugging Face, and FastAPI on AWS.
What type of workflows is Scott Lai experienced in building?
End-to-end pipelines, scalable microservices, and ETL workflows.
What collaboration experience does Scott Lai have?
Proven track record in cross-functional collaboration and implementing ML and data engineering best practices.
Which skill in Programming & Scripting does Scott Lai have?
Python
Which skill in Programming & Scripting does Scott Lai have?
Rust
Which skill in Programming & Scripting does Scott Lai have?
Node.js
Which skill in Programming & Scripting does Scott Lai have?
HTML
Which skill in Programming & Scripting does Scott Lai have?
CSS
Which skill in Programming & Scripting does Scott Lai have?
JavaScript
W




In [21]:
print(data['train'])

Dataset({
    features: ['prompt', 'completion', 'input_ids', 'attention_mask', 'labels'],
    num_rows: 122
})
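One optional variation, which this notebook does not use: because every example is padded to 128 tokens, the copied `labels` also contain padding ids, so the loss is computed on padding as well. A common approach is to replace the labels at padded positions with `-100`, the index that PyTorch's cross-entropy loss ignores by default. The helper name `mask_padding_labels` is hypothetical, and the ids below are invented for the example.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss skips by default


def mask_padding_labels(input_ids, attention_mask):
    """Copy input_ids as labels, but ignore positions where attention_mask is 0."""
    return [
        token_id if keep == 1 else IGNORE_INDEX
        for token_id, keep in zip(input_ids, attention_mask)
    ]


# Made-up ids: three real tokens followed by two padding positions.
input_ids = [5, 6, 7, 0, 0]
attention_mask = [1, 1, 1, 0, 0]
labels = mask_padding_labels(input_ids, attention_mask)
print(labels)
# [5, 6, 7, -100, -100]
```

Inside `preprocess`, this would stand in for the plain `.copy()` of `input_ids`; whether it changes the final result noticeably on a dataset this small is something to test rather than assume.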


## LoRA

Now, let's move on to training.

In [22]:
from peft import LoraConfig, get_peft_model, TaskType
from transformers import AutoModelForCausalLM
import torch

In [23]:
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-3B-Instruct",
    device_map = device,
    torch_dtype = torch.float16
)

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    # Attach LoRA adapters to the attention query, key, and value projections
    target_modules=["q_proj", "k_proj", "v_proj"],
)
model = get_peft_model(model, lora_config)

Loading checkpoint shards: 100%|██████████| 2/2 [00:03<00:00,  1.73s/it]
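For intuition on why LoRA is cheap: instead of updating a full weight matrix of shape `d_out x d_in`, it keeps that matrix frozen and trains two small factors, `A` (`r x d_in`) and `B` (`d_out x r`), whose product is added to the frozen weights. The sketch below only counts parameters for a single matrix. The dimensions and the rank `r` are illustrative assumptions, not values read from the Qwen config or from the `LoraConfig` above.

```python
def lora_param_count(d_in, d_out, r):
    """Trainable parameters in the LoRA factors A (r x d_in) and B (d_out x r)."""
    return r * d_in + d_out * r


def full_param_count(d_in, d_out):
    """Parameters in the full (frozen) weight matrix."""
    return d_in * d_out


# Illustrative sizes only (assumed, not taken from the model).
d_in, d_out, r = 2048, 2048, 8
lora = lora_param_count(d_in, d_out, r)
full = full_param_count(d_in, d_out)
print(lora, full, f"{lora / full:.2%}")
```

On the real model, calling `model.print_trainable_parameters()` after `get_peft_model` reports the actual trainable and total parameter counts.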


In [24]:
from transformers import TrainingArguments, Trainer


train_args = TrainingArguments(
    num_train_epochs=10,  # go through the dataset from start to finish 10 times
    learning_rate=0.001,
    logging_steps=25,  # log the training loss every 25 steps
    fp16=False,  # 16-bit floating-point training to speed things up; set to True if you are on a GPU
)

trainer = Trainer(
    args = train_args,
    model = model, 
    train_dataset=data["train"]
)

In [25]:
trainer.train()



| Step | Training Loss |
|------|---------------|
| 25   | 2.352         |
| 50   | 0.3704        |
| 75   | 0.2344        |
| 100  | 0.1919        |
| 125  | 0.1627        |
| 150  | 0.1335        |


TrainOutput(global_step=160, training_loss=0.5460465855896472, metrics={'train_runtime': 463.458, 'train_samples_per_second': 2.632, 'train_steps_per_second': 0.345, 'total_flos': 2602200748523520.0, 'train_loss': 0.5460465855896472, 'epoch': 10.0})
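The `global_step=160` in the output can be sanity-checked with a little arithmetic. The sketch assumes the run used the `TrainingArguments` default batch size of 8 per device on a single device with no gradient accumulation, since the notebook does not set these explicitly.

```python
import math

num_examples = 122   # rows in the training split
num_epochs = 10      # num_train_epochs
batch_size = 8       # assumed: default per_device_train_batch_size, single device
logging_steps = 25   # logging_steps from TrainingArguments

# The last batch of each epoch is smaller but still counts as a step.
steps_per_epoch = math.ceil(num_examples / batch_size)
total_steps = steps_per_epoch * num_epochs

# Steps at which the training loss is logged.
logged_at = list(range(logging_steps, total_steps + 1, logging_steps))

print(steps_per_epoch, total_steps, logged_at)
# 16 160 [25, 50, 75, 100, 125, 150]
```

This matches both the reported `global_step` and the six rows of the loss table.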

In [29]:
# save the model
trainer.save_model("./my-qwen")
tokenizer.save_pretrained("./my-qwen")

('./my-qwen/tokenizer_config.json',
 './my-qwen/special_tokens_map.json',
 './my-qwen/chat_template.jinja',
 './my-qwen/vocab.json',
 './my-qwen/merges.txt',
 './my-qwen/added_tokens.json',
 './my-qwen/tokenizer.json')

Now let's test it out.

In [34]:
ask_llm = pipeline(
  task="text-generation",
  model="./my-qwen",
  tokenizer='./my-qwen',
  device=device,
  torch_dtype=dtype
)

print(ask_llm("Who is Scott Lai?")[0]["generated_text"])

Loading checkpoint shards: 100%|██████████| 2/2 [00:00<00:00,  2.28it/s]
Device set to use mps


Who is Scott Lai? A Data Engineer based in Los Angeles, CA.
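Note that `generated_text` contains the prompt followed by the model's continuation. If only the answer is wanted, a small helper can strip the prompt. The function name `extract_completion` is hypothetical; the strings are the prompt and output shown above.

```python
def extract_completion(prompt, generated_text):
    """Return the generated text without the leading prompt, if present."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):].strip()
    # Fall back to the full text when the prompt is not a prefix.
    return generated_text.strip()


prompt = "Who is Scott Lai?"
generated = "Who is Scott Lai? A Data Engineer based in Los Angeles, CA."
answer = extract_completion(prompt, generated)
print(answer)
# A Data Engineer based in Los Angeles, CA.
```

The text-generation pipeline also accepts `return_full_text=False` at call time, which should give the continuation without the prompt and avoid the need for a helper.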
