# Clone Me 🧑


*Personal AI Clone Creation using DeepSeek and Unsloth*


This notebook demonstrates how to fine-tune the `DeepSeek-R1-Distill-Llama-8B` model using the Unsloth library. Unsloth's build of the model is optimized for fast, lightweight training and inference, which makes it a good candidate for experimentation on systems with limited resources (like Colab). Training uses Complex CoT (Chain of Thought) reasoning and personal data to fine-tune the model for personalized outputs.

The base model comes from a strategy called "distillation," where a smaller model (Llama 3.1 8B) is trained on complex DeepSeek-R1 data to replicate its behavior, without the overhead of the larger LLM.

### Key Features
1. **Model Distillation:** The base model was distilled from DeepSeek-R1 into the Llama architecture. Basically, take the fancy reasoning from DeepSeek-R1 and shove it into Llama.
2. **4-bit Quantization:** Efficient resource usage with minimal loss in performance.
3. **Integration with Weights & Biases:** Monitor training metrics in real time.
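
To give a rough feel for why 4-bit quantization matters on a small GPU, here is a back-of-the-envelope sketch. It assumes a round 8 billion parameters and counts the weights only; real checkpoints are somewhat larger because some tensors stay in higher precision and quantization adds scaling metadata. The helper name `weight_memory_gb` is made up for this illustration.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Assumption: "8B" is taken as a round 8 billion parameters.
PARAMS = 8.0e9

for bits in (16, 4):
    print(f"{bits}-bit weights: ~{weight_memory_gb(PARAMS, bits):.1f} GB")
```

Under these assumptions the 16-bit weights alone would not fit on a GPU with about 15 GB of memory, while the 4-bit weights leave room for activations and the LoRA adapters.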

## Important❗

This notebook requires some training data. In order to clone yourself, check out the supplementary project.
* [Clone Me Website](https://clone-me-peach.vercel.app/)
* GitHub [tdfacer/clone-me](https://github.com/tdfacer/clone-me)

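Alternatively, you can test the notebook with sample data generated by [this script](https://github.com/tdfacer/clone-me/blob/main/python/clone_me_qa_generator.py). The generated data lives in the GitHub project and has also been loaded into the HuggingFace Hub (as `tdfacer/clone_me_generated_sample`) for convenience.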

#### Requirements:
- A GPU environment (Google Colab recommended)
- Hugging Face account with API token
- Weights & Biases account (optional, for training metrics)
- Personal Q&A dataset in CSV format with columns `Question`, `Reasoning`, `Response` (the names that the formatting code in Step 5 reads). You have a couple of options here:
  - Use a sample dataset generated from [this script](https://github.com/tdfacer/clone-me/blob/main/python/clone_me_qa_generator.py) OR
  - Use the [Clone Me Website](https://clone-me-peach.vercel.app/) to build out a dataset for yourself!
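
Before uploading a CSV, it can help to confirm that it has the columns the formatting cell later in this notebook reads (`Question`, `Reasoning`, `Response`). The following is a minimal sketch using only the standard library; `missing_columns` is a hypothetical helper and the sample row is invented for illustration.

```python
import csv
import io

# Column names read by formatting_prompts_func later in this notebook.
REQUIRED_COLUMNS = ("Question", "Reasoning", "Response")


def missing_columns(csv_text: str) -> list:
    """Return the required column names that are absent from the CSV header."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    return [name for name in REQUIRED_COLUMNS if name not in header]


# Invented sample content, just to exercise the check.
sample = (
    "Question,Reasoning,Response\n"
    "What is your favorite food?,I grew up cooking with my family.,Tacos.\n"
)
print(missing_columns(sample))  # an empty list means the header is complete
```

For a real file, read it with `open("answers.csv", newline="").read()` and pass the text to the helper.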

---

## Setup

### Step 1: Requirements

#### Explicit Installation
- `unsloth` - A GitHub project that claims: "Finetune Llama 3.3, Mistral, Phi-4, Qwen 2.5 & Gemma 2-5x faster with 80% less memory!" [GitHub](https://github.com/unslothai/unsloth)

#### Pre-Installed (on Colab)
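- `huggingface_hub` - Authenticate with the HuggingFace Hub. [GitHub](https://github.com/huggingface/huggingface_hub)
- `datasets` - Load the remote dataset and format it for training. [GitHub](https://github.com/huggingface/datasets)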
- `pandas` (required if loading data from CSV) - Data manipulation and analysis library. [Website](https://pandas.pydata.org/)
- `wandb` - Weights and Biases. Provides metrics about your training job and system. [GitHub](https://github.com/wandb/wandb)
- `trl` - HuggingFace Transformer Reinforcement Learning library. [GitHub](https://github.com/huggingface/trl)
- `transformers` - HuggingFace models library. [GitHub](https://github.com/huggingface/transformers)


In [1]:
%%capture
!pip install unsloth
!pip install --force-reinstall --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git

# uncomment and execute as needed, should be pre-installed on Colab
# !pip install huggingface_hub pandas wandb trl transformers

### Step 2: Configure secrets

Use the key icon in the Colab sidebar to add each secret. Note that you must allow notebook access.

* [HuggingFace](https://huggingface.co/) - Sign up and grab a free API key. Save as `HF_API_KEY`.
* [Weights and Biases](https://wandb.ai/) - Sign up and grab a free API key. Save as `WANDB_API_KEY`.
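
If you run this notebook outside Colab, `google.colab.userdata` will not be importable. One possible workaround, sketched below, is to fall back to environment variables with the same names. `get_secret` is a hypothetical helper and not part of the notebook's code.

```python
import os


def get_secret(name, default=None):
    """Read a secret from Colab if available, else from an environment variable."""
    try:
        from google.colab import userdata  # only importable inside Colab
    except ImportError:
        return os.environ.get(name, default)
    try:
        return userdata.get(name)
    except Exception:
        # Secret missing or notebook access not granted: fall back to the environment.
        return os.environ.get(name, default)


hf_token = get_secret("HF_API_KEY")
print("HF token found" if hf_token else "HF token missing")
```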

### Step 3: Import Libraries and Authenticate with Third-Parties

In [2]:
from huggingface_hub import login
import wandb
from unsloth import FastLanguageModel
from transformers import TrainingArguments
from trl import SFTTrainer
from datasets import Dataset, load_dataset
from google.colab import userdata

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!


#### HuggingFace

Used by `unsloth` to pull our base model from HuggingFace.

In [3]:
hf_token = userdata.get("HF_API_KEY")
login(hf_token)

#### Weights and Biases

Gives you metrics on:
1. The training job: global step, learning rate, train/loss, grad_norm, epoch
2. System: GPU memory, GPU utilization, GPU power usage, Disk util., RAM, etc.

In [4]:
%env WANDB_SILENT=true

wb_token = userdata.get("WANDB_API_KEY")
wandb.login(key=wb_token)

run = wandb.init(
    project='clone-me-deepseek-r1-distill-llama-8b',
    job_type="training",
    anonymous="allow"
)

env: WANDB_SILENT=true
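
Since the Weights & Biases account is listed as optional, here is one way the choice could be handled, as a sketch. It relies on the `WANDB_API_KEY` and `WANDB_MODE` environment variables that `wandb` reads, and returns a value suitable for the `report_to` argument of `TrainingArguments`. `configure_wandb` is a hypothetical helper; the notebook itself always logs in.

```python
import os


def configure_wandb(api_key):
    """Enable W&B when a key is present, otherwise disable it.

    Returns the value to pass as `report_to` in TrainingArguments.
    """
    if api_key:
        os.environ["WANDB_API_KEY"] = api_key
        return "wandb"
    # No key: tell wandb to do nothing, and tell the trainer not to report.
    os.environ["WANDB_MODE"] = "disabled"
    return "none"


report_to = configure_wandb(os.environ.get("WANDB_API_KEY"))
print(f"report_to = {report_to!r}")
```

With this approach you would skip the `wandb.login` and `wandb.init` calls above when no key is available, and add `report_to=report_to` to the training arguments in Step 7.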


## Step 4: Load Pretrained model: Unsloth version of DeepSeek-R1-Distill-Llama-8B

In [5]:
max_seq_length = 2048
dtype = None
load_in_4bit = True


model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/DeepSeek-R1-Distill-Llama-8B",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    token = hf_token,
)

==((====))==  Unsloth 2025.1.8: Fast Llama patching. Transformers: 4.47.1.
   \\   /|    GPU: Tesla T4. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.5.1+cu124. CUDA: 7.5. CUDA Toolkit: 12.4. Triton: 3.1.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.29.post1. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


model.safetensors:   0%|          | 0.00/5.96G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/236 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/52.9k [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/483 [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/17.2M [00:00<?, ?B/s]

### Test Inference _without_ any fine-tuning

First we need the prompt. You'll notice a subtle difference between this prompt and the prompt we'll use for training later: this one ends with an open `<think>` tag and nothing after it. The model uses this as a signal to write out its thoughts first, then the answer to the question.

In [6]:
# Enter the instruction that you wish to train on. Feel free to play around with this.
# Examples:
# INSTRUCTION = "You are a 60 year old male from Queens, NY."
# INSTRUCTION = """You are a 34-year-old Environmental Scientist based in Portland, Oregon, whose deep-rooted passion for nature was sparked by your upbringing in Minnesota's forests and lakes. With a degree
#   from the University of Wisconsin-Madison and extensive experience ranging from solo Appalachian Trail hikes to wildlife conservation in Costa Rica, you exemplify curiosity, compassion, and
#   practicality. You are dedicated to sustainability and community service, consistently working to protect the environment through your organized and ethical approach to both your professional and personal
#   endeavors."""

INSTRUCTION = """You are an individual from the United States."""

In [7]:
prompt_style = f"""Below is an instruction that describes a task, paired with an input that provides further context.
Write a response that appropriately completes the request.
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.

### Instruction:
{INSTRUCTION}
Please answer the following question about yourself

### Question:
{{}}

### Response:
<think>{{}}"""
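
The doubled braces in the template can look odd, so here is a small standalone sketch of what they do. The f-string fills in `{INSTRUCTION}` immediately and turns each `{{}}` into a literal `{}`, which a later `.format(...)` call then fills. The shortened template below is for illustration only.

```python
INSTRUCTION = "You are an individual from the United States."

# Shortened stand-in for the notebook's prompt_style.
template = f"""### Instruction:
{INSTRUCTION}

### Question:
{{}}

### Response:
<think>{{}}"""

# After f-string evaluation, `template` holds two bare "{}" slots.
print(template.count("{}"))

# At inference time the second slot is filled with an empty string,
# so the prompt ends right after the open <think> tag.
prompt = template.format("What keeps you up at night?", "")
print(prompt)
```

One thing to watch for: if your `INSTRUCTION` text itself contains `{` or `}`, the later `.format` call will try to treat them as placeholders.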

#### As a baseline, run inference on the base model

In [8]:
question = "What keeps you up at night?"


FastLanguageModel.for_inference(model)
inputs = tokenizer([prompt_style.format(question, "")], return_tensors="pt").to("cuda")

outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=1200,
    use_cache=True,
)
response = tokenizer.batch_decode(outputs)
print(response[0].split("### Response:")[1])


<think>
Alright, so I need to figure out how to answer the question, "What keeps you up at night?" as someone from the United States. Let me think through this step by step.

First, the question is asking about things that prevent me from sleeping. It's a common question, so I should consider what typical concerns or issues people in the U.S. might have that affect their sleep.

I know that stress, worries, or anxieties are major factors. People often stay awake at night thinking about problems, work, or personal issues. Maybe I should mention something like work-related stress or personal worries.

Another common issue is health-related problems. Issues like pain, discomfort, or medical concerns can keep someone up at night. I could mention something about physical health problems affecting sleep.

Financial concerns are also a big factor. Money worries, like debts or job insecurities, often lead to sleepless nights. That's another angle to consider.

I should also think about other 

## Dataset Preparation

### Step 5: Load and Format Dataset

#### Option 1: Use sample data from HuggingFace Hub

In [9]:
dataset = load_dataset("tdfacer/clone_me_generated_sample", split="train")

README.md:   0%|          | 0.00/422 [00:00<?, ?B/s]

train-00000-of-00001.parquet:   0%|          | 0.00/102k [00:00<?, ?B/s]

Generating train split:   0%|          | 0/140 [00:00<?, ? examples/s]

#### Option 2 (recommended): Use your own custom data from [clone-me](https://clone-me-peach.vercel.app/) frontend app!

1. Navigate to the [clone-me](https://clone-me-peach.vercel.app/) website
2. Select a questionnaire
3. Answer questions (supports speech to text and keyboard text entry)
4. Download your results
5. Upload to this notebook as `answers.csv`

In [10]:
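# Uncomment to load from your own CSV answers.
# `data_files` points at the CSV; split="train" returns a Dataset rather than a DatasetDict.
# dataset = load_dataset("csv", data_files="./answers.csv", split="train")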

Now, in this training prompt, the `<think>` block has a placeholder for the reasoning and is closed with a `</think>` tag, followed by a template placeholder for the answer.

In [11]:
train_prompt_style = f"""Below is an instruction that describes a task, paired with an input that provides further context.
Write a response that appropriately completes the request.
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.

### Instruction:
{INSTRUCTION}

### Question:
{{}}

### Response:
<think>
{{}}
</think>
{{}}"""

In [12]:
EOS_TOKEN = tokenizer.eos_token


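def formatting_prompts_func(examples):
    """Render each (question, reasoning, response) row into one training string."""
    questions = examples["Question"]
    cots = examples["Reasoning"]
    responses = examples["Response"]
    texts = []
    for question, cot, response in zip(questions, cots, responses):
        # Append EOS_TOKEN so the model learns where a response ends
        text = train_prompt_style.format(question, cot, response) + EOS_TOKEN
        texts.append(text)
    return {
        "text": texts,
    }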

In [13]:
dataset = dataset.map(formatting_prompts_func, batched=True)

Map:   0%|          | 0/140 [00:00<?, ? examples/s]

### Step 6: Configure Fine-Tuning

In [14]:
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=[
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
        "gate_proj",
        "up_proj",
        "down_proj",
    ],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",  # True or "unsloth" for very long context
    random_state=3407,
    use_rslora=False,
    loftq_config=None,
)

Unsloth 2025.1.8 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.
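
The training banner in Step 7 reports 41,943,040 trainable parameters. That number can be reproduced from the LoRA settings above: each adapted linear layer gets two small matrices with `r * (in_features + out_features)` parameters in total. The layer dimensions below are assumptions taken from the published Llama 3.1 8B configuration; they are not stated anywhere in this notebook.

```python
# Assumed Llama 3.1 8B dimensions (not stated in this notebook).
HIDDEN = 4096         # model width
KV_DIM = 1024         # 8 key/value heads x 128 dims (grouped-query attention)
INTERMEDIATE = 14336  # MLP width
NUM_LAYERS = 32       # matches the "patched 32 layers" message above
R = 16                # LoRA rank from get_peft_model

# (in_features, out_features) for each targeted projection.
SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features: int, out_features: int, r: int) -> int:
    """Parameters in one adapter: A is (r x in_features), B is (out_features x r)."""
    return r * (in_features + out_features)


per_layer = sum(lora_params(i, o, R) for i, o in SHAPES.values())
total = per_layer * NUM_LAYERS
print(f"per layer: {per_layer:,}")  # per layer: 1,310,720
print(f"total:     {total:,}")      # total:     41,943,040
```

Doubling `r` doubles the adapter size, which is a quick way to reason about the memory cost of a higher rank.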


### Step 7: Train the Model

In [15]:
# SFTTrainer and TrainingArguments were already imported in Step 3
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    dataset_num_proc=2,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        # Use num_train_epochs = 1, warmup_ratio for full training runs!
        warmup_steps=5,
        max_steps=60,
        learning_rate=2e-4,
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        logging_steps=10,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs",
    ),
)

trainer_stats = trainer.train()

Map (num_proc=2):   0%|          | 0/140 [00:00<?, ? examples/s]

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 140 | Num Epochs = 4
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 60
 "-____-"     Number of trainable parameters = 41,943,040


Step,Training Loss
10,2.1786
20,1.2935
30,1.146
40,1.0526
50,0.9641
60,0.933
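
The numbers in the training banner follow from the arguments above, and it can be useful to work them out by hand when changing `max_steps` or the batch settings. The sketch below redoes that arithmetic and also approximates the learning rate under a linear schedule with warmup (ramp up from zero, then decay linearly to zero). `linear_lr` is a hypothetical helper that mirrors the usual formula, not a call into the trainer.

```python
import math

per_device_batch = 2
grad_accum_steps = 4
num_gpus = 1
max_steps = 60
num_examples = 140
warmup_steps = 5
base_lr = 2e-4

# Examples consumed per optimizer step.
effective_batch = per_device_batch * grad_accum_steps * num_gpus
samples_seen = effective_batch * max_steps
passes_over_data = samples_seen / num_examples

print(f"effective batch size: {effective_batch}")          # 8
print(f"examples seen:        {samples_seen}")             # 480
print(f"passes over the data: {passes_over_data:.2f}")     # 3.43
print(f"rounded up:           {math.ceil(passes_over_data)}")  # 4, as in the banner


def linear_lr(step: int) -> float:
    """Approximate learning rate at a given optimizer step."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    remaining = (max_steps - step) / (max_steps - warmup_steps)
    return base_lr * max(0.0, remaining)


for step in (0, 5, 30, 60):
    print(f"step {step:>2}: lr = {linear_lr(step):.2e}")
```

So a 60-step run goes over the 140 examples roughly three and a half times, which is worth keeping in mind when judging how far the loss can fall without overfitting.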


#### Optionally, save your fine-tuned model

```python
# Save Locally
model.save_pretrained_gguf("clone-me-gguf", tokenizer, quantization_method = "q4_k_m")

# Save to HuggingFace Hub (make sure to replace with your username and model name)
model.push_to_hub_gguf("tdfacer/clone-me-gguf", tokenizer, quantization_method = "q4_k_m")
```

### Step 8: Run Inference on your trained model!

In [16]:
import torch


def query_fine_tuned_llm(query: str) -> str:
    """Generate a response to `query` with the fine-tuned model."""
    model.eval()
    FastLanguageModel.for_inference(model)
    inputs = tokenizer([prompt_style.format(query, "")], return_tensors="pt").to("cuda")
    outputs = model.generate(
        input_ids=inputs.input_ids,
        attention_mask=inputs.attention_mask,
        max_new_tokens=1200,
        use_cache=True,
    )
    response = tokenizer.batch_decode(outputs)
    # Split on the full marker, colon included, so the output has no stray ":"
    return response[0].split("### Response:")[1]


def run_inference(query: str):
    try:
        print(query_fine_tuned_llm(query=query))
    except RuntimeError as e:
        print(f"Error during inference: {e}")
        print("Trying to recover model state...")
        model.eval()
        torch.cuda.empty_cache()  # release cached GPU memory before retrying
        print("Please try running inference again.")


# Call the function
run_inference("What keeps you up at night?")

<think>
Growing up in a small town surrounded by nature, I developed a deep appreciation for the environment early on. My experiences, such as hiking the Appalachian Trail solo, volunteering with ecological restoration projects, and participating in conservation research, have all contributed to my commitment to environmental protection. These activities have shown me firsthand the delicate balance of ecosystems and the challenges they face due to human activities. My work with AmeriCorps and organizing community events like river cleanups has reinforced the importance of community involvement in environmental protection. My concern for sustainability and environmental health is not just about the immediate issues but also about ensuring a healthy planet for future generations.
</think>
What keeps me up at night is the ongoing threat to our planet's ecosystems and the urgent need for sustainable solutions. I'm deeply concerned about the effects of climate change, deforestation, and i
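
If you only want the final answer (or only the reasoning), the decoded text can be split on the `</think>` tag. This is a minimal sketch; `split_reasoning` is a hypothetical helper, and any end-of-sequence marker emitted by the tokenizer is left in the answer as-is.

```python
def split_reasoning(text: str):
    """Split model output into (reasoning, answer) around the <think> block."""
    # Drop anything before the opening tag, if there is one.
    body = text.split("<think>", 1)[-1]
    reasoning, closing_tag, answer = body.partition("</think>")
    if not closing_tag:
        # Generation stopped before the reasoning was closed.
        return reasoning.strip(), ""
    return reasoning.strip(), answer.strip()


example = "\n<think>\nI care about the environment.\n</think>\nClimate change keeps me up."
reasoning, answer = split_reasoning(example)
print("Reasoning:", reasoning)
print("Answer:", answer)
```

An empty answer is a hint that `max_new_tokens` cut the generation off while the model was still thinking.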