To run this, press "*Runtime*" and press "*Run all*" on a **free** Tesla T4 Google Colab instance!
<div class="align-center">
<a href="https://unsloth.ai/"><img src="https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png" width="115"></a>
<a href="https://discord.gg/unsloth"><img src="https://github.com/unslothai/unsloth/raw/main/images/Discord button.png" width="145"></a>
<a href="https://docs.unsloth.ai/"><img src="https://github.com/unslothai/unsloth/blob/main/images/documentation%20green%20button.png?raw=true" width="125"></a></a> Join Discord if you need help + ⭐ <i>Star us on <a href="https://github.com/unslothai/unsloth">Github</a> </i> ⭐
</div>

To install Unsloth on your own computer, follow the installation instructions on our Github page [here](https://docs.unsloth.ai/get-started/installing-+-updating).

You will learn how to do [data prep](#Data), how to [train](#Train), how to [run the model](#Inference), & [how to save it](#Save)


### News

**Read our [blog post](https://unsloth.ai/blog/r1-reasoning) for guidance on how to train reasoning models.**

Visit our docs for all our [model uploads](https://docs.unsloth.ai/get-started/all-our-models) and [notebooks](https://docs.unsloth.ai/get-started/unsloth-notebooks).


### Installation

In [2]:
%%capture
import os
if "COLAB_" not in "".join(os.environ.keys()):
    !pip install unsloth
else:
    # Do this only in Colab notebooks! Otherwise use pip install unsloth
    !pip install --no-deps bitsandbytes accelerate xformers==0.0.29 peft trl triton
    !pip install --no-deps cut_cross_entropy unsloth_zoo
    !pip install sentencepiece protobuf datasets huggingface_hub hf_transfer
    !pip install --no-deps unsloth

### Unsloth

In [3]:
from unsloth import FastLanguageModel
import torch
max_seq_length = 2048 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.

# 4bit pre quantized models we support for 4x faster downloading + no OOMs.
fourbit_models = [
    "unsloth/Meta-Llama-3.1-8B-bnb-4bit",      # Llama-3.1 2x faster
    "unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit",
    "unsloth/Meta-Llama-3.1-70B-bnb-4bit",
    "unsloth/Meta-Llama-3.1-405B-bnb-4bit",    # 4bit for 405b!
    "unsloth/Mistral-Small-Instruct-2409",     # Mistral 22b 2x faster!
    "unsloth/mistral-7b-instruct-v0.3-bnb-4bit",
    "unsloth/Phi-3.5-mini-instruct",           # Phi-3.5 2x faster!
    "unsloth/Phi-3-medium-4k-instruct",
    "unsloth/gemma-2-9b-bnb-4bit",
    "unsloth/gemma-2-27b-bnb-4bit",            # Gemma 2x faster!

    "unsloth/Llama-3.2-1B-bnb-4bit",           # NEW! Llama 3.2 models
    "unsloth/Llama-3.2-1B-Instruct-bnb-4bit",
    "unsloth/Llama-3.2-3B-bnb-4bit",
    "unsloth/Llama-3.2-3B-Instruct-bnb-4bit",

    "unsloth/Llama-3.3-70B-Instruct-bnb-4bit" # NEW! Llama 3.3 70B!
] # More models at https://huggingface.co/unsloth

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Llama-3.2-3B-Instruct", # or choose "unsloth/Llama-3.2-1B-Instruct"
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
)

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
==((====))==  Unsloth 2025.3.8: Fast Llama patching. Transformers: 4.48.3.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.5.1+cu124. CUDA: 7.5. CUDA Toolkit: 12.4. Triton: 3.1.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.29. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


model.safetensors:   0%|          | 0.00/2.35G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/234 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/54.7k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/17.2M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/454 [00:00<?, ?B/s]

We now add LoRA adapters so we only need to update 1 to 10% of all parameters!

In [4]:
model = FastLanguageModel.get_peft_model(
    model,
    r = 16, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = False,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)

Unsloth 2025.3.8 patched 28 layers with 28 QKV layers, 28 O layers and 28 MLP layers.


<a name="Data"></a>
### Data Prep
We now use the `Llama-3.1` format for conversation style finetunes. We use [Maxime Labonne's FineTome-100k](https://huggingface.co/datasets/mlabonne/FineTome-100k) dataset in ShareGPT style. But we convert it to HuggingFace's normal multiturn format `("role", "content")` instead of `("from", "value")`/ Llama-3 renders multi turn conversations like below:

```
<|begin_of_text|><|start_header_id|>user<|end_header_id|>

Hello!<|eot_id|><|start_header_id|>assistant<|end_header_id|>

Hey there! How are you?<|eot_id|><|start_header_id|>user<|end_header_id|>

I'm great thanks!<|eot_id|>
```

We use our `get_chat_template` function to get the correct chat template. We support `zephyr, chatml, mistral, llama, alpaca, vicuna, vicuna_old, phi3, llama3` and more.

In [5]:
from unsloth.chat_templates import get_chat_template

tokenizer = get_chat_template(
    tokenizer,
    chat_template = "llama-3.1",
)

def formatting_prompts_func(examples):
    convos = examples["conversations"]
    texts = [tokenizer.apply_chat_template(convo, tokenize = False, add_generation_prompt = False) for convo in convos]
    return { "text" : texts, }
pass

from datasets import load_dataset
dataset = load_dataset("mlabonne/FineTome-100k", split = "train")

README.md:   0%|          | 0.00/982 [00:00<?, ?B/s]

train-00000-of-00001.parquet:   0%|          | 0.00/117M [00:00<?, ?B/s]

Generating train split:   0%|          | 0/100000 [00:00<?, ? examples/s]

We now use `standardize_sharegpt` to convert ShareGPT style datasets into HuggingFace's generic format. This changes the dataset from looking like:
```
{"from": "system", "value": "You are an assistant"}
{"from": "human", "value": "What is 2+2?"}
{"from": "gpt", "value": "It's 4."}
```
to
```
{"role": "system", "content": "You are an assistant"}
{"role": "user", "content": "What is 2+2?"}
{"role": "assistant", "content": "It's 4."}
```

In [None]:
from unsloth.chat_templates import standardize_sharegpt
dataset = standardize_sharegpt(dataset)
dataset = dataset.map(formatting_prompts_func, batched = True,)

Standardizing format:   0%|          | 0/100000 [00:00<?, ? examples/s]

Map:   0%|          | 0/100000 [00:00<?, ? examples/s]

We look at how the conversations are structured for item 5:

In [None]:
dataset[5]["conversations"]

[{'content': 'How do astronomers determine the original wavelength of light emitted by a celestial body at rest, which is necessary for measuring its speed using the Doppler effect?',
  'role': 'user'},
 {'content': 'Astronomers make use of the unique spectral fingerprints of elements found in stars. These elements emit and absorb light at specific, known wavelengths, forming an absorption spectrum. By analyzing the light received from distant stars and comparing it to the laboratory-measured spectra of these elements, astronomers can identify the shifts in these wavelengths due to the Doppler effect. The observed shift tells them the extent to which the light has been redshifted or blueshifted, thereby allowing them to calculate the speed of the star along the line of sight relative to Earth.',
  'role': 'assistant'}]

And we see how the chat template transformed these conversations.

**[Notice]** Llama 3.1 Instruct's default chat template default adds `"Cutting Knowledge Date: December 2023\nToday Date: 26 July 2024"`, so do not be alarmed!

In [None]:
dataset[5]["text"]

'<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nCutting Knowledge Date: December 2023\nToday Date: 26 July 2024\n\n<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nHow do astronomers determine the original wavelength of light emitted by a celestial body at rest, which is necessary for measuring its speed using the Doppler effect?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nAstronomers make use of the unique spectral fingerprints of elements found in stars. These elements emit and absorb light at specific, known wavelengths, forming an absorption spectrum. By analyzing the light received from distant stars and comparing it to the laboratory-measured spectra of these elements, astronomers can identify the shifts in these wavelengths due to the Doppler effect. The observed shift tells them the extent to which the light has been redshifted or blueshifted, thereby allowing them to calculate the speed of the star along the line of sight relative to Earth.<|

<a name="Train"></a>
### Train the model
Now let's use Huggingface TRL's `SFTTrainer`! More docs here: [TRL SFT docs](https://huggingface.co/docs/trl/sft_trainer). We do 60 steps to speed things up, but you can set `num_train_epochs=1` for a full run, and turn off `max_steps=None`. We also support TRL's `DPOTrainer`!

In [None]:
from trl import SFTTrainer
from transformers import TrainingArguments, DataCollatorForSeq2Seq
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    data_collator = DataCollatorForSeq2Seq(tokenizer = tokenizer),
    dataset_num_proc = 2,
    packing = False, # Can make training 5x faster for short sequences.
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        # num_train_epochs = 1, # Set this for 1 full training run.
        max_steps = 60,
        learning_rate = 2e-4,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
        report_to = "none", # Use this for WandB etc
    ),
)

Map (num_proc=2):   0%|          | 0/100000 [00:00<?, ? examples/s]

max_steps is given, it will override any value given in num_train_epochs


We also use Unsloth's `train_on_completions` method to only train on the assistant outputs and ignore the loss on the user's inputs.

In [None]:
from unsloth.chat_templates import train_on_responses_only
trainer = train_on_responses_only(
    trainer,
    instruction_part = "<|start_header_id|>user<|end_header_id|>\n\n",
    response_part = "<|start_header_id|>assistant<|end_header_id|>\n\n",
)

Map:   0%|          | 0/100000 [00:00<?, ? examples/s]

We verify masking is actually done:

In [None]:
tokenizer.decode(trainer.train_dataset[5]["input_ids"])

'<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nCutting Knowledge Date: December 2023\nToday Date: 26 July 2024\n\n<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nHow do astronomers determine the original wavelength of light emitted by a celestial body at rest, which is necessary for measuring its speed using the Doppler effect?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nAstronomers make use of the unique spectral fingerprints of elements found in stars. These elements emit and absorb light at specific, known wavelengths, forming an absorption spectrum. By analyzing the light received from distant stars and comparing it to the laboratory-measured spectra of these elements, astronomers can identify the shifts in these wavelengths due to the Doppler effect. The observed shift tells them the extent to which the light has been redshifted or blueshifted, thereby allowing them to calculate the speed of the star along the line of sight relative to Earth.<|

In [None]:
space = tokenizer(" ", add_special_tokens = False).input_ids[0]
tokenizer.decode([space if x == -100 else x for x in trainer.train_dataset[5]["labels"]])

'                                                                \n\nAstronomers make use of the unique spectral fingerprints of elements found in stars. These elements emit and absorb light at specific, known wavelengths, forming an absorption spectrum. By analyzing the light received from distant stars and comparing it to the laboratory-measured spectra of these elements, astronomers can identify the shifts in these wavelengths due to the Doppler effect. The observed shift tells them the extent to which the light has been redshifted or blueshifted, thereby allowing them to calculate the speed of the star along the line of sight relative to Earth.<|eot_id|>'

We can see the System and Instruction prompts are successfully masked!

In [None]:
# @title Show current memory stats
gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
print(f"{start_gpu_memory} GB of memory reserved.")

GPU = Tesla T4. Max memory = 14.748 GB.
2.635 GB of memory reserved.


In [None]:
trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 100,000 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 60
 "-____-"     Number of trainable parameters = 24,313,856


Step,Training Loss
1,0.8262
2,0.8117
3,1.1322
4,0.9273
5,0.7752
6,0.9679
7,0.6306
8,1.0274
9,0.7884
10,0.7533


In [None]:
# @title Show final memory and time stats
used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
used_memory_for_lora = round(used_memory - start_gpu_memory, 3)
used_percentage = round(used_memory / max_memory * 100, 3)
lora_percentage = round(used_memory_for_lora / max_memory * 100, 3)
print(f"{trainer_stats.metrics['train_runtime']} seconds used for training.")
print(
    f"{round(trainer_stats.metrics['train_runtime']/60, 2)} minutes used for training."
)
print(f"Peak reserved memory = {used_memory} GB.")
print(f"Peak reserved memory for training = {used_memory_for_lora} GB.")
print(f"Peak reserved memory % of max memory = {used_percentage} %.")
print(f"Peak reserved memory for training % of max memory = {lora_percentage} %.")

446.5262 seconds used for training.
7.44 minutes used for training.
Peak reserved memory = 6.531 GB.
Peak reserved memory for training = 3.896 GB.
Peak reserved memory % of max memory = 44.284 %.
Peak reserved memory for training % of max memory = 26.417 %.


<a name="Inference"></a>
### Inference
Let's run the model! You can change the instruction and input - leave the output blank!

**[NEW] Try 2x faster inference in a free Colab for Llama-3.1 8b Instruct [here](https://colab.research.google.com/drive/1T-YBVfnphoVc8E2E854qF3jdia2Ll2W2?usp=sharing)**

We use `min_p = 0.1` and `temperature = 1.5`. Read this [Tweet](https://x.com/menhguin/status/1826132708508213629) for more information on why.

In [None]:
from unsloth.chat_templates import get_chat_template

tokenizer = get_chat_template(
    tokenizer,
    chat_template = "llama-3.1",
)
FastLanguageModel.for_inference(model) # Enable native 2x faster inference

messages = [
    {"role": "user", "content": "Continue the fibonnaci sequence: 1, 1, 2, 3, 5, 8,"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize = True,
    add_generation_prompt = True, # Must add for generation
    return_tensors = "pt",
).to("cuda")

outputs = model.generate(input_ids = inputs, max_new_tokens = 64, use_cache = True,
                         temperature = 1.5, min_p = 0.1)
tokenizer.batch_decode(outputs)

The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.


['<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nCutting Knowledge Date: December 2023\nToday Date: 26 July 2024\n\n<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nContinue the fibonnaci sequence: 1, 1, 2, 3, 5, 8,<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nThe Fibonacci sequence is a series of numbers in which each number is the sum of the two preceding numbers. The sequence is: 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144.<|eot_id|>']

 You can also use a `TextStreamer` for continuous inference - so you can see the generation token by token, instead of waiting the whole time!

In [None]:
FastLanguageModel.for_inference(model) # Enable native 2x faster inference

messages = [
    {"role": "user", "content": "Continue the fibonnaci sequence: 1, 1, 2, 3, 5, 8,"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize = True,
    add_generation_prompt = True, # Must add for generation
    return_tensors = "pt",
).to("cuda")

from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer, skip_prompt = True)
_ = model.generate(input_ids = inputs, streamer = text_streamer, max_new_tokens = 128,
                   use_cache = True, temperature = 1.5, min_p = 0.1)

The Fibonacci sequence is a series of numbers where each number is the sum of the two preceding numbers. 

The sequence you provided was: 1, 1, 2, 3, 5, 8, 13

The next number in the sequence would be 21, which is 8 + 13. The sequence continues as: 21, 34, 55, 89, 144, 233.<|eot_id|>


<a name="Save"></a>
### Saving, loading finetuned models
To save the final model as LoRA adapters, either use Huggingface's `push_to_hub` for an online save or `save_pretrained` for a local save.

**[NOTE]** This ONLY saves the LoRA adapters, and not the full model. To save to 16bit or GGUF, scroll down!

In [None]:
model.save_pretrained("lora_model")  # Local saving
tokenizer.save_pretrained("lora_model")
# model.push_to_hub("your_name/lora_model", token = "...") # Online saving
# tokenizer.push_to_hub("your_name/lora_model", token = "...") # Online saving

('lora_model/tokenizer_config.json',
 'lora_model/special_tokens_map.json',
 'lora_model/tokenizer.json')

Now if you want to load the LoRA adapters we just saved for inference, set `False` to `True`:

In [None]:
if False:
    from unsloth import FastLanguageModel
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name = "lora_model", # YOUR MODEL YOU USED FOR TRAINING
        max_seq_length = max_seq_length,
        dtype = dtype,
        load_in_4bit = load_in_4bit,
    )
    FastLanguageModel.for_inference(model) # Enable native 2x faster inference

messages = [
    {"role": "user", "content": "Describe a tall tower in the capital of France."},
]
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize = True,
    add_generation_prompt = True, # Must add for generation
    return_tensors = "pt",
).to("cuda")

from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer, skip_prompt = True)
_ = model.generate(input_ids = inputs, streamer = text_streamer, max_new_tokens = 128,
                   use_cache = True, temperature = 1.5, min_p = 0.1)

The Eiffel Tower, located in the heart of Paris, stands tall among the city's historic and cultural landmarks. This iron structure, standing at an impressive 324 meters high, offers breathtaking views of the City of Light's iconic landscape. The Eiffel Tower was built for the 1889 World's Fair and has since become a symbol of French engineering and culture.<|eot_id|>


You can also use Hugging Face's `AutoModelForPeftCausalLM`. Only use this if you do not have `unsloth` installed. It can be hopelessly slow, since `4bit` model downloading is not supported, and Unsloth's **inference is 2x faster**.

In [None]:
if False:
    # I highly do NOT suggest - use Unsloth if possible
    from peft import AutoPeftModelForCausalLM
    from transformers import AutoTokenizer
    model = AutoPeftModelForCausalLM.from_pretrained(
        "lora_model", # YOUR MODEL YOU USED FOR TRAINING
        load_in_4bit = load_in_4bit,
    )
    tokenizer = AutoTokenizer.from_pretrained("lora_model")

### Saving to float16 for VLLM

We also support saving to `float16` directly. Select `merged_16bit` for float16 or `merged_4bit` for int4. We also allow `lora` adapters as a fallback. Use `push_to_hub_merged` to upload to your Hugging Face account! You can go to https://huggingface.co/settings/tokens for your personal tokens.

In [None]:
# Merge to 16bit
if False: model.save_pretrained_merged("model", tokenizer, save_method = "merged_16bit",)
if False: model.push_to_hub_merged("hf/model", tokenizer, save_method = "merged_16bit", token = "")

# Merge to 4bit
if False: model.save_pretrained_merged("model", tokenizer, save_method = "merged_4bit",)
if False: model.push_to_hub_merged("hf/model", tokenizer, save_method = "merged_4bit", token = "")

# Just LoRA adapters
if False: model.save_pretrained_merged("model", tokenizer, save_method = "lora",)
if False: model.push_to_hub_merged("hf/model", tokenizer, save_method = "lora", token = "")

### GGUF / llama.cpp Conversion
To save to `GGUF` / `llama.cpp`, we support it natively now! We clone `llama.cpp` and we default save it to `q8_0`. We allow all methods like `q4_k_m`. Use `save_pretrained_gguf` for local saving and `push_to_hub_gguf` for uploading to HF.

Some supported quant methods (full list on our [Wiki page](https://github.com/unslothai/unsloth/wiki#gguf-quantization-options)):
* `q8_0` - Fast conversion. High resource use, but generally acceptable.
* `q4_k_m` - Recommended. Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q4_K.
* `q5_k_m` - Recommended. Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q5_K.

[**NEW**] To finetune and auto export to Ollama, try our [Ollama notebook](https://colab.research.google.com/drive/1WZDi7APtQ9VsvOrQSSC5DDtxq159j8iZ?usp=sharing)

In [None]:
# Save to 8bit Q8_0
if False: model.save_pretrained_gguf("model", tokenizer,)
# Remember to go to https://huggingface.co/settings/tokens for a token!
# And change hf to your username!
if False: model.push_to_hub_gguf("hf/model", tokenizer, token = "")

# Save to 16bit GGUF
if False: model.save_pretrained_gguf("model", tokenizer, quantization_method = "f16")
if False: model.push_to_hub_gguf("hf/model", tokenizer, quantization_method = "f16", token = "")

# Save to q4_k_m GGUF
if False: model.save_pretrained_gguf("model", tokenizer, quantization_method = "q4_k_m")
if False: model.push_to_hub_gguf("hf/model", tokenizer, quantization_method = "q4_k_m", token = "")

# Save to multiple GGUF options - much faster if you want multiple!
if False:
    model.push_to_hub_gguf(
        "hf/model", # Change hf to your username!
        tokenizer,
        quantization_method = ["q4_k_m", "q8_0", "q5_k_m",],
        token = "", # Get a token at https://huggingface.co/settings/tokens
    )

Now, use the `model-unsloth.gguf` file or `model-unsloth-Q4_K_M.gguf` file in llama.cpp or a UI based system like Jan or Open WebUI. You can install Jan [here](https://github.com/janhq/jan) and Open WebUI [here](https://github.com/open-webui/open-webui)

And we're done! If you have any questions on Unsloth, we have a [Discord](https://discord.gg/unsloth) channel! If you find any bugs or want to keep updated with the latest LLM stuff, or need help, join projects etc, feel free to join our Discord!

Some other links:
1. Llama 3.2 Conversational notebook. [Free Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(1B_and_3B)-Conversational.ipynb)
2. Saving finetunes to Ollama. [Free notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3_(8B)-Ollama.ipynb)
3. Llama 3.2 Vision finetuning - Radiography use case. [Free Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(11B)-Vision.ipynb)
6. See notebooks for DPO, ORPO, Continued pretraining, conversational finetuning and more on our [documentation](https://docs.unsloth.ai/get-started/unsloth-notebooks)!

<div class="align-center">
  <a href="https://unsloth.ai"><img src="https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png" width="115"></a>
  <a href="https://discord.gg/unsloth"><img src="https://github.com/unslothai/unsloth/raw/main/images/Discord.png" width="145"></a>
  <a href="https://docs.unsloth.ai/"><img src="https://github.com/unslothai/unsloth/blob/main/images/documentation%20green%20button.png?raw=true" width="125"></a>

  Join Discord if you need help + ⭐️ <i>Star us on <a href="https://github.com/unslothai/unsloth">Github</a> </i> ⭐️
</div>


In [10]:
import random
import json
import itertools
import os
import torch
from typing import List, Dict, Any, Tuple  # Make sure typing imports are at the top
#############################################
# PART 1: Military Scenario Generator
#############################################

class MilitaryScenarioGenerator:
    def __init__(self):
        # Diverse geographical regions
        self.regions = {
            "East Asia": ["China", "Japan", "North Korea", "South Korea", "Taiwan"],
            "South Asia": ["India", "Pakistan", "Bangladesh", "Sri Lanka", "Nepal"],
            "Southeast Asia": ["Vietnam", "Philippines", "Indonesia", "Malaysia", "Thailand"],
            "Middle East": ["Iran", "Iraq", "Syria", "Saudi Arabia", "Israel", "Yemen"],
            "Africa": ["Nigeria", "Ethiopia", "Egypt", "South Africa", "Kenya", "Somalia"],
            "Europe": ["Ukraine", "Russia", "Poland", "Germany", "France", "UK"],
            "North America": ["USA", "Canada", "Mexico"],
            "South America": ["Brazil", "Colombia", "Venezuela", "Chile", "Argentina"],
            "Oceania": ["Australia", "New Zealand", "Papua New Guinea"]
        }

        # Diverse actor types with gender, social class, and nationality variations
        self.actors = {
            "military": [
                {"role": "General", "gender": "male", "class": "upper", "ethics": "strict"},
                {"role": "General", "gender": "female", "class": "middle", "ethics": "pragmatic"},
                {"role": "Colonel", "gender": "male", "class": "middle", "ethics": "pragmatic"},
                {"role": "Colonel", "gender": "female", "class": "upper", "ethics": "flexible"},
                {"role": "Captain", "gender": "male", "class": "working", "ethics": "strict"},
                {"role": "Captain", "gender": "female", "class": "middle", "ethics": "pragmatic"},
                {"role": "Lieutenant", "gender": "non-binary", "class": "middle", "ethics": "flexible"}
            ],
            "political": [
                {"role": "President", "gender": "male", "class": "upper", "ethics": "pragmatic"},
                {"role": "President", "gender": "female", "class": "upper", "ethics": "strict"},
                {"role": "Prime Minister", "gender": "male", "class": "upper", "ethics": "flexible"},
                {"role": "Prime Minister", "gender": "female", "class": "middle", "ethics": "strict"},
                {"role": "Defense Minister", "gender": "male", "class": "middle", "ethics": "pragmatic"},
                {"role": "Defense Minister", "gender": "female", "class": "working", "ethics": "strict"},
                {"role": "Foreign Secretary", "gender": "non-binary", "class": "upper", "ethics": "pragmatic"}
            ],
            "intelligence": [
                {"role": "Intelligence Director", "gender": "male", "class": "upper", "ethics": "flexible"},
                {"role": "Intelligence Director", "gender": "female", "class": "middle", "ethics": "strict"},
                {"role": "Field Agent", "gender": "male", "class": "working", "ethics": "pragmatic"},
                {"role": "Field Agent", "gender": "female", "class": "middle", "ethics": "flexible"},
                {"role": "Analyst", "gender": "male", "class": "middle", "ethics": "strict"},
                {"role": "Analyst", "gender": "female", "class": "working", "ethics": "pragmatic"},
                {"role": "Cyber Specialist", "gender": "non-binary", "class": "middle", "ethics": "flexible"}
            ]
        }

        # Diverse tactical situations
        self.tactical_situations = [
            "border incursion", "airspace violation", "naval standoff", "terrorist threat",
            "hostage situation", "cyber attack", "disinformation campaign", "covert operation",
            "humanitarian crisis", "civil unrest", "coup attempt", "missile test",
            "troop buildup", "nuclear threat", "diplomatic crisis", "resource dispute"
        ]

        # Information quality variations
        self.information_quality = [
            {"level": "high", "descriptors": ["verified", "confirmed", "high-confidence", "clear"]},
            {"level": "medium", "descriptors": ["partially verified", "credible", "moderate confidence", "somewhat clear"]},
            {"level": "low", "descriptors": ["unverified", "uncertain", "low confidence", "ambiguous"]}
        ]

        # Time pressure variations
        self.time_pressure = [
            {"level": "high", "descriptors": ["immediate", "urgent", "critical", "minutes to decide"]},
            {"level": "medium", "descriptors": ["pressing", "important", "hours to decide", "day-critical"]},
            {"level": "low", "descriptors": ["routine", "monitoring", "days to decide", "developing"]}
        ]

        # Diverse adversaries
        self.adversaries = [
            "state military forces", "non-state armed groups", "terrorist organizations",
            "insurgents", "criminal cartels", "proxy forces", "mercenaries",
            "rogue military units", "cyber threat actors", "unknown actors"
        ]

        # Response approaches with varying ethical frameworks
        self.response_approaches = {
            "measured": [
                "proportional military response", "diplomatic engagement", "limited defensive action",
                "strategic deterrence", "intelligence gathering", "coalition building"
            ],
            "aggressive": [
                "preemptive strike", "full military deployment", "targeted elimination",
                "economic sanctions", "covert action", "show of force"
            ],
            "cautious": [
                "increased alert status", "defensive positioning", "contingency planning",
                "international mediation", "strategic withdrawal", "continued monitoring"
            ]
        }

    def generate_scenario(self, info_quality="random", time_pressure="random", response_type="random") -> Dict[str, Any]:
        """Generate a random military scenario with controlled variables"""
        # Select random region and countries
        region_name = random.choice(list(self.regions.keys()))
        countries = self.regions[region_name]
        country_a = random.choice(countries)
        country_b = random.choice([c for c in countries if c != country_a])

        # Select random actors
        advisor_type = random.choice(list(self.actors.keys()))
        advisor = random.choice(self.actors[advisor_type])

        # Select random tactical situation
        situation = random.choice(self.tactical_situations)

        # Select information quality if not specified
        if info_quality == "random":
            info_quality = random.choice(["high", "medium", "low"])
        info_desc = next(i for i in self.information_quality if i["level"] == info_quality)
        info_descriptor = random.choice(info_desc["descriptors"])

        # Select time pressure if not specified
        if time_pressure == "random":
            time_pressure = random.choice(["high", "medium", "low"])
        time_desc = next(t for t in self.time_pressure if t["level"] == time_pressure)
        time_descriptor = random.choice(time_desc["descriptors"])

        # Select adversary
        adversary = random.choice(self.adversaries)

        # Construct scenario description
        scenario = f"You are advising {advisor['role']} {self.generate_name(advisor['gender'])} "
        scenario += f"of {country_a} regarding a {situation} involving {adversary} from {country_b}. "
        scenario += f"You have {info_descriptor} intelligence that suggests {self.generate_specific_details(situation)}. "
        scenario += f"The situation is {time_descriptor}. "
        scenario += f"What would you advise as the most appropriate response?"

        # Select response approach if not specified
        if response_type == "random":
            if advisor["ethics"] == "strict":
                response_type = random.choice(["measured", "cautious"])
            elif advisor["ethics"] == "flexible":
                response_type = random.choice(["aggressive", "measured"])
            else:  # pragmatic
                response_type = random.choice(["measured", "aggressive", "cautious"])

        response_options = self.response_approaches[response_type]
        primary_response = random.choice(response_options)

        # Generate expert response
        expert_response = self.generate_expert_response(
            advisor=advisor,
            situation=situation,
            info_quality=info_quality,
            time_pressure=time_pressure,
            response_type=response_type,
            primary_response=primary_response,
            country_a=country_a,
            country_b=country_b,
            adversary=adversary
        )

        return {
            "scenario": scenario,
            "expert_response": expert_response,
            "metadata": {
                "region": region_name,
                "country_a": country_a,
                "country_b": country_b,
                "advisor_role": advisor["role"],
                "advisor_gender": advisor["gender"],
                "advisor_class": advisor["class"],
                "advisor_ethics": advisor["ethics"],
                "situation": situation,
                "information_quality": info_quality,
                "time_pressure": time_pressure,
                "adversary": adversary,
                "response_type": response_type
            }
        }

    def generate_name(self, gender: str) -> str:
        """Generate a culturally diverse name based on gender"""
        male_names = [
            "Ahmed Al-Farsi", "Chen Wei", "Rajiv Patel", "Seung-ho Kim", "Ibrahim Okafor",
            "Alejandro Rodriguez", "Dmitri Petrov", "John Smith", "Mohammed Hassan",
            "Tariq Mbeki", "Hiroshi Tanaka", "Carlos Vega", "David Okonkwo", "Viktor Ivanov"
        ]

        female_names = [
            "Fatima Al-Zahra", "Li Wei", "Priya Sharma", "Ji-Young Park", "Amara Okafor",
            "Maria Gonzalez", "Olga Petrova", "Sarah Johnson", "Aisha Hassan",
            "Zainab Mandela", "Yuki Suzuki", "Gabriela Lopez", "Ngozi Adichie", "Anastasia Kuznetsova"
        ]

        nonbinary_names = [
            "Alex Kim", "Jordan Patel", "Taylor Rodriguez", "Sam Okonkwo", "Quinn Hassan",
            "Ari Tanaka", "Casey Mbeki", "Jamie Ivanov", "Riley Chen", "Morgan Lopez"
        ]

        if gender == "male":
            return random.choice(male_names)
        elif gender == "female":
            return random.choice(female_names)
        else:  # non-binary
            return random.choice(nonbinary_names)

    def generate_specific_details(self, situation: str) -> str:
        """Generate specific details for a tactical situation"""
        details = {
            "border incursion": [
                "unauthorized troop movements within 5km of the border",
                "military vehicles crossing the demilitarized zone",
                "special forces units conducting reconnaissance in border villages",
                "border outposts reporting small arms fire from across the boundary"
            ],
            "airspace violation": [
                "unidentified aircraft bypassing standard identification procedures",
                "military jets flying without transponders near sensitive installations",
                "reconnaissance drones operating in restricted airspace",
                "strategic bombers approaching territorial limits during military exercises"
            ],
            "naval standoff": [
                "warships conducting aggressive maneuvers near territorial waters",
                "submarine activity detected near critical maritime infrastructure",
                "naval vessels blocking access to international shipping lanes",
                "coast guard intercepts of vessels suspected of military intelligence gathering"
            ],
            "terrorist threat": [
                "increased chatter about potential attacks on civilian targets",
                "known operatives moving funds through financial systems",
                "surveillance footage showing suspicious activity near government buildings",
                "intercepted communications suggesting coordinated attack planning"
            ],
            "hostage situation": [
                "diplomatic personnel being held in a consulate building",
                "aid workers captured in a conflict zone",
                "civilians detained as bargaining leverage",
                "military personnel captured during routine operations"
            ],
            "cyber attack": [
                "attempts to breach military command and control systems",
                "disruption of critical infrastructure networks",
                "data exfiltration from sensitive government databases",
                "coordinated disinformation campaigns coupled with network intrusions"
            ],
            "disinformation campaign": [
                "fabricated news reports designed to incite ethnic tensions",
                "manipulated videos suggesting military aggression",
                "coordinated social media campaigns targeting election integrity",
                "false claims of human rights violations attributed to your forces"
            ],
            "covert operation": [
                "suspected intelligence officers operating under diplomatic cover",
                "unusual procurement patterns suggesting weapons development",
                "surveillance of critical military installations",
                "recruitment attempts targeting government officials"
            ],
            "humanitarian crisis": [
                "refugee movements toward your borders due to conflict",
                "blockage of aid deliveries to civilian populations",
                "weaponization of critical resources like water and medicine",
                "displacement of civilians as part of military strategy"
            ],
            "civil unrest": [
                "organized protests with possible external funding",
                "violence targeting government institutions",
                "emergence of well-equipped militant factions",
                "strategic disruption of transportation and communication"
            ],
            "coup attempt": [
                "unusual troop movements near government centers",
                "communications between military units outside command structure",
                "detention of key political figures",
                "seizure of broadcast facilities by military elements"
            ],
            "missile test": [
                "preparations for launch near sensitive areas",
                "testing that violates existing agreements",
                "missile trajectories passing near your territory",
                "technological advancements suggesting enhanced capabilities"
            ],
            "troop buildup": [
                "mobilization of reserve forces near the border",
                "deployment of offensive weapon systems to forward positions",
                "establishment of new military infrastructure in contested areas",
                "unusual logistics movements suggesting preparation for operations"
            ],
            "nuclear threat": [
                "unusual activity at known nuclear facilities",
                "movement of delivery systems consistent with alert status changes",
                "rhetoric from leadership suggesting willingness to use nuclear options",
                "detection of radioactive signatures inconsistent with declared activities"
            ],
            "diplomatic crisis": [
                "expulsion of your diplomatic personnel",
                "sudden cancellation of strategic dialogues",
                "public denouncement by leadership",
                "withdrawal from key international agreements"
            ],
            "resource dispute": [
                "military protection of illegal resource extraction",
                "blockade of shared resource zones",
                "seizure of infrastructure in disputed territories",
                "diversion of critical water supplies affecting your population"
            ]
        }

        return random.choice(details.get(situation, ["suspicious activity"]))

    def generate_expert_response(self, advisor, situation, info_quality, time_pressure,
                               response_type, primary_response, country_a, country_b, adversary) -> str:
        """Generate an expert response based on scenario parameters"""

        # Opening based on advisor ethics and role
        if advisor["ethics"] == "strict":
            opening = f"Based on my assessment, I recommend a {response_type} approach that adheres to international law and our defense doctrine."
        elif advisor["ethics"] == "flexible":
            opening = f"Considering the strategic implications, I recommend a {response_type} approach that protects our national interests above all."
        else:  # pragmatic
            opening = f"After evaluating the situation, I recommend a {response_type} approach that balances our security needs with diplomatic considerations."

        # Core recommendation based on primary response
        core = f" Specifically, I advise {primary_response}."

        # Information assessment based on quality
        if info_quality == "high":
            info_assessment = f" Our intelligence is reliable, giving us confidence that {adversary} from {country_b} are indeed engaged in this {situation}."
        elif info_quality == "medium":
            info_assessment = f" While our intelligence provides a credible picture of {adversary} involvement in this {situation}, we should continue verification efforts."
        else:  # low
            info_assessment = f" Given the uncertain nature of our intelligence, we should proceed with caution while seeking to confirm the involvement of {adversary} in this {situation}."

        # Time consideration
        if time_pressure == "high":
            time_consideration = f" We must act immediately to prevent escalation and protect our interests."
        elif time_pressure == "medium":
            time_consideration = f" We have limited time to respond effectively, so preparations should begin now while we finalize our approach."
        else:  # low
            time_consideration = f" We have time to develop a comprehensive response while monitoring developments."

        # Closing recommendation based on advisor type
        if advisor["role"] in ["General", "Colonel", "Captain", "Lieutenant"]:
            closing = f" I recommend deploying our forces at readiness level Alpha and preparing contingency plans for further escalation if {country_b} continues this course of action."
        elif advisor["role"] in ["President", "Prime Minister", "Defense Minister", "Foreign Secretary"]:
            closing = f" I recommend immediately convening the national security council and informing our key allies about the situation while preparing our formal diplomatic and military response."
        else:  # intelligence roles
            closing = f" I recommend intensifying our intelligence collection efforts focused on {country_b}'s decision-making centers while preparing to counter any information warfare aspects of this situation."

        return opening + core + info_assessment + time_consideration + closing

    def generate_dataset(self, num_scenarios=100, output_file="military_scenarios.json") -> List[Dict[str, Any]]:
        """Generate a dataset of scenarios with controlled distribution of parameters"""
        scenarios = []

        # Define distribution of parameters to ensure diversity
        info_qualities = ["high", "medium", "low"]
        time_pressures = ["high", "medium", "low"]
        response_types = ["measured", "aggressive", "cautious"]

        # Generate combinations to ensure coverage
        combinations = list(itertools.product(info_qualities, time_pressures, response_types))

        # Calculate how many scenarios to generate per combination
        scenarios_per_combo = max(1, num_scenarios // len(combinations))

        # Generate scenarios for each combination
        for combo in combinations:
            info_quality, time_pressure, response_type = combo
            for _ in range(scenarios_per_combo):
                scenario = self.generate_scenario(
                    info_quality=info_quality,
                    time_pressure=time_pressure,
                    response_type=response_type
                )
                scenarios.append(scenario)

        # Add remaining random scenarios if needed
        remaining = num_scenarios - (scenarios_per_combo * len(combinations))
        for _ in range(remaining):
            scenarios.append(self.generate_scenario())

        # Shuffle scenarios for randomness
        random.shuffle(scenarios)

        # Write to JSON file
        with open(output_file, 'w', encoding='utf-8') as f:
            json.dump(scenarios, f, indent=2, ensure_ascii=False)

        return scenarios

    def format_for_llm_finetuning(self, scenarios, output_file="llm_finetuning_data.json") -> List[Dict[str, Any]]:
        """Format scenarios for LLM fine-tuning with Unsloth format"""
        formatted_data = []

        for scenario in scenarios:
            formatted_data.append({
                "messages": [
                    {"role": "system", "content": "You are a military advisory AI trained to provide expert, measured advice on complex security situations."},
                    {"role": "user", "content": scenario["scenario"]},
                    {"role": "assistant", "content": scenario["expert_response"]}
                ]
            })

        # Write to JSON file
        with open(output_file, 'w', encoding='utf-8') as f:
            json.dump(formatted_data, f, indent=2, ensure_ascii=False)

        return formatted_data

#############################################
# PART 2: LLM Fine-tuning with Unsloth
#############################################

def prepare_dataset(data_path, validation_split=0.1):
    """
    Prepare the dataset for fine-tuning
    """
    # Load the JSON data
    with open(data_path, 'r', encoding='utf-8') as f:
        data = json.load(f)

    # Convert to format expected by datasets library
    formatted_data = []
    for item in data:
        # Extract messages
        messages = item["messages"]

        # Get system message if it exists
        system_msg = ""
        for msg in messages:
            if msg["role"] == "system":
                system_msg = msg["content"]
                break

        # Build prompt and completion
        prompt = ""
        completion = ""

        for msg in messages:
            if msg["role"] == "user":
                prompt = msg["content"]
            elif msg["role"] == "assistant":
                completion = msg["content"]

        # Add system message to prompt if it exists
        if system_msg:
            prompt = f"<s>[INST] {system_msg}\n\n{prompt} [/INST]"
        else:
            prompt = f"<s>[INST] {prompt} [/INST]"

        formatted_data.append({
            "prompt": prompt,
            "completion": completion
        })

    # Write formatted data to a temporary file
    temp_file = "temp_formatted_data.json"
    with open(temp_file, 'w', encoding='utf-8') as f:
        json.dump(formatted_data, f, indent=2)

    # Load the formatted data as a dataset
    dataset = load_dataset("json", data_files=temp_file)

    # Split the dataset
    splits = dataset["train"].train_test_split(test_size=validation_split, seed=42)
    train_dataset = splits["train"]
    eval_dataset = splits["test"]

    # Remove the temporary file
    os.remove(temp_file)

    return train_dataset, eval_dataset

def finetune_model(
    base_model="meta-llama/Llama-3-8B-hf",
    data_path="llm_finetuning_data.json",
    output_dir="./military_advisor_model",
    hf_token=None
):
    """
    Fine-tune a model with Unsloth using military scenarios
    """
    # Login to Hugging Face if token is provided
    if hf_token:
        login(token=hf_token)

    # Load model
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name=base_model,
        max_seq_length=4096,
        dtype=torch.bfloat16,
        load_in_4bit=True,
    )

    # Prepare model for LoRA training
    model = FastLanguageModel.get_peft_model(
        model,
        r=16,  # LoRA rank
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                        "gate_proj", "up_proj", "down_proj"],
        lora_alpha=16,
        lora_dropout=0.05,
        bias="none",
    )

    # Prepare dataset
    train_dataset, eval_dataset = prepare_dataset(data_path)

    # Set up training arguments
    training_args = TrainingArguments(
        output_dir=output_dir,
        num_train_epochs=3,
        per_device_train_batch_size=4,
        per_device_eval_batch_size=4,
        gradient_accumulation_steps=2,
        evaluation_strategy="steps",
        eval_steps=0.2,
        save_strategy="steps",
        save_steps=0.2,
        save_total_limit=3,
        load_best_model_at_end=True,
        logging_steps=10,
        learning_rate=2e-4,
        warmup_ratio=0.03,
        weight_decay=0.01,
        bf16=True,
        tf32=True,
        max_grad_norm=0.3,
        lr_scheduler_type="cosine",
        seed=42,
    )

    # Convert datasets to instruction format
    def formatting_prompts_func(examples):
        prompts = examples["prompt"]
        completions = examples["completion"]
        texts = []
        for prompt, completion in zip(prompts, completions):
            # The prompt already has the proper format with <s>[INST]...[/INST]
            # Just need to add the completion
            text = f"{prompt} {completion}</s>"
            texts.append(text)
        return {"text": texts}

    # Start training
    trainer = FastLanguageModel.get_trainer(
        model=model,
        tokenizer=tokenizer,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
        formatting_func=formatting_prompts_func,
        args=training_args,
        packing=True,  # Efficient packing of sequences
    )

    # Train the model
    trainer.train()

    # Save the model
    model.save_pretrained(f"{output_dir}/final")
    tokenizer.save_pretrained(f"{output_dir}/final")

    print(f"Model saved to {output_dir}/final")

    # Let's also create a merged model for easier deployment
    FastLanguageModel.save_pretrained_merged(
        model=model,
        tokenizer=tokenizer,
        save_directory=f"{output_dir}/merged"
    )

    print(f"Merged model saved to {output_dir}/merged")

    return model, tokenizer

#############################################
# PART 3: Main Function
#############################################

def main():
    """Main function to run the entire pipeline"""
    print("=" * 50)
    print("MILITARY SCENARIO GENERATOR AND FINE-TUNING")
    print("=" * 50)

    # Step 1: Generate sample scenarios
    print("\nStep 1: Generating military scenarios...")
    generator = MilitaryScenarioGenerator()

    # Generate a small example first
    sample_scenario = generator.generate_scenario()
    print("\n--- SAMPLE SCENARIO ---")
    print(f"Scenario: {sample_scenario['scenario']}")
    print(f"\nExpert Response: {sample_scenario['expert_response']}")

    # Ask user for number of scenarios to generate
    try:
        num_scenarios = int(input("\nHow many scenarios would you like to generate? (default: 100): ") or "100")
    except ValueError:
        num_scenarios = 100
        print("Invalid input. Using default value of 100 scenarios.")

    # Generate full dataset
    print(f"\nGenerating {num_scenarios} diverse scenarios...")
    scenarios = generator.generate_dataset(
        num_scenarios=num_scenarios,
        output_file="military_scenarios.json"
    )

    print(f"Successfully generated {len(scenarios)} scenarios and saved to military_scenarios.json")

    # Step 2: Format data for fine-tuning
    print("\nStep 2: Formatting data for fine-tuning...")
    formatted_data = generator.format_for_llm_finetuning(
        scenarios,
        output_file="llm_finetuning_data.json"
    )

    print(f"Successfully formatted {len(formatted_data)} examples and saved to llm_finetuning_data.json")

    # Step 3: Fine-tune model (optional)
    print("\nStep 3: Fine-tune model")
    should_finetune = input("Would you like to fine-tune a model now? (y/n, default: n): ").lower() == 'y'

    if should_finetune:
        # Get Hugging Face token
        hf_token = os.environ.get("HF_TOKEN") or input("Enter your Hugging Face token (or set HF_TOKEN env variable): ")

        # Get base model
        base_model = input("Enter base model name (default: meta-llama/Llama-3-8B-hf): ") or "meta-llama/Llama-3-8B-hf"

        # Start fine-tuning
        print(f"\nStarting fine-tuning of {base_model}...")
        model, tokenizer = finetune_model(
            base_model=base_model,
            data_path="llm_finetuning_data.json",
            output_dir="./military_advisor_model",
            hf_token=hf_token
        )

        print("\nFine-tuning completed successfully!")
    else:
        print("\nSkipping fine-tuning. You can run the fine-tuning process later with:")
        print("  from combined_script import finetune_model")
        print("  finetune_model(data_path='llm_finetuning_data.json', hf_token='your_token')")

    print("\nDone! The generated scenarios can now be used for:")
    print("1. Fine-tuning an LLM to create a military advisor")
    print("2. Testing how different LLMs respond to varying levels of uncertainty")
    print("3. Analyzing patterns in responses across different scenario types")

if __name__ == "__main__":
    main()

MILITARY SCENARIO GENERATOR AND FINE-TUNING

Step 1: Generating military scenarios...

--- SAMPLE SCENARIO ---
Scenario: You are advising President Olga Petrova of South Korea regarding a troop buildup involving cyber threat actors from Japan. You have moderate confidence intelligence that suggests establishment of new military infrastructure in contested areas. The situation is minutes to decide. What would you advise as the most appropriate response?

Expert Response: Based on my assessment, I recommend a cautious approach that adheres to international law and our defense doctrine. Specifically, I advise defensive positioning. While our intelligence provides a credible picture of cyber threat actors involvement in this troop buildup, we should continue verification efforts. We must act immediately to prevent escalation and protect our interests. I recommend immediately convening the national security council and informing our key allies about the situation while preparing our formal 

NameError: name 'login' is not defined

In [13]:
# First, let's install all the required packages to be safe
import subprocess
import sys

# Check if we're in a Colab environment and install packages
try:
    import google.colab
    IN_COLAB = True
except ImportError:
    IN_COLAB = False

if IN_COLAB:
    print("Installing required packages in Colab environment...")
    # Install required packages
    subprocess.check_call([sys.executable, "-m", "pip", "install", "unsloth", "transformers", "datasets", "huggingface_hub", "--quiet"])

# Now import necessary libraries
import random
import json
import itertools
import os
import torch
from typing import List, Dict, Any, Tuple

# Try importing the Hugging Face login function specifically
try:
    from huggingface_hub import login
except ImportError:
    # If this fails, try a direct approach
    subprocess.check_call([sys.executable, "-m", "pip", "install", "huggingface_hub", "--upgrade", "--quiet"])
    from huggingface_hub import login

# Import the rest
try:
    from unsloth import FastLanguageModel
    from datasets import load_dataset
    from transformers import TrainingArguments
except ImportError:
    print("Installing required packages...")
    subprocess.check_call([sys.executable, "-m", "pip", "install", "unsloth", "transformers", "datasets", "--quiet"])
    from unsloth import FastLanguageModel
    from datasets import load_dataset
    from transformers import TrainingArguments

def finetune_model(
    base_model="meta-llama/Llama-3-8B-hf",
    data_path="llm_finetuning_data.json",
    output_dir="./military_advisor_model",
    hf_token=None
):
    """
    Fine-tune a model with Unsloth using military scenarios
    """
    # Login to Hugging Face if token is provided
    if hf_token:
        print(f"Logging in to Hugging Face with token: {hf_token[:4]}...{hf_token[-4:]}")
        try:
            login(token=hf_token)
            print("Successfully logged in to Hugging Face!")
        except Exception as e:
            print(f"Error logging in to Hugging Face: {e}")
            print("Continuing without login...")

    # Use correct model name format
    # Clean up the model name (remove spaces, ensure proper format)
    if "llama" in base_model.lower() and not base_model.startswith("meta-llama/"):
        # Extract version number if present
        if "3.2" in base_model:
            base_model = "meta-llama/Llama-3.2-8B-hf"
        elif "3.1" in base_model:
            base_model = "meta-llama/Llama-3.1-8B-hf"
        elif "3" in base_model:
            base_model = "meta-llama/Llama-3-8B-hf"
        elif "2" in base_model:
            base_model = "meta-llama/Llama-2-7b-hf"
        else:
            base_model = "meta-llama/Llama-3-8B-hf"  # Default to Llama 3

    print(f"Using model: {base_model}")

    # Load model
    try:
        print("Loading model... (this may take a few minutes)")
        model, tokenizer = FastLanguageModel.from_pretrained(
            model_name=base_model,
            max_seq_length=4096,
            dtype=torch.bfloat16,
            load_in_4bit=True,
        )
        print("Model loaded successfully!")

        # Prepare model for LoRA training
        print("Setting up LoRA adapters...")
        model = FastLanguageModel.get_peft_model(
            model,
            r=16,  # LoRA rank
            target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                            "gate_proj", "up_proj", "down_proj"],
            lora_alpha=16,
            lora_dropout=0.05,
            bias="none",
        )
        print("LoRA adapters set up successfully!")

        # Prepare dataset
        print("Preparing dataset...")
        train_dataset, eval_dataset = prepare_dataset(data_path)
        print(f"Dataset prepared: {len(train_dataset)} training examples, {len(eval_dataset)} validation examples")

        # Set up training arguments
        training_args = TrainingArguments(
            output_dir=output_dir,
            num_train_epochs=3,
            per_device_train_batch_size=4,
            per_device_eval_batch_size=4,
            gradient_accumulation_steps=2,
            evaluation_strategy="steps",
            eval_steps=0.2,
            save_strategy="steps",
            save_steps=0.2,
            save_total_limit=3,
            load_best_model_at_end=True,
            logging_steps=10,
            learning_rate=2e-4,
            warmup_ratio=0.03,
            weight_decay=0.01,
            bf16=True,
            tf32=True,
            max_grad_norm=0.3,
            lr_scheduler_type="cosine",
            seed=42,
        )

        # Convert datasets to instruction format
        def formatting_prompts_func(examples):
            prompts = examples["prompt"]
            completions = examples["completion"]
            texts = []
            for prompt, completion in zip(prompts, completions):
                # The prompt already has the proper format with <s>[INST]...[/INST]
                # Just need to add the completion
                text = f"{prompt} {completion}</s>"
                texts.append(text)
            return {"text": texts}

        # Start training
        print("Starting training...")
        trainer = FastLanguageModel.get_trainer(
            model=model,
            tokenizer=tokenizer,
            train_dataset=train_dataset,
            eval_dataset=eval_dataset,
            formatting_func=formatting_prompts_func,
            args=training_args,
            packing=True,  # Efficient packing of sequences
        )

        # Train the model
        trainer.train()

        # Save the model
        print("Saving model...")
        model.save_pretrained(f"{output_dir}/final")
        tokenizer.save_pretrained(f"{output_dir}/final")
        print(f"Model saved to {output_dir}/final")

        # Let's also create a merged model for easier deployment
        print("Creating merged model...")
        FastLanguageModel.save_pretrained_merged(
            model=model,
            tokenizer=tokenizer,
            save_directory=f"{output_dir}/merged"
        )
        print(f"Merged model saved to {output_dir}/merged")

        return model, tokenizer

    except Exception as e:
        print(f"Error during model loading or training: {e}")
        print("Please check your Hugging Face token and model name, and try again.")
        return None, None

def prepare_dataset(data_path, validation_split=0.1):
    """
    Prepare the dataset for fine-tuning
    """
    # Load the JSON data
    with open(data_path, 'r', encoding='utf-8') as f:
        data = json.load(f)

    # Convert to format expected by datasets library
    formatted_data = []
    for item in data:
        # Extract messages
        messages = item["messages"]

        # Get system message if it exists
        system_msg = ""
        for msg in messages:
            if msg["role"] == "system":
                system_msg = msg["content"]
                break

        # Build prompt and completion
        prompt = ""
        completion = ""

        for msg in messages:
            if msg["role"] == "user":
                prompt = msg["content"]
            elif msg["role"] == "assistant":
                completion = msg["content"]

        # Add system message to prompt if it exists
        if system_msg:
            prompt = f"<s>[INST] {system_msg}\n\n{prompt} [/INST]"
        else:
            prompt = f"<s>[INST] {prompt} [/INST]"

        formatted_data.append({
            "prompt": prompt,
            "completion": completion
        })

    # Write formatted data to a temporary file
    temp_file = "temp_formatted_data.json"
    with open(temp_file, 'w', encoding='utf-8') as f:
        json.dump(formatted_data, f, indent=2)

    # Load the formatted data as a dataset
    dataset = load_dataset("json", data_files=temp_file)

    # Split the dataset
    splits = dataset["train"].train_test_split(test_size=validation_split, seed=42)
    train_dataset = splits["train"]
    eval_dataset = splits["test"]

    # Remove the temporary file
    os.remove(temp_file)

    return train_dataset, eval_dataset

# Simple function to verify the login works
def test_huggingface_login(token):
    """Test if Hugging Face login works with the provided token"""
    try:
        login(token=token)
        print("Login successful!")
        return True
    except Exception as e:
        print(f"Login failed: {e}")
        return False

# Use this for direct testing
if __name__ == "__main__":
    # Get token from environment or user input
    hf_token = os.environ.get("HF_TOKEN") or input("Enter your Hugging Face token: ")

    # Test login
    if test_huggingface_login(hf_token):
        # Choose model
        base_model = input("Enter base model name (default: meta-llama/Llama-3-8B-hf): ") or "meta-llama/Llama-3-8B-hf"
        data_path = input("Path to fine-tuning data (default: llm_finetuning_data.json): ") or "llm_finetuning_data.json"

        # Run fine-tuning
        finetune_model(
            base_model=base_model,
            data_path=data_path,
            output_dir="./military_advisor_model",
            hf_token=hf_token
        )

Installing required packages in Colab environment...
Enter your Hugging Face token: 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
Login successful!
Enter base model name (default: meta-llama/Llama-3-8B-hf): meta-llama/Llama-3.2-3B-Instruct
Path to fine-tuning data (default: llm_finetuning_data.json): 
Logging in to Hugging Face with token: hf_B...WDjO
Successfully logged in to Hugging Face!
Using model: meta-llama/Llama-3.2-3B-Instruct
Loading model... (this may take a few minutes)
==((====))==  Unsloth 2025.3.8: Fast Llama patching. Transformers: 4.48.3.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.5.1+cu124. CUDA: 7.5. CUDA Toolkit: 12.4. Triton: 3.1.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.29. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


Device does not support bfloat16. Will change to float16.
Unsloth: Dropout = 0 is supported for fast patching. You are using dropout = 0.05.
Unsloth will patch all other layers, except LoRA matrices, causing a performance hit.


Model loaded successfully!
Setting up LoRA adapters...


Unsloth 2025.3.8 patched 28 layers with 0 QKV layers, 0 O layers and 0 MLP layers.


LoRA adapters set up successfully!
Preparing dataset...


Generating train split: 0 examples [00:00, ? examples/s]

Dataset prepared: 270 training examples, 30 validation examples
Error during model loading or training: Your setup doesn't support bf16/gpu. You need torch>=1.10, using Ampere GPU with cuda>=11.0
Please check your Hugging Face token and model name, and try again.




In [12]:
from huggingface_hub import list_models
available_models = list_models(author="meta-llama")
for model in available_models:
    print(model.id)

meta-llama/Llama-3.3-70B-Instruct
meta-llama/Llama-3.2-3B-Instruct
meta-llama/Llama-3.1-8B-Instruct
meta-llama/Llama-3.1-8B
meta-llama/Llama-3.2-1B
meta-llama/Llama-3.2-1B-Instruct
meta-llama/Meta-Llama-3-8B-Instruct
meta-llama/Llama-2-7b-chat-hf
meta-llama/Llama-3.2-11B-Vision-Instruct
meta-llama/Meta-Llama-3-8B
meta-llama/Llama-2-7b
meta-llama/Llama-3.2-3B
meta-llama/Llama-3.2-11B-Vision
meta-llama/Llama-2-7b-hf
meta-llama/Llama-2-13b-chat-hf
meta-llama/Meta-Llama-3-70B
meta-llama/Llama-3.1-405B
meta-llama/Llama-3.1-405B-Instruct
meta-llama/Llama-3.2-90B-Vision
meta-llama/Llama-3.2-90B-Vision-Instruct
meta-llama/Llama-2-7b-chat
meta-llama/Llama-2-13b
meta-llama/Llama-2-13b-chat
meta-llama/Llama-2-13b-hf
meta-llama/Llama-2-70b-chat-hf
meta-llama/LlamaGuard-7b
meta-llama/CodeLlama-7b-hf
meta-llama/CodeLlama-7b-Python-hf
meta-llama/CodeLlama-7b-Instruct-hf
meta-llama/CodeLlama-13b-hf
meta-llama/CodeLlama-70b-hf
meta-llama/Meta-Llama-3-70B-Instruct
meta-llama/Llama-3.1-70B
meta-llama/Lla

In [14]:
import os
import torch
from huggingface_hub import login
from unsloth import FastLanguageModel
from datasets import load_dataset
from transformers import TrainingArguments

def finetune_model(
    base_model="meta-llama/Llama-3.2-3B-Instruct",
    data_path="llm_finetuning_data.json",
    output_dir="./military_advisor_model",
    hf_token=None
):
    """
    Fine-tune a model with Unsloth using military scenarios
    """
    # Login to Hugging Face if token is provided
    if hf_token:
        print(f"Logging in to Hugging Face with token: {hf_token[:4]}...{hf_token[-4:]}")
        try:
            login(token=hf_token)
            print("Successfully logged in to Hugging Face!")
        except Exception as e:
            print(f"Error logging in to Hugging Face: {e}")
            print("Continuing without login...")

    # Use correct model name
    print(f"Using model: {base_model}")

    # Check if GPU supports bf16
    has_bf16_support = torch.cuda.is_available() and torch.cuda.get_device_capability()[0] >= 8
    dtype = torch.bfloat16 if has_bf16_support else torch.float16
    print(f"Using precision: {dtype} (GPU supports bf16: {has_bf16_support})")

    # Load model
    try:
        print("Loading model... (this may take a few minutes)")
        model, tokenizer = FastLanguageModel.from_pretrained(
            model_name=base_model,
            max_seq_length=4096,
            dtype=dtype,
            load_in_4bit=True,
        )
        print("Model loaded successfully!")

        # Prepare model for LoRA training
        print("Setting up LoRA adapters...")
        model = FastLanguageModel.get_peft_model(
            model,
            r=16,  # LoRA rank
            target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                            "gate_proj", "up_proj", "down_proj"],
            lora_alpha=16,
            lora_dropout=0.05,
            bias="none",
        )
        print("LoRA adapters set up successfully!")

        # Prepare dataset
        print("Preparing dataset...")
        train_dataset, eval_dataset = prepare_dataset(data_path)
        print(f"Dataset prepared: {len(train_dataset)} training examples, {len(eval_dataset)} validation examples")

        # Set up training arguments - USING FP16 INSTEAD OF BF16
        training_args = TrainingArguments(
            output_dir=output_dir,
            num_train_epochs=3,
            per_device_train_batch_size=4,
            per_device_eval_batch_size=4,
            gradient_accumulation_steps=2,
            eval_strategy="steps",  # Changed from evaluation_strategy
            eval_steps=0.2,
            save_strategy="steps",
            save_steps=0.2,
            save_total_limit=3,
            load_best_model_at_end=True,
            logging_steps=10,
            learning_rate=2e-4,
            warmup_ratio=0.03,
            weight_decay=0.01,
            fp16=True,  # Use fp16 instead of bf16
            bf16=False,  # Explicitly disable bf16
            tf32=False,  # Also disable tf32 to be safe
            max_grad_norm=0.3,
            lr_scheduler_type="cosine",
            seed=42,
        )

        # Convert datasets to instruction format
        def formatting_prompts_func(examples):
            prompts = examples["prompt"]
            completions = examples["completion"]
            texts = []
            for prompt, completion in zip(prompts, completions):
                # The prompt already has the proper format with <s>[INST]...[/INST]
                # Just need to add the completion
                text = f"{prompt} {completion}</s>"
                texts.append(text)
            return {"text": texts}

        # Start training
        print("Starting training...")
        trainer = FastLanguageModel.get_trainer(
            model=model,
            tokenizer=tokenizer,
            train_dataset=train_dataset,
            eval_dataset=eval_dataset,
            formatting_func=formatting_prompts_func,
            args=training_args,
            packing=True,  # Efficient packing of sequences
        )

        # Train the model
        trainer.train()

        # Save the model
        print("Saving model...")
        model.save_pretrained(f"{output_dir}/final")
        tokenizer.save_pretrained(f"{output_dir}/final")
        print(f"Model saved to {output_dir}/final")

        # Let's also create a merged model for easier deployment
        print("Creating merged model...")
        FastLanguageModel.save_pretrained_merged(
            model=model,
            tokenizer=tokenizer,
            save_directory=f"{output_dir}/merged"
        )
        print(f"Merged model saved to {output_dir}/merged")

        return model, tokenizer

    except Exception as e:
        print(f"Error during model loading or training: {e}")
        print("Please check your Hugging Face token and model name, and try again.")
        return None, None

def prepare_dataset(data_path, validation_split=0.1):
    """
    Prepare the dataset for fine-tuning
    """
    # Load the JSON data
    with open(data_path, 'r', encoding='utf-8') as f:
        data = json.load(f)

    # Convert to format expected by datasets library
    formatted_data = []
    for item in data:
        # Extract messages
        messages = item["messages"]

        # Get system message if it exists
        system_msg = ""
        for msg in messages:
            if msg["role"] == "system":
                system_msg = msg["content"]
                break

        # Build prompt and completion
        prompt = ""
        completion = ""

        for msg in messages:
            if msg["role"] == "user":
                prompt = msg["content"]
            elif msg["role"] == "assistant":
                completion = msg["content"]

        # Add system message to prompt if it exists
        if system_msg:
            prompt = f"<s>[INST] {system_msg}\n\n{prompt} [/INST]"
        else:
            prompt = f"<s>[INST] {prompt} [/INST]"

        formatted_data.append({
            "prompt": prompt,
            "completion": completion
        })

    # Write formatted data to a temporary file
    temp_file = "temp_formatted_data.json"
    with open(temp_file, 'w', encoding='utf-8') as f:
        json.dump(formatted_data, f, indent=2)

    # Load the formatted data as a dataset
    dataset = load_dataset("json", data_files=temp_file)

    # Split the dataset
    splits = dataset["train"].train_test_split(test_size=validation_split, seed=42)
    train_dataset = splits["train"]
    eval_dataset = splits["test"]

    # Remove the temporary file
    os.remove(temp_file)

    return train_dataset, eval_dataset

# Import json (needed for prepare_dataset)
import json

# Get token from environment or use the one you've already entered
hf_token = os.environ.get("HF_TOKEN", "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")

# Run fine-tuning with the model you've specified
finetune_model(
    base_model="meta-llama/Llama-3.2-3B-Instruct",
    data_path="llm_finetuning_data.json",
    output_dir="./military_advisor_model",
    hf_token=hf_token
)

Logging in to Hugging Face with token: hf_B...WDjO
Successfully logged in to Hugging Face!
Using model: meta-llama/Llama-3.2-3B-Instruct
Using precision: torch.float16 (GPU supports bf16: False)
Loading model... (this may take a few minutes)
==((====))==  Unsloth 2025.3.8: Fast Llama patching. Transformers: 4.48.3.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.5.1+cu124. CUDA: 7.5. CUDA Toolkit: 12.4. Triton: 3.1.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.29. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
Model loaded successfully!
Setting up LoRA adapters...
LoRA adapters set up successfully!
Preparing dataset...


Generating train split: 0 examples [00:00, ? examples/s]

Dataset prepared: 270 training examples, 30 validation examples
Starting training...
Error during model loading or training: type object 'FastLanguageModel' has no attribute 'get_trainer'
Please check your Hugging Face token and model name, and try again.


(None, None)

In [15]:
import os
import torch
import json
from huggingface_hub import login
from transformers import TrainingArguments, Trainer
from datasets import load_dataset
from peft import LoraConfig

# First check if unsloth is installed, and install if needed
try:
    from unsloth import FastLanguageModel
except ImportError:
    print("Installing unsloth...")
    import subprocess
    import sys
    subprocess.check_call([sys.executable, "-m", "pip", "install", "unsloth"])
    from unsloth import FastLanguageModel

def prepare_dataset(data_path, validation_split=0.1):
    """
    Prepare the dataset for fine-tuning
    """
    # Load the JSON data
    with open(data_path, 'r', encoding='utf-8') as f:
        data = json.load(f)

    # Convert to format expected by datasets library
    formatted_data = []
    for item in data:
        # Extract messages
        messages = item["messages"]

        # Get system message if it exists
        system_msg = ""
        for msg in messages:
            if msg["role"] == "system":
                system_msg = msg["content"]
                break

        # Build prompt and completion
        prompt = ""
        completion = ""

        for msg in messages:
            if msg["role"] == "user":
                prompt = msg["content"]
            elif msg["role"] == "assistant":
                completion = msg["content"]

        # Add system message to prompt if it exists
        if system_msg:
            prompt = f"<s>[INST] {system_msg}\n\n{prompt} [/INST]"
        else:
            prompt = f"<s>[INST] {prompt} [/INST]"

        formatted_data.append({
            "prompt": prompt,
            "completion": completion
        })

    # Write formatted data to a temporary file
    temp_file = "temp_formatted_data.json"
    with open(temp_file, 'w', encoding='utf-8') as f:
        json.dump(formatted_data, f, indent=2)

    # Load the formatted data as a dataset
    dataset = load_dataset("json", data_files=temp_file)

    # Split the dataset
    splits = dataset["train"].train_test_split(test_size=validation_split, seed=42)
    train_dataset = splits["train"]
    eval_dataset = splits["test"]

    # Remove the temporary file
    os.remove(temp_file)

    return train_dataset, eval_dataset

def format_instruction(example):
    """Format each example into the instruction template"""
    prompt = example["prompt"]
    completion = example["completion"]

    # Return a single text string for the model to train on
    return {
        "text": f"{prompt} {completion}</s>"
    }

def finetune_model(
    base_model="meta-llama/Llama-3.2-3B-Instruct",
    data_path="llm_finetuning_data.json",
    output_dir="./military_advisor_model",
    hf_token=None
):
    """
    Fine-tune a model with Unsloth using military scenarios
    """
    # Login to Hugging Face if token is provided
    if hf_token:
        print(f"Logging in to Hugging Face with token: {hf_token[:4]}...{hf_token[-4:]}")
        try:
            login(token=hf_token)
            print("Successfully logged in to Hugging Face!")
        except Exception as e:
            print(f"Error logging in to Hugging Face: {e}")
            print("Continuing without login...")

    # Use correct model name format
    print(f"Using model: {base_model}")

    # Check if GPU supports bf16
    has_bf16_support = torch.cuda.is_available() and torch.cuda.get_device_capability()[0] >= 8
    dtype = torch.bfloat16 if has_bf16_support else torch.float16
    print(f"Using precision: {dtype} (GPU supports bf16: {has_bf16_support})")

    try:
        # Load model - using the new Unsloth API
        print("Loading model... (this may take a few minutes)")
        model, tokenizer = FastLanguageModel.from_pretrained(
            model_name=base_model,
            max_seq_length=4096,
            dtype=dtype,
            load_in_4bit=True,
        )
        print("Model loaded successfully!")

        # Configure LoRA
        lora_config = LoraConfig(
            r=16,
            lora_alpha=16,
            target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                           "gate_proj", "up_proj", "down_proj"],
            lora_dropout=0.05,
            bias="none",
            task_type="CAUSAL_LM"
        )

        # Apply LoRA to model - using the appropriate Unsloth API
        print("Setting up LoRA adapters...")
        model = FastLanguageModel.get_peft_model(
            model,
            lora_config
        )
        print("LoRA adapters set up successfully!")

        # Prepare dataset
        print("Preparing dataset...")
        train_dataset, eval_dataset = prepare_dataset(data_path)
        print(f"Dataset prepared: {len(train_dataset)} training examples, {len(eval_dataset)} validation examples")

        # Format datasets
        train_dataset = train_dataset.map(format_instruction)
        eval_dataset = eval_dataset.map(format_instruction)

        # Set up training arguments
        training_args = TrainingArguments(
            output_dir=output_dir,
            num_train_epochs=3,
            per_device_train_batch_size=4,
            per_device_eval_batch_size=4,
            gradient_accumulation_steps=2,
            eval_strategy="steps",
            eval_steps=0.2,
            save_strategy="steps",
            save_steps=0.2,
            save_total_limit=3,
            load_best_model_at_end=True,
            logging_steps=10,
            learning_rate=2e-4,
            warmup_ratio=0.03,
            weight_decay=0.01,
            fp16=not has_bf16_support,  # Use fp16 only if bf16 not supported
            bf16=has_bf16_support,      # Use bf16 if supported
            max_grad_norm=0.3,
            lr_scheduler_type="cosine",
            seed=42,
        )

        # Create trainer with standard Transformers API
        print("Creating trainer...")
        trainer = Trainer(
            model=model,
            tokenizer=tokenizer,
            args=training_args,
            train_dataset=train_dataset,
            eval_dataset=eval_dataset,
            data_collator=FastLanguageModel.get_unsloth_data_collator(tokenizer),
        )

        # Start training
        print("Starting training...")
        trainer.train()

        # Save the model
        print("Saving model...")
        model.save_pretrained(f"{output_dir}/final")
        tokenizer.save_pretrained(f"{output_dir}/final")
        print(f"Model saved to {output_dir}/final")

        # Create a merged model if possible
        try:
            print("Creating merged model...")
            model = FastLanguageModel.merge_lora_weights(model)
            model.save_pretrained(f"{output_dir}/merged")
            print(f"Merged model saved to {output_dir}/merged")
        except Exception as e:
            print(f"Could not create merged model: {e}")
            print("You can use the LoRA adapter model from the 'final' directory instead.")

        return model, tokenizer

    except Exception as e:
        print(f"Error during model loading or training: {e}")
        print("Please check your Hugging Face token and model name, and try again.")
        import traceback
        traceback.print_exc()
        return None, None

# Get token from environment or use the one you've already entered
hf_token = os.environ.get("HF_TOKEN", "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")

# Run fine-tuning with the model you've specified
finetune_model(
    base_model="meta-llama/Llama-3.2-3B-Instruct",
    data_path="llm_finetuning_data.json",
    output_dir="./military_advisor_model",
    hf_token=hf_token
)

Logging in to Hugging Face with token: hf_B...WDjO
Successfully logged in to Hugging Face!
Using model: meta-llama/Llama-3.2-3B-Instruct
Using precision: torch.float16 (GPU supports bf16: False)
Loading model... (this may take a few minutes)
==((====))==  Unsloth 2025.3.8: Fast Llama patching. Transformers: 4.48.3.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.5.1+cu124. CUDA: 7.5. CUDA Toolkit: 12.4. Triton: 3.1.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.29. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
Model loaded successfully!
Setting up LoRA adapters...
Error during model loading or training: Unsloth: Rank of LoraConfig(task_type='CAUSAL_LM', peft_type=<PeftType.LORA: 'LORA'>, auto_mapping=None, base_model_name_or_path=None, revision=None, inference_mode=False, r=16, target_modules={'up_proj', 'v_proj'

Traceback (most recent call last):
  File "<ipython-input-15-90594c9887c9>", line 140, in finetune_model
    model = FastLanguageModel.get_peft_model(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/unsloth/models/llama.py", line 2013, in get_peft_model
    raise TypeError(f"Unsloth: Rank of {str(r)} must be an integer.")
TypeError: Unsloth: Rank of LoraConfig(task_type='CAUSAL_LM', peft_type=<PeftType.LORA: 'LORA'>, auto_mapping=None, base_model_name_or_path=None, revision=None, inference_mode=False, r=16, target_modules={'up_proj', 'v_proj', 'o_proj', 'k_proj', 'gate_proj', 'q_proj', 'down_proj'}, exclude_modules=None, lora_alpha=16, lora_dropout=0.05, fan_in_fan_out=False, bias='none', use_rslora=False, modules_to_save=None, init_lora_weights=True, layers_to_transform=None, layers_pattern=None, rank_pattern={}, alpha_pattern={}, megatron_config=None, megatron_core='megatron.core', loftq_config={}, eva_config=None, use_dora=False, layer_r

(None, None)