#Welcome to the 'Reasoning' with LLMs Workshop!

Author: Marko Velic

June 2025

##Introduction
In this workshop we will explore reasoning with LLMs. Along the way we will gradually build our own library for learning about, exploring and playing with Reasoning (or Thinking) LLMs. We will work through notebooks and, at the same time, export the important pieces of code into separate .py files, so that we end up with a small reasoning library that is easy to reuse and build upon.

To avoid any surprises on site, please run all cells in this notebook in advance. This will download a Gemma3 open-weights model and save it to your Google Drive; the model takes up approximately one GB of space. The notebook then loads that same model from Drive and tries it out. We will use this model as the starting point for building our 'reasoning' model.

For this notebook to work, you need a runtime with an NVIDIA GPU. Don't worry, it is just a few clicks away and free (thank you Google Colab :-)). Click on the Connection Options in the upper-right corner of the screen (or on the `Runtime` item in the main menu), then click `Change runtime type` and select the `T4 GPU` or `L4 GPU` option.
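
Once the runtime is connected, it can be worth confirming that a GPU is really attached before installing anything. The following is a minimal sketch, not part of the workshop library: the helper name `gpu_available` is our own, and it only checks that the `nvidia-smi` tool exists and lists at least one device.

```python
import shutil
import subprocess


def gpu_available() -> bool:
    """Return True if `nvidia-smi` is on PATH and lists at least one GPU."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        # No NVIDIA driver tools found, so most likely a CPU-only runtime
        return False
    try:
        # `nvidia-smi -L` prints one line per GPU, e.g. "GPU 0: ..."
        result = subprocess.run(
            [exe, "-L"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and "GPU" in result.stdout


if gpu_available():
    print("GPU runtime detected")
else:
    print("No GPU detected: change the runtime type first")
```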

In [None]:
# --- Initial Project Setup ---
# This cell mounts Google Drive and creates the project root folder.
# The subfolders and files are added in the following cells.

import os
import sys
from google.colab import drive
drive.mount('/content/drive')

# Add the path to your project folder in Drive to Python's search path
directory = '/content/drive/MyDrive/Colab_Notebooks/llm_workshop'
sys.path.append(directory)
file_path = f"{directory}/utils_hello_drive.py"
# Create the project folder; exist_ok makes the cell safe to re-run
os.makedirs(directory, exist_ok=True)


In [None]:
# Subdirectories (-p avoids an error if a folder already exists on re-run)
!mkdir -p {directory}/notebooks
!mkdir -p {directory}/scripts
!mkdir -p {directory}/src
!mkdir -p {directory}/outputs
!mkdir -p {directory}/outputs/sft_model
!mkdir -p {directory}/outputs/grpo_model
!mkdir -p {directory}/models


In [None]:
# Create the __init__.py file to make 'src' a Python package
!touch {directory}/src/__init__.py
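
The shell commands above can also be expressed in plain Python, which is handy outside Colab. This is a sketch under our own naming (`SUBDIRS` and `create_project_tree` are hypothetical helpers, not part of the workshop code); it mirrors the same folder layout and is demonstrated on a temporary directory rather than on Drive.

```python
import tempfile
from pathlib import Path

# Same layout as the mkdir cells above
SUBDIRS = [
    "notebooks",
    "scripts",
    "src",
    "outputs/sft_model",
    "outputs/grpo_model",
    "models",
]


def create_project_tree(root):
    """Create the workshop folders under `root` and return all created paths."""
    root = Path(root)
    for sub in SUBDIRS:
        # parents=True also creates 'outputs'; exist_ok makes re-runs harmless
        (root / sub).mkdir(parents=True, exist_ok=True)
    # An empty __init__.py marks 'src' as a Python package
    (root / "src" / "__init__.py").touch(exist_ok=True)
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*"))


with tempfile.TemporaryDirectory() as tmp:
    for entry in create_project_tree(tmp):
        print(entry)
```

Calling `create_project_tree(directory)` would build the same structure in the Drive folder used by this notebook.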


In [None]:
%%writefile {directory}/src/utils.py

Now let's put a .py file into our project folder on Drive and then use it from the notebook with ```import```.

In [None]:
%%writefile {directory}/hello_from_drive.py

def hello_world():
  print("Hello from our Python script in Drive!")

In [None]:
import hello_from_drive

hello_from_drive.hello_world()
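
The import above works because the project folder was appended to `sys.path` in the setup cell. One thing to keep in mind: once a module has been imported, rewriting the file with `%%writefile` does not change the copy already loaded in the notebook. The sketch below illustrates both points on a temporary folder; `import_from_folder` and `demo_module` are hypothetical names used only for this illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def import_from_folder(folder, module_name):
    """Import `module_name` from `folder`, reloading it if already imported."""
    folder = str(folder)
    if folder not in sys.path:
        sys.path.append(folder)
    # Make sure the import system notices files created after startup
    importlib.invalidate_caches()
    if module_name in sys.modules:
        # Pick up edits made to the file since the first import
        return importlib.reload(sys.modules[module_name])
    return importlib.import_module(module_name)


with tempfile.TemporaryDirectory() as tmp:
    module_file = Path(tmp) / "demo_module.py"

    module_file.write_text("def greet():\n    return 'first version'\n")
    demo = import_from_folder(tmp, "demo_module")
    first = demo.greet()

    # Simulate editing the file, as a second %%writefile would do
    module_file.write_text("def greet():\n    return 'second version, now longer'\n")
    demo = import_from_folder(tmp, "demo_module")
    second = demo.greet()

# Clean up the temporary entry so it does not linger on the search path
sys.path.remove(tmp)

print(first)
print(second)
```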

In [None]:
!pip install uv

In [None]:
!uv pip install unsloth vllm


In [None]:
from unsloth import FastLanguageModel
import torch

In [None]:
%%writefile {directory}/utils.py
from unsloth import FastLanguageModel

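def load_model(model_name, **model_kwargs):
  """Load a model with Unsloth and wrap it with LoRA adapters.

  Returns a (model, tokenizer) tuple. Keyword arguments override the defaults below.
  """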
  default_kwargs = {
    "max_seq_length": 1024,  # Can increase for longer reasoning traces
    "load_in_4bit": True,  # False for LoRA 16bit
    "fast_inference": True,  # Enable vLLM fast inference
    "max_lora_rank": 32, # Larger rank = smarter, but slower
    "gpu_memory_utilization": 0.6,  # Reduce if out of memory
  }

  default_kwargs.update(model_kwargs)
  lora_rank = default_kwargs['max_lora_rank']

  model, tokenizer = FastLanguageModel.from_pretrained(
      model_name= model_name,
      **default_kwargs,
  )

  model = FastLanguageModel.get_peft_model(
      model,
      r=lora_rank,  # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
      target_modules=[
          "q_proj",
          "k_proj",
          "v_proj",
          "o_proj",
          "gate_proj",
          "up_proj",
          "down_proj",
      ],  # Remove QKVO if out of memory
      lora_alpha=lora_rank,
      use_gradient_checkpointing="unsloth",  # Enable long context finetuning
      random_state=3407,
  )
  return model, tokenizer


In [None]:
import utils
model, tokenizer = utils.load_model("unsloth/gemma-3-1b-it")
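
`load_model` merges the caller's keyword arguments into a dictionary of defaults, so only the settings you name are changed. The pattern can be sketched in isolation as follows; `DEFAULTS` and `merge_kwargs` are illustrative names, and the dictionary holds just a subset of the real defaults.

```python
# A subset of the defaults used in utils.load_model, for illustration only
DEFAULTS = {
    "max_seq_length": 1024,
    "load_in_4bit": True,
    "max_lora_rank": 32,
}


def merge_kwargs(**overrides):
    """Return the defaults with any caller-supplied overrides applied."""
    merged = dict(DEFAULTS)   # copy, so the shared defaults are never mutated
    merged.update(overrides)  # caller values win over defaults
    return merged


print(merge_kwargs(max_seq_length=2048))
# → {'max_seq_length': 2048, 'load_in_4bit': True, 'max_lora_rank': 32}
```

By the same mechanism, `utils.load_model("unsloth/gemma-3-1b-it", max_seq_length=2048)` would override only the sequence length and keep every other default.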


In [None]:
model.save_pretrained(f"{directory}/models/gemma-3-1b-it")
tokenizer.save_pretrained(f"{directory}/models/gemma-3-1b-it")
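
To see how much space the saved files actually occupy on Drive, the folder size can be summed up with the standard library. This is a small sketch with a hypothetical helper, `dir_size_mb`, shown here on a temporary folder so that it runs anywhere.

```python
import os
import tempfile


def dir_size_mb(path):
    """Return the total size of all files under `path` in MiB."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if os.path.isfile(file_path):
                total += os.path.getsize(file_path)
    return total / (1024 * 1024)


# Demo: a folder holding a single 2 MiB file
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "weights.bin"), "wb") as f:
        f.write(b"\0" * (2 * 1024 * 1024))
    demo_size = dir_size_mb(tmp)

print(f"{demo_size:.1f} MB")
```

In the notebook, `dir_size_mb(f"{directory}/models/gemma-3-1b-it")` would report the size of the folder written above.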


Now let's load the model back from Drive and generate some text.

In [None]:
model_from_drive, tokenizer_from_drive = utils.load_model(f"{directory}/models/gemma-3-1b-it")


In [None]:
prompt = "Once upon a time in a land far, far away, there lived a"
inputs = tokenizer_from_drive(prompt, return_tensors="pt").to(model_from_drive.device)


In [None]:
outputs = model_from_drive.generate(
    **inputs,
    max_length=50,  # Total length in tokens, including the prompt
    do_sample=True,
    top_k=50,
    top_p=0.95,
    temperature=0.7,
    output_scores=True,
    return_dict_in_generate=True,
    num_return_sequences=1
)


In [None]:
# Decode the generated text.
# With return_dict_in_generate=True, the token ids live in outputs.sequences.
generated_text = tokenizer_from_drive.decode(outputs.sequences[0], skip_special_tokens=True)
print(generated_text)
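
The `temperature`, `top_k` and `top_p` arguments passed to `generate` control how the next token is picked from the model's scores. The toy sketch below shows the idea on a four-word vocabulary. It is a simplified illustration in pure Python, not the implementation used by the library; `sample_token` and the example scores are made up for this purpose.

```python
import math
import random


def sample_token(logits, temperature=0.7, top_k=50, top_p=0.95, rng=random):
    """Pick one token from `logits` (a dict of token -> score).

    Returns the sampled token and the list of candidates that survived filtering.
    """
    # 1. Temperature: values below 1 sharpen the distribution, above 1 flatten it
    scaled = {tok: score / temperature for tok, score in logits.items()}

    # 2. Softmax (subtracting the maximum for numerical stability)
    top = max(scaled.values())
    exps = {tok: math.exp(score - top) for tok, score in scaled.items()}
    total = sum(exps.values())
    probs = sorted(
        ((tok, e / total) for tok, e in exps.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )

    # 3. Top-k: keep only the k most likely tokens, then renormalize
    probs = probs[:top_k]
    total = sum(p for _, p in probs)
    probs = [(tok, p / total) for tok, p in probs]

    # 4. Top-p: keep the smallest prefix whose cumulative probability reaches top_p
    kept, cumulative = [], 0.0
    for tok, p in probs:
        kept.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break

    # 5. Sample from the surviving candidates, weighted by probability
    tokens = [tok for tok, _ in kept]
    weights = [p for _, p in kept]
    return rng.choices(tokens, weights=weights, k=1)[0], tokens


toy_logits = {"king": 3.0, "queen": 2.5, "dragon": 1.0, "banana": -2.0}
token, candidates = sample_token(toy_logits, top_k=2, rng=random.Random(0))
print(token, candidates)
```

With `top_k=1` only the highest-scoring token survives, so sampling becomes deterministic; larger `top_k` and `top_p` values let less likely continuations through.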