### Introduction
In this tutorial, we'll fine-tune a large language model to improve its ability to generate and explain complex Python code.

We'll use the LeMa framework to streamline the process and achieve high-quality results.

We'll cover the following topics:
1. Prerequisites
2. Getting Started
3. Preparing Our Training Experiment
4. Fine-tuning the Model
5. Monitoring the Model
6. Evaluation
7. Analyzing Results and Iterating
8. Using the Fine-tuned Model


### 1. Prerequisites
#### 1.1. LeMa Installation
First, let's install LeMa. You can find detailed instructions [here](https://github.com/openlema/lema/blob/main/README.md), but from the root of a cloned copy of the repository it should be as simple as:

```bash
pip install -e ".[dev,train]"
```

#### 1.2. Creating our working directory
For our experiments, we'll use the following folder to save the model, training artifacts, and our working configs.

In [None]:
from pathlib import Path

tutorial_dir = "finetuning_tutorial"

Path(tutorial_dir).mkdir(parents=True, exist_ok=True)

#### 1.3. Set up the environment

We'll need to set the following environment variables:
- [Optional] `HF_TOKEN`: Your [HuggingFace](https://huggingface.co/docs/hub/en/security-tokens) access token, needed if you want to access a private model.
- [Optional] `WANDB_API_KEY`: Your [wandb](https://wandb.ai) API key, needed if you want to log your experiments to wandb.
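
Since both variables are optional, a quick check at the top of the notebook shows which ones the current session will actually pick up. The helper below is a minimal sketch using only the standard library; the name `report_env` is an illustrative choice and not part of LeMa.

```python
import os

# Environment variables mentioned above; both are optional.
OPTIONAL_VARS = ("HF_TOKEN", "WANDB_API_KEY")


def report_env(names=OPTIONAL_VARS, environ=None):
    """Return a dict mapping each variable name to True if it is set and non-empty."""
    environ = os.environ if environ is None else environ
    return {name: bool(environ.get(name)) for name in names}


for name, is_set in report_env().items():
    print(f"{name}: {'set' if is_set else 'not set'}")
```

The check only reports whether a value is present; it does not validate the token itself.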

### 2. Getting Started


#### 2.1. Data Preparation
Let's start by loading our dataset and looking at a few examples. The Alpaca dataset covers a variety of tasks, including code generation and explanation.

In [None]:
from lema.builders import build_tokenizer
from lema.core.types import ModelParams
from lema.datasets.alpaca import AlpacaDataset

# Initialize the dataset
tokenizer = build_tokenizer(ModelParams(model_name="gpt2"))
dataset = AlpacaDataset(tokenizer=tokenizer)

# Print a few examples
for i in range(3):
    conversation = dataset.conversation(i)
    print(f"Example {i + 1}:")
    for message in conversation.messages:
        print(f"{message.role}: {message.content[:100]}...")  # Truncate for brevity
    print("\n")
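
Before training, it can be worth running a couple of cheap sanity checks on the conversations, such as looking for empty messages. The sketch below works on plain `(role, content)` pairs so that it does not depend on any particular LeMa class. `find_empty_messages` is a hypothetical helper written for this tutorial, and the sample data is hand-written rather than taken from Alpaca.

```python
def find_empty_messages(conversations):
    """Return (conversation_index, message_index) for every blank message.

    `conversations` is an iterable of conversations, each given as a
    list of (role, content) pairs.
    """
    problems = []
    for conv_idx, messages in enumerate(conversations):
        for msg_idx, (_role, content) in enumerate(messages):
            # Treat None, "" and whitespace-only strings as empty.
            if not content or not content.strip():
                problems.append((conv_idx, msg_idx))
    return problems


# Tiny hand-written sample (not real dataset rows).
sample = [
    [("user", "Reverse a string in Python."), ("assistant", "Use s[::-1].")],
    [("user", "Explain list comprehensions."), ("assistant", "   ")],
]
print(find_empty_messages(sample))
```

To apply it to the dataset loaded above, the pairs could be built with something like `[(m.role, m.content) for m in dataset.conversation(i).messages]`, assuming the accessor behaves as in the previous cell.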

#### 2.2. Model Preparation

For code generation, we want a model with strong general language understanding and coding capabilities. 

We also want a model that is small enough to train and run on a single GPU.

Some good options include:
- ["microsoft/Phi-3-mini-128k-instruct"](https://huggingface.co/microsoft/Phi-3-mini-128k-instruct)
- ["google/gemma-2b"](https://huggingface.co/google/gemma-2b)
- ["Qwen/Qwen2-1.5B-Instruct"](https://huggingface.co/Qwen/Qwen2-1.5B-Instruct)


For this tutorial, we'll use "Qwen/Qwen2-1.5B-Instruct" as our base model.
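
As a rough back-of-the-envelope check of the single-GPU requirement, the memory needed just to hold the weights is the parameter count times the bytes per parameter. The sketch below assumes a nominal 1.5 billion parameters (read off the model name, not an exact count) and ignores activations, optimizer state, and framework overhead, so real usage will be higher.

```python
def weight_memory_gib(num_params, bytes_per_param):
    """Approximate memory for the model weights alone, in GiB."""
    return num_params * bytes_per_param / 1024**3


NOMINAL_PARAMS = 1.5e9  # assumption: nominal size implied by "1.5B"

for dtype_name, nbytes in [("float32", 4), ("float16", 2)]:
    print(f"{dtype_name}: ~{weight_memory_gib(NOMINAL_PARAMS, nbytes):.1f} GiB")
```

Under these assumptions, half precision roughly halves the weight footprint compared with full precision, which is one reason the configs in this tutorial load the model with `torch_dtype_str: "half"`.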



#### 2.3. Initial Model Responses

Let's see how the base model responds to a coding prompt before any fine-tuning.

In [None]:
%%writefile $tutorial_dir/inference_config.yaml

model:
  model_name: "Qwen/Qwen2-1.5B-Instruct"
  trust_remote_code: true
  torch_dtype_str: "half"
  device_map: "auto"

generation:
  max_new_tokens: 512
  batch_size: 1

In [None]:
from lema.core.types import InferenceConfig
from lema.infer import infer

config = InferenceConfig.from_yaml(Path(tutorial_dir) / "inference_config.yaml")

input_text = (
    "Write a Python function to implement the quicksort algorithm. "
    "Please include comments explaining each step."
)

results = infer(config.model, config.generation, [[input_text]])

print(results[0][0])
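
Because the goal is code generation, one cheap automatic check on a response is whether the code it contains parses at all. The sketch below assumes the model wraps its code in Markdown fences, which may not always hold, and uses two hypothetical helpers, `extract_code_blocks` and `is_valid_python`. The code is only parsed with `ast`, never executed.

```python
import ast
import re

FENCE = "`" * 3  # built programmatically to avoid nesting literal fences

# Matches fenced blocks, optionally tagged "python" or "py".
CODE_BLOCK_RE = re.compile(
    FENCE + r"(?:python|py)?[ \t]*\n(.*?)" + FENCE, re.DOTALL
)


def extract_code_blocks(text):
    """Return the contents of all fenced code blocks found in `text`."""
    return [match.group(1) for match in CODE_BLOCK_RE.finditer(text)]


def is_valid_python(source):
    """Return True if `source` parses as Python (it is never executed)."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


# Hand-written stand-in for a model response.
response = (
    "Here is the function:\n"
    + FENCE + "python\n"
    + "def add(a, b):\n    return a + b\n"
    + FENCE + "\nHope this helps."
)
for block in extract_code_blocks(response):
    print(is_valid_python(block))
```

If the printed inference result is a plain string, the same helpers could be applied to it directly. A syntax check says nothing about correctness, so it complements rather than replaces the evaluation later in this tutorial.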

### 3. Preparing our training experiment



Let's create a YAML file with our training config:

In [None]:
%%writefile $tutorial_dir/train.yaml

model:
  model_name: "Qwen/Qwen2-1.5B-Instruct"
  trust_remote_code: true
  torch_dtype_str: "half"
  device_map: "auto"

data:
  train:
    datasets:
      - dataset_name: "tatsu-lab/alpaca"
        split: "train"
    target_col: "text"
    

training:
  output_dir: output
  per_device_train_batch_size: 2
  gradient_accumulation_steps: 8
  max_steps: 10  # short demo run (fewer steps than warmup_steps); increase for a full fine-tune
  learning_rate: 1e-5
  lr_scheduler_type: "cosine"
  warmup_steps: 200
  logging_steps: 10
  save_steps: 200
  eval_steps: 200

  use_peft: true
  trainer_type: "TRL_SFT"

peft:
  lora_r: 16
  lora_alpha: 32
  lora_dropout: 0.05
  lora_target_modules:
    - "q_proj"
    - "k_proj"
    - "v_proj"
    - "o_proj"
    - "gate_proj"
    - "up_proj"
    - "down_proj"
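
A few of the numbers in this config interact, and it can help to work them out explicitly. The sketch below copies the values by hand from the YAML above and computes the effective batch size, the LoRA scaling factor (`lora_alpha / lora_r`), and the learning rate reached by the last step. It assumes a single device and the common linear-warmup shape; the cosine decay after warmup is not modelled, and the helper names are illustrative.

```python
def effective_batch_size(per_device_batch, grad_accum_steps, num_devices=1):
    """Number of examples contributing to each optimizer step."""
    return per_device_batch * grad_accum_steps * num_devices


def warmup_lr(step, peak_lr, warmup_steps):
    """Learning rate during linear warmup, capped at the peak value."""
    return peak_lr * min(1.0, step / max(1, warmup_steps))


# Values copied from the training config above.
cfg = {
    "per_device_train_batch_size": 2,
    "gradient_accumulation_steps": 8,
    "max_steps": 10,
    "learning_rate": 1e-5,
    "warmup_steps": 200,
    "lora_r": 16,
    "lora_alpha": 32,
}

batch = effective_batch_size(
    cfg["per_device_train_batch_size"], cfg["gradient_accumulation_steps"]
)
final_lr = warmup_lr(cfg["max_steps"], cfg["learning_rate"], cfg["warmup_steps"])

print("effective batch size:", batch)
print("LoRA scaling (alpha / r):", cfg["lora_alpha"] / cfg["lora_r"])
print("learning rate at the final step:", final_lr)
```

With these values, `warmup_steps` (200) is larger than `max_steps` (10), so under the stated assumption the run ends while the learning rate is still a small fraction of its peak. For a full run, `max_steps` would typically be raised well above `warmup_steps`.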

### 4. Fine-tuning the model

This will start the fine-tuning process using the LeMa framework. With `max_steps: 10`, as set in the config above, the run is only a short smoke test; a full fine-tuning run can take a few hours, depending on your GPU.

In [None]:
!lema-train -c "$tutorial_dir/train.yaml"

### 5. Monitoring the model

Let's see how our model is doing. Things to watch out for:


### 6. Evaluation


Let's create an evaluation configuration file:


In [None]:
%%writefile $tutorial_dir/eval.yaml

model:
  model_name: "./output"
  trust_remote_code: true
  torch_dtype_str: "half"
  device_map: "auto"

data:
  datasets:
    - dataset_name: "openai_humaneval"

evaluation_framework: "lm_harness"
num_shots: 0
output_dir: "./output"

In [None]:
!lema-evaluate -c "$tutorial_dir/eval.yaml"
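
HumanEval results are commonly reported as pass@k, the probability that at least one of k sampled completions passes the unit tests. As a sketch of what that number means (not a description of how `lm_harness` computes it internally), the widely used unbiased estimator for n samples of which c are correct is `1 - C(n - c, k) / C(n, k)`:

```python
from math import comb


def pass_at_k(n, c, k):
    """Unbiased pass@k estimate from n samples of which c passed."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k samples failed,
    # which gives an estimate of exactly 1.0.
    return 1.0 - comb(n - c, k) / comb(n, k)


print(pass_at_k(n=10, c=3, k=1))
```

For k = 1 the estimate reduces to the fraction of correct samples, c / n.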

### 7. Analyze Results and Iterate

#### [To be continued]

### 8. Use the Fine-tuned Model

Once we're happy with the results, we can run inference with the fine-tuned model, using the same prompt as before:

In [None]:
%%writefile $tutorial_dir/train_inference_config.yaml

model:
  model_name: "./output"
  trust_remote_code: true
  torch_dtype_str: "half"
  device_map: "auto"

generation:
  max_new_tokens: 512
  batch_size: 1

In [None]:
from lema.core.types import InferenceConfig
from lema.infer import infer

config = InferenceConfig.from_yaml(Path(tutorial_dir) / "train_inference_config.yaml")

input_text = (
    "Write a Python function to implement the quicksort algorithm. "
    "Please include comments explaining each step."
)

results = infer(config.model, config.generation, [[input_text]])

print(results[0][0])
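
To judge whether fine-tuning changed the model's behaviour, it helps to compare this response with the one the base model produced in section 2.3. Below is a minimal sketch using the standard library's `difflib`. The helper `response_diff` is hypothetical, and the two strings are placeholders that would be replaced by the saved outputs of the two `infer` calls.

```python
import difflib


def response_diff(before, after):
    """Return a unified diff between two model responses as a string."""
    diff = difflib.unified_diff(
        before.splitlines(),
        after.splitlines(),
        fromfile="base_model",
        tofile="fine_tuned_model",
        lineterm="",
    )
    return "\n".join(diff)


# Placeholder strings, not real model outputs.
base_response = "line one\nline two"
tuned_response = "line one\n# added comment\nline two"
print(response_diff(base_response, tuned_response))
```

A textual diff only shows what changed, not whether the change is an improvement, so it is best read alongside the evaluation results.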