## 1) Installation of dependencies and libraries

In [1]:
!pip install --upgrade pip
!pip install "datasets==2.13.0" "trl==0.4.7" "Peft==0.5.0" "safetensors>=0.3.1" "torch==2.0.0" sentencepiece fire einops --upgrade
!pip install git+https://github.com/huggingface/transformers
!pip install -i https://test.pypi.org/simple/ bitsandbytes
!pip install -q -U git+https://github.com/huggingface/accelerate.git
!pip install flash-attn --no-build-isolation --upgrade

Collecting git+https://github.com/huggingface/transformers
  Cloning https://github.com/huggingface/transformers to /tmp/pip-req-build-cgrqo2rk
  Running command git clone --filter=blob:none --quiet https://github.com/huggingface/transformers /tmp/pip-req-build-cgrqo2rk
  Resolved https://github.com/huggingface/transformers to commit 21dc5859421cf0d7d82d374b10f533611745a8c5
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Looking in indexes: https://test.pypi.org/simple/
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done

In [2]:
!nvidia-smi

Fri Oct 13 21:58:15 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  NVIDIA A100-SXM...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   34C    P0    46W / 400W |      0MiB / 40960MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

## 2) Loading and processing data

Function that formats each dataset sample into the prompt used for training

In [3]:
def format_instruction(sample):
	return f"""### Instruction:
Responda à pergunta com a maior sinceridade possível usando o CONTEXTO fornecido e, se a resposta não estiver contida no CONTEXTO abaixo, diga 'Desculpe, não possuo essa informação'.

### Input:
CONTEXTO: {sample['context']}

PERGUNTA: {sample['question']}

### Response: {sample['resposta']}
"""

Load the dataset directly from the Hugging Face Hub using the `datasets` library. <br>
Available at https://huggingface.co/datasets/Weni/LLM-base

In [4]:
from datasets import load_dataset
from random import randrange


dataset = load_dataset("Weni/LLM-base", split='train')
dataset = dataset.shuffle(seed=55)
dataset



Dataset({
    features: ['id', 'question', 'resposta', 'context', 'correct_ans'],
    num_rows: 29073
})
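
Before training, it can help to render one record through the template and read the result. The sketch below is self-contained: it uses a shortened English stand-in for the instruction text and a made-up record (both hypothetical), but the column names `context`, `question`, and `resposta` match the dataset features listed above.

```python
# Shortened stand-in for the notebook's prompt template (hypothetical wording).
def render_prompt(sample: dict) -> str:
    return (
        "### Instruction:\n"
        "Answer the question using only the provided context.\n\n"
        "### Input:\n"
        f"CONTEXTO: {sample['context']}\n\n"
        f"PERGUNTA: {sample['question']}\n\n"
        f"### Response: {sample['resposta']}\n"
    )

# Made-up record that uses the dataset's column names.
example = {
    "context": "A loja abre às 9h.",
    "question": "A que horas a loja abre?",
    "resposta": "A loja abre às 9h.",
}

prompt = render_prompt(example)
print(prompt)
```

Reading a rendered prompt makes it easy to spot a swapped column or a missing blank line before any GPU time is spent.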

## 3) Dedicated functions for processing, training, and memory management

In [5]:
import torch
import accelerate
import gc
import time
import datetime
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

class ClearCache:
    """
    Class for managing GPU memory cache clearance using PyTorch

    This class allows clearing the GPU memory cache before and after executing a block of code using the 'with' context manager.
    Usage example:

    ```
    with ClearCache():
        ...  # your code that uses GPU resources
    # The GPU memory cache is cleared again when the 'with' block exits
    ```
    """

    def __enter__(self):
        """
        Entry method of the context manager.
        This method is called when the 'with' block is initiated.
        It clears the GPU memory cache using the 'torch.cuda.empty_cache()' function.
        """
        torch.cuda.empty_cache()

    def __exit__(self, exc_type, exc_val, exc_tb):
        """
        Exit method of the context manager.
        This method is called when the 'with' block is exited. It also clears the GPU memory cache using the 'torch.cuda.empty_cache()' function.

        exc_type: The type of exception, if it occurs.
        exc_val: The value of the exception, if it occurs.
        exc_tb: The traceback of the exception, if it occurs

        """
        torch.cuda.empty_cache()

def execution_time(func):
    """
    Decorator that measures the execution time of a given function and prints the result.

    This decorator can be used to wrap around a function to measure the time it takes
    to execute. It will print the execution time in seconds.

    Args:
        func (callable): The function to measure the execution time of.
    Returns:
        callable: A wrapper function that measures the execution time and calls the
        original function.

    Example usage:
    @execution_time
    def my_function():
        # Your code here

    """
    def wrapper(*args, **kwargs):
        start_time = time.time()
        result = func(*args, **kwargs)
        end_time = time.time()
        execution_time = end_time - start_time
        print(f"Execution time for {func.__name__}: {round(execution_time, 3)} seconds")
        return result

    return wrapper

def clear_memory_cache():
    """
    Clears the GPU memory cache and collects garbage.

    This function performs the following operations:
    1. Resets the maximum memory allocated on the GPU using `torch.cuda.reset_max_memory_allocated()`.
    2. Resets the peak memory statistics using `torch.cuda.reset_peak_memory_stats()`.
    3. Empties the GPU memory cache using `torch.cuda.empty_cache()`.
    4. Collects and prints the number of unreachable objects using `gc.collect()`.

    This function can be useful to free up GPU memory and improve memory management when working with PyTorch.

    Example usage:
    ```
    clear_memory_cache()
    ```
    """
    torch.cuda.reset_max_memory_allocated()
    torch.cuda.reset_peak_memory_stats()
    torch.cuda.empty_cache()
    print(f"Cleared memory: {gc.collect()}")

@execution_time
def build_model_name(model_hf, dataset, epochs, per_device_train_batch_size):
    """
    Builds the model name, which is also the relative path of the directory where the model and tokenizer are saved.

    Note: `description` and `folder_name` are not parameters; they are read from
    module-level variables that must be defined before this function is called.

    Args:
        model_hf (str): Name or identifier of the Hugging Face model.
        dataset (Dataset): Training dataset; its number of rows is embedded in the name.
        epochs (int): Number of training epochs for the model.
        per_device_train_batch_size (int): Batch size per device used during training.

    Returns:
        str: The model name prefixed with './<folder_name>/'.
    """
    dataset_size = "{:,.0f}".format(len(dataset)).replace(",", ".")
    today = datetime.date.today()
    today = today.strftime("%d-%m-%y")
    model_name = str(today) + '-' + model_hf.replace('/','-') + '_' + description + '-' + str(dataset_size) + '_epochs-' + str(epochs) + '_batch_' + str(per_device_train_batch_size)
    model_name = './' + folder_name + '/' + model_name

    return model_name

@execution_time
def load_model_and_tokenizer(model_id, quantized=False, quantization_config=None):
    """
    Load a language model and tokenizer for text generation.

    Args:
        model_id (str): The identifier of the pre-trained language model to load.
        quantized (bool, optional): Whether to load a quantized version of the model.
            Defaults to False, loading the non-quantized model.
        quantization_config (BitsAndBytesConfig, optional): Configuration settings for quantization.
            Only required if quantized is True.

    Returns:
        Tuple: A tuple containing the loaded model and tokenizer.

    The function loads a language model and tokenizer based on the provided `model_id`.
    If `quantized` is set to True, it loads a quantized model using the specified
    `quantization_config`. Common configurations such as `model.config.pretraining_tp`
    are set for both cases. The tokenizer is configured to use the end-of-sequence token
    as the padding token on the right side.

    Example:
        # Load a non-quantized model
        model, tokenizer = load_model_and_tokenizer("gpt2")

        # Load a quantized model with a custom quantization config
        quantization_config = BitsAndBytesConfig(load_in_4bit=True)
        model, tokenizer = load_model_and_tokenizer("gpt2", quantized=True, quantization_config=quantization_config)
    """
    if quantized:
        model = AutoModelForCausalLM.from_pretrained(model_id,
                                                     quantization_config=quantization_config,
                                                     use_cache=False,
                                                     device_map="auto",
                                                     use_safetensors=False,
                                                     use_flash_attention_2=True)
    else:
        model = AutoModelForCausalLM.from_pretrained(model_id,
                                                     use_cache=False,
                                                     device_map="auto",
                                                     torch_dtype=torch.float16,
                                                     use_safetensors=False,
                                                     use_flash_attention_2=True)

    model.config.pretraining_tp = 1

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    tokenizer.pad_token = tokenizer.eos_token
    tokenizer.padding_side = "right"

    return model, tokenizer

@execution_time
def login_hugging_face_hub(token):
    """
    Log in to the Hugging Face platform using the provided token.

    This function runs `huggingface-cli login` through the IPython shell escape (`!`),
    so it only works inside a notebook or IPython session.

    Parameters:
        token (str): The authentication token to log in to the Hugging Face platform.

    Example:
        token = "your_token_here"
        login_hugging_face_hub(token)
    """
    !huggingface-cli login --token $token

@execution_time
def push_to_hub(model, tokenizer, huggingface_model_name):
    """
    Push the model and its associated tokenizer to the Hugging Face Model Hub.

    This function sends a trained model and its corresponding tokenizer to the Hugging Face Model Hub,
    allowing them to be shared, versioned, and used by other users.

    Args:
        model (PreTrainedModel): The trained model to be pushed to the Model Hub.
        tokenizer (PreTrainedTokenizer): The tokenizer corresponding to the model.
        huggingface_model_name (str): Name of the model repository on the Hugging Face Model Hub.

    Returns:
        None

    Note:
        Requires a prior login (see `login_hugging_face_hub`). Errors are caught and printed rather than raised.

    Example Usage:
    >>> model = TrainedModel()
    >>> tokenizer = ModelTokenizer()
    >>> push_to_hub(model, tokenizer, 'my-awesome-model')
    """
    try:
        model.push_to_hub(huggingface_model_name, use_auth_token=True)
        tokenizer.push_to_hub(huggingface_model_name, use_auth_token=True)
    except Exception as e:
        print("An error occurred:", e)

@execution_time
def create_huggingface_repository(repository_id, first_commit_message):
    """
    Create a private repository on Hugging Face.

    This function creates a private repository on Hugging Face with the specified repository ID.
    It relies on a module-level `api` object (an `HfApi` instance) that must exist before the call.

    Args:
        repository_id (str): The unique ID for the new repository.
        first_commit_message (str): Currently unused; kept so existing calls keep working.

    Returns:
        None

    Example Usage:
    >>> repo_id = "my-repo"
    >>> initial_message = "Initial commit"
    >>> create_huggingface_repository(repo_id, initial_message)
    """
    api.create_repo(repo_id=repository_id, private=True)

@execution_time
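def model_inference(text, tokenizer, device=device):
    # Note: `model` is read from the module-level scope, not passed as an argument.
    encoding = tokenizer(text, return_tensors="pt").to(device)
    with torch.inference_mode():
        outputs = model.generate(
            input_ids=encoding.input_ids,
            attention_mask=encoding.attention_mask,
            max_new_tokens=500,
            # top_k, typical_p, temperature and top_p only take effect when
            # do_sample=True; with do_sample=False decoding is greedy.
            top_k=10,
            typical_p=0.95,
            temperature=0.5,
            top_p=0.95,
            num_return_sequences=1,
            repetition_penalty=1.03,
            do_sample=False,
        )
    # Decode the full sequence (prompt plus generated continuation) back to text
    generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return generated_text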
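
Because `tokenizer.decode` is applied to the whole output sequence, the string returned by `model_inference` contains the prompt followed by the continuation. Below is a minimal sketch of separating the answer, assuming the prompt ends with the `### Response:` marker used in `format_instruction`; the helper name `extract_response` is hypothetical.

```python
RESPONSE_MARKER = "### Response:"

def extract_response(generated_text: str, marker: str = RESPONSE_MARKER) -> str:
    """Return the text after the last occurrence of the marker, stripped."""
    _, found, tail = generated_text.rpartition(marker)
    # If the marker is missing, fall back to the full text.
    return tail.strip() if found else generated_text.strip()

# Made-up decoded output, for illustration only.
decoded = "### Instruction:\n...\n\n### Response: Desculpe, não possuo essa informação\n"
print(extract_response(decoded))
```

Splitting on the last occurrence of the marker keeps the helper robust if the context itself happens to contain the same marker text.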

## 4) Model and tokenizer loading

Model ID reference - Mistral Instruct 7B <br>
https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1

Load the quantized version of the model

In [23]:
import torch
import accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from flash_attn import flash_attn_qkvpacked_func, flash_attn_func

model_id = 'mistralai/Mistral-7B-Instruct-v0.1'

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

with ClearCache():
    model, tokenizer = load_model_and_tokenizer(model_id, quantized=True, quantization_config=bnb_config)
    clear_memory_cache()

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


Execution time for load_model_and_tokenizer: 16.463 seconds
Cleared memory: 21


LoRA configuration

In [24]:
from peft import LoraConfig, prepare_model_for_kbit_training, get_peft_model

peft_config = LoraConfig(
        lora_alpha=16,
        lora_dropout=0.1,
        r=64,
        bias="none",
        task_type="CAUSAL_LM",
        target_modules=["q_proj",
                        "k_proj",
                        "v_proj",
                        "o_proj"],
)

model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, peft_config)
clear_memory_cache()

Cleared memory: 1472
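
For a sense of scale, the number of trainable LoRA parameters follows from `r` and the shapes of the targeted projections: each adapted weight of shape `(out, in)` gains `r * (in + out)` parameters. The sketch below assumes the published Mistral-7B dimensions (hidden size 4096, 8 key/value heads of size 128, 32 layers), which are not stated in this notebook, and that all four attention projections are targeted.

```python
# Assumed Mistral-7B shapes (not taken from this notebook).
HIDDEN = 4096
KV_DIM = 8 * 128      # 8 key/value heads x head dimension 128
NUM_LAYERS = 32

R = 64                # LoRA rank from peft_config
LORA_ALPHA = 16       # lora_alpha from peft_config

# (in_features, out_features) of each targeted projection
projections = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
}

# Each adapter adds A (r x in) and B (out x r): r * (in + out) parameters.
per_layer = sum(R * (n_in + n_out) for n_in, n_out in projections.values())
total = per_layer * NUM_LAYERS
scaling = LORA_ALPHA / R  # factor applied to the low-rank update

print(f"LoRA parameters per layer: {per_layer:,}")
print(f"LoRA parameters in total:  {total:,}")
print(f"Scaling (lora_alpha / r):  {scaling}")
```

Under these assumptions the adapters add roughly 54.5 million trainable parameters, a small fraction of the base model. Calling `model.print_trainable_parameters()` on the PEFT model reports the actual figure.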


## 5) Defining Training parameters

Setting training parameters for the SFTTrainer.

In [33]:
import torch
import transformers

repository_id = "Your_huggingFace_profile/Your_repo_name"
token = 'Your_token_here'
max_seq_length = 2048

training_arguments_mistral = {
    'num_train_epochs':10,
    'per_device_train_batch_size':2,
    'gradient_accumulation_steps':2,
    'gradient_checkpointing':True,
    'optim':'adamw_torch',
    'lr_scheduler_type':'constant_with_warmup',
    'logging_steps':10,
    'save_strategy':"epoch",
    'learning_rate':4e-4,
    'save_total_limit':3,
    'fp16':True,
    'max_steps':5,  # a positive max_steps overrides num_train_epochs
    'max_grad_norm':0.3,
    'warmup_ratio':0.03,
    'disable_tqdm':False,
    'weight_decay':0.001,
    'hub_model_id':repository_id,
    'push_to_hub':True,
    'hub_strategy':'every_save',
    'hub_token':token,
    'hub_private_repo':True,
}
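
Two of these settings interact in ways that are easy to miss. The effective batch size per optimizer step is `per_device_train_batch_size * gradient_accumulation_steps`, and in `TrainingArguments` a positive `max_steps` takes precedence over `num_train_epochs`. The arithmetic sketch below assumes a single GPU (as in the `nvidia-smi` output) and that the warmup length is the ceiling of `warmup_ratio * max_steps`.

```python
import math

per_device_train_batch_size = 2
gradient_accumulation_steps = 2
num_gpus = 1            # assumption: the single A100 shown earlier
max_steps = 5
warmup_ratio = 0.03
max_seq_length = 2048

# Sequences contributing to each optimizer step
effective_batch = per_device_train_batch_size * gradient_accumulation_steps * num_gpus
# Sequences processed over the whole run when max_steps is the stopping criterion
sequences_seen = effective_batch * max_steps
# Upper bound on tokens processed if every sequence is a full packed block
max_tokens_seen = sequences_seen * max_seq_length
warmup_steps = math.ceil(warmup_ratio * max_steps)

print(effective_batch, sequences_seen, max_tokens_seen, warmup_steps)
```

With these values the run stops after 20 sequences, so it acts as a smoke test rather than a 10-epoch training; `max_steps` would have to be set to -1 for `num_train_epochs` to apply.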

Build the full model name, which is also the directory where the produced artifacts are saved. <br>

folder_name `(str)` = Name of the directory where the model files will be saved <br>
model_hf `(str)` = Model identifier. Example: 'Mistral-7B-Instruct-v0.1' <br>
dataset `(Dataset)` = The training dataset loaded above <br>
description `(str)` = Any description or important information about the model. Example: 'first-finetune' <br>

In [9]:
folder_name = 'WeniGPT'
model_hf = 'Mistral-7B-Instruct-v0.1'
dataset = dataset
description = 'first-finetune'

epochs = training_arguments_mistral['num_train_epochs']
per_device_train_batch_size = training_arguments_mistral['per_device_train_batch_size']

model_complete_name = build_model_name(model_hf, dataset, epochs, per_device_train_batch_size)
model_complete_name

Execution time for build_model_name: 0.0 seconds


'./WeniGPT/13-10-23-Mistral-7B-Instruct-v0.1_first-finetune-29.073K_epochs-10_batch_2'
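
`build_model_name` reads `folder_name` and `description` from global variables. Below is a sketch of the same naming scheme with every input passed explicitly, which makes the function easier to reuse and test. The name `build_run_name` and its signature are hypothetical, and the date is injectable so the result is reproducible.

```python
import datetime

def build_run_name(folder_name, model_hf, description, num_rows,
                   epochs, batch_size, today=None):
    """Compose './<folder>/<date>-<model>_<description>-<rows>_epochs-<n>_batch_<b>'."""
    today = today or datetime.date.today()
    # Thousands separator rendered with '.', following build_model_name
    rows = f"{num_rows:,}".replace(",", ".")
    name = (
        f"{today.strftime('%d-%m-%y')}-{model_hf.replace('/', '-')}"
        f"_{description}-{rows}_epochs-{epochs}_batch_{batch_size}"
    )
    return f"./{folder_name}/{name}"

run_name = build_run_name(
    "WeniGPT", "Mistral-7B-Instruct-v0.1", "first-finetune",
    num_rows=29073, epochs=10, batch_size=2,
    today=datetime.date(2023, 10, 13),
)
print(run_name)
```

Passing the row count instead of the dataset object also means the function has no dependency on the `datasets` library.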

In [34]:
import transformers
from trl import SFTTrainer
from transformers import TrainingArguments, DataCollatorForLanguageModeling

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset.with_format("torch"),
    peft_config=peft_config,
    max_seq_length=max_seq_length,
    tokenizer=tokenizer,
    packing=True,
    formatting_func=format_instruction,
    args=TrainingArguments(
        output_dir=model_complete_name,
        num_train_epochs=training_arguments_mistral['num_train_epochs'],
        per_device_train_batch_size=training_arguments_mistral['per_device_train_batch_size'],
        gradient_accumulation_steps=training_arguments_mistral['gradient_accumulation_steps'],
        gradient_checkpointing=training_arguments_mistral['gradient_checkpointing'],
        optim=training_arguments_mistral['optim'],
        lr_scheduler_type=training_arguments_mistral['lr_scheduler_type'],
        logging_steps=training_arguments_mistral['logging_steps'],
        save_strategy=training_arguments_mistral['save_strategy'],
        learning_rate=training_arguments_mistral['learning_rate'],
        save_total_limit=training_arguments_mistral['save_total_limit'],
        fp16=training_arguments_mistral['fp16'],
        max_steps=training_arguments_mistral['max_steps'],
        max_grad_norm=training_arguments_mistral['max_grad_norm'],
        warmup_ratio=training_arguments_mistral['warmup_ratio'],
        disable_tqdm=training_arguments_mistral['disable_tqdm'],
        weight_decay=training_arguments_mistral['weight_decay'],
        hub_model_id=training_arguments_mistral['hub_model_id'],
        push_to_hub=training_arguments_mistral['push_to_hub'],
        hub_strategy=training_arguments_mistral['hub_strategy'],
        hub_token=training_arguments_mistral['hub_token'],
        hub_private_repo=training_arguments_mistral['hub_private_repo']

        ),
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False)
)
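
With `packing=True`, examples are not padded individually to `max_seq_length`; instead, tokenized examples are concatenated, separated by an end-of-sequence token, and cut into fixed-length blocks. The following is a simplified illustration of that idea on plain lists of token IDs, not TRL's actual implementation. `pack_sequences` is a hypothetical helper, and dropping the incomplete final block is an assumption of this sketch.

```python
from typing import Iterable, List

def pack_sequences(token_lists: Iterable[List[int]], eos_id: int,
                   block_size: int) -> List[List[int]]:
    """Concatenate sequences (each followed by EOS) and split into full blocks."""
    stream: List[int] = []
    for tokens in token_lists:
        stream.extend(tokens)
        stream.append(eos_id)
    # Keep only complete blocks; a trailing partial block is discarded.
    n_full = len(stream) // block_size
    return [stream[i * block_size:(i + 1) * block_size] for i in range(n_full)]

# Three made-up tokenized examples, with 0 standing in for the EOS token ID.
blocks = pack_sequences([[11, 12, 13], [21, 22], [31, 32, 33, 34]],
                        eos_id=0, block_size=4)
print(blocks)
```

Note that a block can span two examples, which is why the EOS separator matters: it is the only signal the model gets that one example has ended and another has begun.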

In [None]:
clear_memory_cache()

Logging in to the HuggingFace Hub

In [12]:
login_hugging_face_hub(token)

Token will not been saved to git credential helper. Pass `add_to_git_credential=True` if you want to set the git credential as well.
Token is valid (permission: write).
Your token has been saved to /root/.cache/huggingface/token
Login successful
Execution time for login_hugging_face_hub: 0.606 seconds


Creating the HuggingFace repository to upload the trained model

In [18]:
from huggingface_hub import HfApi
api = HfApi()

first_commit_message = 'create repository'
repository_id = repository_id
create_huggingface_repository(repository_id, first_commit_message)

Execution time for create_huggingface_repository: 0.395 seconds


## 6) Training, merging and saving the model

In [35]:
with ClearCache():
    trainer.train()
    trainer.save_model(model_complete_name)

Step,Training Loss




Cleared memory: 168


In [None]:
clear_memory_cache()

Merging the model and sending it to the Hugging Face hub

In [37]:
from peft import AutoPeftModelForCausalLM

with ClearCache():
    model1 = AutoPeftModelForCausalLM.from_pretrained(
        model_complete_name,
        low_cpu_mem_usage=True,
        torch_dtype=torch.float16,
    )

    merged_model = model1.merge_and_unload()
    merged_model.save_pretrained(model_complete_name)

    tokenizer.save_pretrained(model_complete_name)

    push_to_hub(merged_model, tokenizer, repository_id)

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]



pytorch_model-00001-of-00002.bin:   0%|          | 0.00/9.94G [00:00<?, ?B/s]

pytorch_model-00002-of-00002.bin:   0%|          | 0.00/4.54G [00:00<?, ?B/s]

Upload 2 LFS files:   0%|          | 0/2 [00:00<?, ?it/s]



Execution time for push_to_hub: 1046.383 seconds
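
The shard sizes in the upload log give a quick consistency check on the merged model. Since it was loaded with `torch_dtype=torch.float16` (2 bytes per weight), the total size implies the parameter count. This is a rough sketch, assuming the progress bars report decimal gigabytes and ignoring non-weight overhead in the files.

```python
shard_sizes_gb = [9.94, 4.54]   # from the upload progress bars
BYTES_PER_FP16 = 2

total_bytes = sum(shard_sizes_gb) * 1e9
approx_params = total_bytes / BYTES_PER_FP16

# Rough size of the same weights at 4 bits each (ignores quantization
# constants and any layers that stay in higher precision).
approx_4bit_gb = approx_params * 0.5 / 1e9

print(f"Approximate parameters: {approx_params / 1e9:.2f}B")
print(f"Approximate 4-bit weight size: {approx_4bit_gb:.2f} GB")
```

The estimate of about 7.24 billion parameters is in line with a 7B-class model, which suggests the merged checkpoint was saved in half precision as intended.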
