<h2><b>Notebook MultiGPU WeniGPT</b></h2>

---

# 1) Library installation

In [1]:
#@title libs installation

# !pip install transformers==4.34.0
# !pip install torch==2.0.1
# !pip install peft==0.4.0
# !pip install safetensors>=0.4.1
# !pip install evaluate==0.4.1
# !pip install bitsandbytes==0.41.2
# !pip install huggingface_hub==0.17.3
# !pip install deepspeed==0.12.6
# !pip install -U datasets
# !pip install seqeval
# !pip install optimum
# !pip install auto-gptq
# !pip install autoawq
# !pip install wandb
# !pip install git+https://github.com/huggingface/trl.git@main
# !pip install git+https://github.com/huggingface/accelerate.git@main

'''
!pip install -U datasets==2.13.0
!wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin
!sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600
!wget https://developer.download.nvidia.com/compute/cuda/12.3.1/local_installers/cuda-repo-ubuntu2204-12-3-local_12.3.1-545.23.08-1_amd64.deb
!dpkg -i cuda-repo-ubuntu2204-12-3-local_12.3.1-545.23.08-1_amd64.deb
!cp /var/cuda-repo-ubuntu2204-12-3-local/cuda-*-keyring.gpg /usr/share/keyrings/
!apt-get update
!apt-get -y install cuda-toolkit-12-3
'''

'\n!pip install -U datasets==2.13.0\n!wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin\n!sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600\n!wget https://developer.download.nvidia.com/compute/cuda/12.3.1/local_installers/cuda-repo-ubuntu2204-12-3-local_12.3.1-545.23.08-1_amd64.deb\n!dpkg -i cuda-repo-ubuntu2204-12-3-local_12.3.1-545.23.08-1_amd64.deb\n!cp /var/cuda-repo-ubuntu2204-12-3-local/cuda-*-keyring.gpg /usr/share/keyrings/\n!apt-get update\n!apt-get -y install cuda-toolkit-12-3\n'

# 2) Importing Dependencies

In [2]:
#@title imports

from transformers import TrainingArguments, AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, GPTQConfig, DataCollatorWithPadding
from peft import LoraConfig, get_peft_model, PeftConfig, AutoPeftModelForCausalLM, prepare_model_for_kbit_training
from datasets import load_dataset, Dataset, DatasetDict
from huggingface_hub.utils import enable_progress_bars
from accelerate import Accelerator
from huggingface_hub import HfApi
from trl import SFTTrainer
from typing import Dict
from typing import Any
import huggingface_hub
import pandas as pd
import transformers
import accelerate
import deepspeed
import evaluate
import datetime
import locale
import wandb
import torch
import time
import os
import gc
from datetime import date

locale.getpreferredencoding = lambda: "UTF-8"
torch.utils.checkpoint.use_reentrant=True

[2024-01-22 15:24:46,674] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)


In [3]:
#@title utils | Clear cache + execution time

class ClearCache():
    """
    Classe para gerenciar o esvaziamento da memória cache da GPU utilizando o PyTorch.

    Essa classe permite limpar a memória cache da GPU antes e depois da execução
    de um bloco de código usando o gerenciador de contexto 'with'.

    Exemplo de uso:
    ```
    with ClearCache():
        # Seu código que utiliza recursos da GPU aqui
    # Memória cache da GPU é automaticamente liberada ao sair do bloco 'with'
    ```

    """

    def __enter__(self):
        """
        Método de entrada do gerenciador de contexto.

        Esse método é chamado quando o bloco 'with' é iniciado. Ele esvazia a memória
        cache da GPU utilizando a função 'torch.cuda.empty_cache()'.

        """
        torch.cuda.empty_cache()

    def __exit__(self, exc_type, exc_val, exc_tb):
        """
        Método de saída do gerenciador de contexto.

        Esse método é chamado quando o bloco 'with' é encerrado. Ele também esvazia
        a memória cache da GPU utilizando a função 'torch.cuda.empty_cache()'.

        :param exc_type: Tipo da exceção, se ocorrer
        :param exc_val: Valor da exceção, se ocorrer
        :param exc_tb: Traceback da exceção, se ocorrer

        """
        torch.cuda.empty_cache()


class EasyDict(dict):
    """Convenience class that behaves like a dict but allows access with the attribute syntax."""

    def __getattr__(self, name: str) -> Any:
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)

    def __setattr__(self, name: str, value: Any) -> None:
        self[name] = value

    def __delattr__(self, name: str) -> None:
        del self[name]


def execution_time(func):
    """
    Decorator that measures the execution time of a given function and prints the result.

    This decorator can be used to wrap around a function to measure the time it takes
    to execute. It will print the execution time in seconds.

    Args:
        func (callable): The function to measure the execution time of.
    Returns:
        callable: A wrapper function that measures the execution time and calls the
        original function.

    Example usage:
    @execution_time
    def my_function():
        # Your code here

    """
    def wrapper(*args, **kwargs):
        start_time = time.time()
        result = func(*args, **kwargs)
        end_time = time.time()
        execution_time = end_time - start_time
        print(f"Execution time for {func.__name__}: {round(execution_time, 3)} seconds")
        return result

    return wrapper

def clear_memory_cache():
    """
     Clears the GPU memory cache and collects garbage.

    This function performs the following operations:
    1. Resets the maximum memory allocated on the GPU using `torch.cuda.reset_max_memory_allocated()`.
    2. Resets the peak memory statistics using `torch.cuda.reset_peak_memory_stats()`.
    3. Empties the GPU memory cache using `torch.cuda.empty_cache()`.
    4. Collects and prints the number of unreachable objects using `gc.collect()`.

    This function can be useful to free up GPU memory and improve memory management when working with PyTorch.

    Example usage:
    ```
    clear_memory_cache()
    ```
    """
    torch.cuda.reset_max_memory_allocated()
    torch.cuda.reset_peak_memory_stats()
    torch.cuda.empty_cache()
    print(f"Cleared memory: {gc.collect()}")



In [4]:
#@title cuda infos
print(f"Cuda is available: ", torch.cuda.is_available())
print(f"Cuda device capability: ", torch.cuda.get_device_capability())
#print(f"Cuda visible devices: ", os.environ["CUDA_VISIBLE_DEVICES"])

#device_index = 0
#device = torch.device(f'cuda:{device_index}' if torch.cuda.is_available() else 'cpu')
!export 'PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512'
!export TOKENIZERS_PARALLELISM=true

Cuda is available:  True
Cuda device capability:  (8, 0)


In [5]:
import os
os.environ['CUDA_LAUNCH_BLOCKING'] = '1'

In [6]:
#@title nvidia-smi
!nvidia-smi

Mon Jan 22 15:24:47 2024       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  NVIDIA A100 80G...  On   | 00000000:44:00.0 Off |                    0 |
| N/A   32C    P0    43W / 300W |      3MiB / 81920MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA A100 80G...  On   | 00000000:84:00.0 Off |                    0 |
| N/A   30C    P0    43W / 300W |      3MiB / 81920MiB |      0%      Default |
|       

In [7]:
#@title Training parameters

training_arguments = {
    'model_base_repository_id': "HuggingFaceH4/zephyr-7b-beta",
    'hub_model_id': "Weni/WeniGPT-2.2.1-Zephyr-7B-LLM_Base_2.0.3_SFT",
    'dataset_id': "Weni/LLM_Base_2.0.3_SFT",
    'folder_name': "zephyr7bbeta",
    'description': 'sft-multilanguage',

    # Dataset
    'dataset_text_field': "prompt",
    'context_field': "",
    'instruction_field': "instruction",
    'target_field': "chosen_response",
    'train_dataset':"train",
    'eval_dataset':"test",
    
    # HuggingFace
    'hub_token': os.environ.get('HUB_TOKEN'),
    'push_to_hub': True,
    'hub_strategy': 'all_checkpoints',

    # Wandb
    'report_to': 'wandb',
    'wandb_token':os.environ.get('WANDB_TOKEN'),
    
    # Lora
    'bits': 4 ,
    'use_exllama': True,
    'device_map': "auto",
    'use_cache': False,
    'lora_r': 32,
    'lora_alpha': 32,
    'lora_dropout': 0.05,
    'bias': "none",
    'target_modules': ["q_proj", "v_proj"],
    'task_type': "CAUSAL_LM",

    # Bits and bytes
    'load_in_4bit':True,
    'use_4bit':True,
    'bnb_4bit_use_double_quant':True,
    'bnb_4bit_quant_type':"nf4",
    'bnb_4bit_compute_dtype': torch.float16,

    # Training Args
    'max_seq_length':  8192,
    'num_train_epochs': 3,
    'num_gpus': 4, 
    'per_device_train_batch_size':  2,
    'per_device_eval_batch_size': 2,
    'gradient_accumulation_steps': 8,
    'gradient_checkpointing': True,
    'optimizer': "AdamW",
    'learning_rate':  2e-4,
    'logging_steps': 50,
    'max_steps':10,
    'fp16': True,
    'packing': True,
    'lr_scheduler_type': "constant_with_warmup",
    'pretraining_tp': 1,
    'mlm':False,
    'save_strategy': "steps",
    'evaluation_strategy': "steps",
    'load_best_model_at_end': True,
    'metric_for_best_model': 'eval_loss',
    'greater_is_better': False,
    'prediction_loss_only':True,
    'save_safetensors': True,
    'max_grad_norm': 0.3,
    'warmup_ratio': 0.03,
    'weight_decay': 0.01,
    'neftune_noise_alpha':5,
    'torch_dtype': torch.float16,
    'save_total_limit': 5,

    # Tokenizer
    'padding': True,
    'padding_side': 'left',
    'add_eos_token':True,
    'trust_remote_code': True,
    'use_auth_token':True,

    # Misc
    'disable_tqdm': False,
    'include_inputs_for_metrics':True,
    
    # GPU config 
    
    'deepspeed': {
        'gradient_accumulation_steps': 1,
        'gradient_clipping': 1.0,
        'offload_optimizer_device': 'cpu',
        'offload_param_device': 'cpu',
        'zero3_init_flag': True,
        'zero3_save_16bit_model': False,
        'zero_stage': 3
    },
}

training_arguments = EasyDict(training_arguments)

# 3) Load Dataset

In [8]:
#@title load and process dataset
def format_instruction(sample):
    sample['prompt'] = f"""<s>Você é um médico tratando um paciente com amnésia. Para responder as perguntas do paciente, você irá ler um texto anteriormente para se contextualizar. Se você trouxer informações desconhecidas, fora do texto lido, poderá deixar o paciente confuso. Se o paciente fizer uma questão sobre informações não presentes no texto, você precisa responder 'Desculpe, não possuo essa informação', pois se tentar responder, pode trazer informações que não ajudarão o paciente recuperar sua memória.\n\nTEXTO: {sample['context']}.\n\nPERGUNTA: {sample['question']}.\nLembre, se não estiver no texto ou não souber a resposta, responda especificamente 'Desculpe, não possuo essa informação'. Precisamos ajudar o paciente.\n\nRESPOSTA: {sample['answer']}</s>"""
    return sample

@execution_time
def load_dataset_and_split(dataset_id, columns, sample=None, seed=55, test_size=0.1):
    """
    Loads a dataset with the given ID, shuffles it, and splits the training set into
    training and testing sets.

    Parameters:
    - dataset_id (str): The ID of the dataset to load.
    - sample (int, optional): The number of rows to use from the dataset (default is None).
    - seed (int, optional): Seed for random shuffling of the dataset (default is 55).
    - test_size (float, optional): The proportion of the dataset to include in the test split (default is 0.1).

    Returns:
    - tuple: A tuple containing the training and testing datasets.
    """
    dataset = load_dataset(dataset_id)

    if sample is not None:
        sample_size =  int(len(dataset) * sample)
        dataset = dataset.shuffle(seed=seed)['train'].select(range(sample_size))
    else:
        dataset = dataset.shuffle(seed=seed)['train']

    dataset = dataset.train_test_split(test_size=test_size)

    dataset['train'] = dataset['train'].map(format_instruction, num_proc=8, remove_columns=columns)
    dataset['test'] = dataset['test'].map(format_instruction, num_proc=8, remove_columns=columns)

    return dataset

In [9]:
columns = ['question', 'answer', 'context', 'correct_ans']
dataset = load_dataset_and_split(training_arguments.dataset_id, columns, test_size=0.1)
dataset

Map (num_proc=8):   0%|          | 0/35410 [00:00<?, ? examples/s]

Map (num_proc=8):   0%|          | 0/3935 [00:00<?, ? examples/s]

Execution time for load_dataset_and_split: 3.554 seconds


DatasetDict({
    train: Dataset({
        features: ['prompt'],
        num_rows: 35410
    })
    test: Dataset({
        features: ['prompt'],
        num_rows: 3935
    })
})

In [10]:
num_gpus = training_arguments.num_gpu
epochs = training_arguments.num_train_epochs
training_arguments.max_steps = epochs * int(len(dataset['train']) / (num_gpus * training_arguments.per_device_train_batch_size * training_arguments.gradient_accumulation_steps))
training_arguments.max_steps

1659

In [None]:
training_arguments['train_dataset_len'] = len(dataset['train'])
training_arguments['test_dataset_len'] = len(dataset['test'])

In [11]:
# dataset_for_gptq = load_dataset_and_split(training_arguments.dataset_id, 'resposta', test_size=0.1, sample=100)
# dataset_for_gptq

In [12]:
#@title checking the dataset output
dataset['train']['prompt'][0]

"<s>Você é um médico tratando um paciente com amnésia. Para responder as perguntas do paciente, você irá ler um texto anteriormente para se contextualizar. Se você trouxer informações desconhecidas, fora do texto lido, poderá deixar o paciente confuso. Se o paciente fizer uma questão sobre informações não presentes no texto, você precisa responder 'Desculpe, não possuo essa informação', pois se tentar responder, pode trazer informações que não ajudarão o paciente recuperar sua memória.\n\nTEXTO: La geopolítica y las relaciones internacionales están intrínsecamente conectadas, ya que representan el análisis y la comprensión de la dinámica de poder entre países y regiones del mundo. La geopolítica abarca el estudio de la influencia de factores como la geografía, los recursos naturales, la demografía y la tecnología en las relaciones internacionales, mientras que las relaciones internacionales cubren las interacciones políticas, económicas, culturales y militares entre países. Es esencial

# 4) Functions dedicated to preprocessing, training, and model storage

In [13]:
for i in range(10):
  clear_memory_cache()



Cleared memory: 390
Cleared memory: 0
Cleared memory: 0
Cleared memory: 0
Cleared memory: 0
Cleared memory: 0
Cleared memory: 0
Cleared memory: 0
Cleared memory: 0
Cleared memory: 0


In [14]:
#@title Functions

@execution_time
def load_model_and_tokenizer(model_base_repository_id, quantized=True, quantization_type=None, dataset=None):
    """
    Carrega o modelo e o tokenizer.

    Parameters:
    - model_base_repository_id (str): O ID do repositório.
    - quantized (bool): Se o modelo deve ser quantizado.
    - quantization_type (str): Tipo de quantização ("bits_and_bytes" ou "gpqt").

    Returns:
    - Tuple[AutoModelForCausalLM, AutoTokenizer]: Modelo e Tokenizer.
    """
    tokenizer = AutoTokenizer.from_pretrained(
        model_base_repository_id,
        padding=training_arguments.padding,
        max_lenght=training_arguments.max_seq_length,
        trust_remote_code=training_arguments.trust_remote_code,
    )

    tokenizer.add_eos_token = training_arguments.add_eos_token
    tokenizer.pad_token = tokenizer.eos_token
    tokenizer.padding_side = training_arguments.padding_side

    if quantized:
        if quantization_type == "bits_and_bytes":
            model = AutoModelForCausalLM.from_pretrained(
                model_base_repository_id,
                quantization_config=BitsAndBytesConfig(
                    load_in_4bit=training_arguments.load_in_4bit,
                    use_4bit=training_arguments.use_4bit,
                    bnb_4bit_use_double_quant=training_arguments.bnb_4bit_use_double_quant,
                    bnb_4bit_quant_type=training_arguments.bnb_4bit_quant_type,
                    bnb_4bit_compute_dtype=training_arguments.bnb_4bit_compute_dtype
                ),
                use_cache=training_arguments.use_cache,
                device_map=training_arguments.device_map,
                torch_dtype=training_arguments.torch_dtype
            )
        elif quantization_type == "gptq":
            model = AutoModelForCausalLM.from_pretrained(
                model_base_repository_id,
                quantization_config=GPTQConfig(
                    bits=training_arguments.bits,
                    dataset=dataset_for_gptq['train']['prompt'],
                    use_exllama=training_arguments.use_exllama,
                    tokenizer=tokenizer
                ),
                use_cache=training_arguments.use_cache,
                device_map=training_arguments.device_map,
                torch_dtype=training_arguments.torch_dtype
            )
        else:
            raise ValueError("Tipo de quantização não suportado")
    else:
        model = AutoModelForCausalLM.from_pretrained(
            model_base_repository_id,
            use_cache=training_arguments.use_cache,
            device_map=training_arguments.device_map,
            torch_dtype=training_arguments.torch_dtype
        )


    return model, tokenizer

@execution_time
def configure_and_prepare_model(model):
    """
    """
    model.config.use_cache = training_arguments.use_cache
    model.config.pretraining_tp = training_arguments.pretraining_tp
    model.gradient_checkpointing_enable()
    #model.enable_input_require_grads()
    model = prepare_model_for_kbit_training(model)

    peft_config = LoraConfig(
        r=training_arguments.lora_r,
        lora_alpha=training_arguments.lora_alpha,
        lora_dropout=training_arguments.lora_dropout,
        bias=training_arguments.bias,
        task_type=training_arguments.task_type,
        target_modules=training_arguments.target_modules
    )

    model = get_peft_model(model, peft_config)

    return model, peft_config

def configure_and_prepare_model_for_BitsBytes(model, peft_config):
    """
    """
    model.config.use_cache = training_arguments.use_cache
    model.config.pretraining_tp = training_arguments.pretraining_tp
    model.gradient_checkpointing_enable()
    #model.enable_input_require_grads()
    model = get_peft_model(model, peft_config)

    return model

@execution_time
def build_model_name(hub_model_id,  dataset, num_train_epochs, per_device_train_batch_size):
    """
    Builds a name for the model and defines the paths to save the model and the tokenizer.
    Args:
        model_hf (str): Name or identifier of the Hugging Face model.
        dataset_file (str): Path to the CSV file containing the dataset data.
        epochs (int): Number of training epochs for the model.
        batch_size (int): Batch size used during training.
        folder_name (str): Name of the directory where the model and tokenizer will be saved.
        description (str): Model description or training objective.

    Returns:
    tuple: Model name and the full path to save it.
    """
    dataset = dataset
    dataset_size = "{:,.0f}".format(len(dataset)).replace(",", ".")
    today = datetime.date.today()
    today = today.strftime("%d-%m-%y")
    model_name = str(today) + '-' + hub_model_id.replace('/','-') + '_' + description + '-' + str(dataset_size) + '_epochs-' + str(num_train_epochs) + '_batch_' + str(per_device_train_batch_size)

    date_now = date.today()
    dir_model_name = './' + folder_name + '/' + model_name + "_"+ str(date_now)
    drive_model_name = '/content/drive/Shareddrives/ModelosdeIA/Modelos/Zephyr/'+ dir_model_name[11:]

    return dir_model_name, drive_model_name

@execution_time
def login_hugging_face_hub(token):
    """
    Log in to the Hugging Face platform using the provided token and set the model repository name.

    This function logs in to the Hugging Face platform using the provided authentication token and
    sets the name of the model repository that will be used later.

    Parameters:
        token (str): The authentication token to log in to the Hugging Face platform.
        model_name (str): The name of the model repository created on Hugging Face.

    Example:
        token = "your_token_here"
        model_name = "your_repository_name"
        login_hugging_face_hub(token, model_name)
    """
    !huggingface-cli login --token $token

@execution_time
def push_to_hub(model, tokenizer, huggingface_model_name):
    """
    Push the model and its associated tokenizer to the Hugging Face Model Hub.

    This function sends a trained model and its corresponding tokenizer to the Hugging Face Model Hub,
    allowing them to be shared, versioned, and used by other users.

    Args:
        model (PreTrainedModel): The trained model to be pushed to the Model Hub.
        tokenizer (PreTrainedTokenizer): The tokenizer corresponding to the model.
        huggingface_model_name (str): Name of the model repository on the Hugging Face Model Hub.

    Returns:
        None

    Note:
        Make sure you have imported the PreTrainedModel and PreTrainedTokenizer classes.

    Example Usage:
    >>> model = TrainedModel()
    >>> tokenizer = ModelTokenizer()
    >>> push_to_hub(model, tokenizer, 'my-awesome-model')
    """
    try:
        model.push_to_hub(huggingface_model_name, use_auth_token=training_arguments.use_auth_token)
        tokenizer.push_to_hub(huggingface_model_name, use_auth_token=training_arguments.use_auth_token)
    except Exception as e:
        print("An error occurred:", e)

@execution_time
def create_huggingface_repository(repository_id, first_commit_message):
    """
    Create a private repository on Hugging Face.

    This function creates a private repository on Hugging Face with the specified repository ID
    and an initial commit message.

    Args:
        repository_id (str): The unique ID for the new repository.
        first_commit_message (str): The message for the initial commit.

    Returns:
        None

    Example Usage:
    >>> repo_id = "my-repo"
    >>> initial_message = "Initial commit"
    >>> create_huggingface_repository(repo_id, initial_message)
    """
    repository_id = repository_id
    first_commit_message = first_commit_message
    api.create_repo(repo_id=repository_id)

@execution_time
def choose_optimizer(optim_string, learning_rate, params):
    if optim_string == "AdamW":
        return transformers.AdamW(params, lr=learning_rate)
    elif  optim_string == "Adafactor":
        return transformers.Adafactor(params, relative_step=False, warmup_init=False, lr=learning_rate)
    else:
        raise Exception("Unknown optimizer")

@execution_time
def choose_scheduler(scheduler_string, optimizer, num_steps, warmup_ratio):
    num_warmup = int(warmup_ratio * num_steps)
    if scheduler_string == "cosine":
        return transformers.get_cosine_schedule_with_warmup(optimizer=optimizer, num_warmup_steps=num_warmup, num_training_steps=num_steps) 
    elif scheduler_string == "constant_with_warmup":
        return transformers.get_constant_schedule_with_warmup(optimizer=optimizer, num_warmup_steps=num_warmup)
    else:
        raise Exception("Unknown scheduler")

@execution_time
def train_model(model, tokenizer, dataset, peft_config,  dir_model_name, optimizer, scheduler, training_arguments):
    """
    """
    trainer = SFTTrainer(
        model=model,
        max_seq_length=training_arguments.max_seq_length,
        neftune_noise_alpha=training_arguments.neftune_noise_alpha,
        train_dataset=dataset['train'],
        eval_dataset=dataset['test'],
        dataset_text_field=training_arguments.dataset_text_field,
        peft_config=peft_config,
        packing=training_arguments.packing,
        optimizers=(optimizer, scheduler),
        args=TrainingArguments(
            disable_tqdm=training_arguments['disable_tqdm'],
            num_train_epochs=training_arguments['num_train_epochs'],
            per_device_train_batch_size=training_arguments['per_device_train_batch_size'],
            per_device_eval_batch_size=training_arguments['per_device_eval_batch_size'],
            gradient_accumulation_steps=training_arguments['gradient_accumulation_steps'],
            gradient_checkpointing=training_arguments['gradient_checkpointing'],
            learning_rate=training_arguments['learning_rate'],
            max_steps=training_arguments['max_steps'],
            fp16=training_arguments['fp16'],
            save_strategy=training_arguments['save_strategy'],
            evaluation_strategy=training_arguments['evaluation_strategy'],
            load_best_model_at_end=training_arguments['load_best_model_at_end'],
            metric_for_best_model=training_arguments['metric_for_best_model'],
            greater_is_better=training_arguments['greater_is_better'],
            prediction_loss_only=training_arguments['prediction_loss_only'],
            save_safetensors=training_arguments['save_safetensors'],
            save_total_limit=training_arguments['save_total_limit'],
            report_to=training_arguments['report_to'],
            max_grad_norm=training_arguments['max_grad_norm'],
            warmup_ratio=training_arguments['warmup_ratio'],
            weight_decay=training_arguments['weight_decay'],
            hub_model_id=training_arguments['hub_model_id'],
            push_to_hub=training_arguments['push_to_hub'],
            hub_strategy=training_arguments['hub_strategy'],
            hub_token=training_arguments['hub_token'],
            output_dir=f"{dir_model_name}/checkpoints/",
            save_steps=50,
            logging_steps=50,
            include_inputs_for_metrics=True,
          ),
          data_collator=transformers.DataCollatorForLanguageModeling(
              tokenizer,
              mlm=training_arguments.mlm),

    )
    trainer.train()
    eval_results = trainer.evaluate()
    trainer.save_model(dir_model_name)
    tokenizer.save_pretrained(dir_model_name)
    clear_memory_cache()

    return eval_results


@execution_time
def main(hub_model_id, model, tokenizer, data, token,
         dir_model_name, peft_config, optimizer, scheduler, training_arguments):

    for i in range(3):
      clear_memory_cache()

    login_hugging_face_hub(token)
    results = train_model(model, tokenizer, data, peft_config,  dir_model_name,  optimizer, scheduler, training_arguments)
    push_to_hub(model, tokenizer, training_arguments.hub_model_id)
    model1 = AutoPeftModelForCausalLM.from_pretrained(
        dir_model_name,
        low_cpu_mem_usage=True,
        torch_dtype=torch.float16,
    )
    
    merged_model = model1.merge_and_unload()
    merged_model.save_pretrained(dir_model_name)
    tokenizer.save_pretrained(dir_model_name)
    push_to_hub(merged_model, tokenizer, training_arguments.hub_model_id)

    return results

In [15]:
#@title get dir name to save model

folder_name = training_arguments.folder_name
model_base_repository_id = training_arguments.model_base_repository_id
description = training_arguments.description
epochs = training_arguments.num_train_epochs
per_device_train_batch_size = training_arguments.per_device_train_batch_size
dataset = dataset

dir_model_name, drive_model_name = build_model_name(model_base_repository_id, dataset, epochs, per_device_train_batch_size)
print(f"model_complete_name: {drive_model_name, dir_model_name}")

Execution time for build_model_name: 0.0 seconds
model_complete_name: ('/content/drive/Shareddrives/ModelosdeIA/Modelos/Zephyr/eta/22-01-24-HuggingFaceH4-zephyr-7b-beta_sft-multilanguage-2_epochs-5_batch_2_2024-01-22', './zephyr7bbeta/22-01-24-HuggingFaceH4-zephyr-7b-beta_sft-multilanguage-2_epochs-5_batch_2_2024-01-22')


In [16]:
#@title login huggingface hub

login_hugging_face_hub(training_arguments.hub_token)

for i in range(5):
  clear_memory_cache()

Token will not been saved to git credential helper. Pass `add_to_git_credential=True` if you want to set the git credential as well.
Token is valid (permission: write).
Your token has been saved to /root/.cache/huggingface/token
Login successful
Execution time for login_hugging_face_hub: 0.737 seconds
Cleared memory: 0
Cleared memory: 0
Cleared memory: 0
Cleared memory: 0
Cleared memory: 0


# 5) Model training execution

Exemplo de uso com quantização GPTQ<br>
```
model, tokenizer = load_model_and_tokenizer(training_arguments.model_base_repository_id, quantized=True, quantization_type='gptq', dataset=dataset['train']['prompt'])
```
<br>

Exemplo de uso com quantização Bits and Bytes<br>
```
model, tokenizer = load_model_and_tokenizer(training_arguments.model_base_repository_id, quantized=True, quantization_type='bits_and_bytes')
```

Exemplo de uso sem quantização<br>
```
model, tokenizer = load_model_and_tokenizer(training_arguments.model_base_repository_id, quantized=False)
```

In [17]:
wandb.login(key = os.environ['WANDB_TOKEN'])
accelerator = Accelerator(log_with="wandb")

[34m[1mwandb[0m: Currently logged in as: [33mbeatriz-maia[0m. Use [1m`wandb login --relogin`[0m to force relogin
[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /root/.netrc
Detected kernel version 5.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.


In [None]:
#@title Load and prepare model + load tokenizer

for i in range(6):
    clear_memory_cache()
    
with ClearCache():
    model, tokenizer = load_model_and_tokenizer(training_arguments.model_base_repository_id, quantized=True, quantization_type='bits_and_bytes')
    model, peft_config = configure_and_prepare_model(model)

    # inicializar otimizador com base no hiperparams
    optimizer = choose_optimizer(training_arguments.optimizer, training_arguments.learning_rate, model.parameters())
    # inicializar scheduler com base no hiperparams
    scheduler = choose_scheduler(training_arguments.lr_scheduler_type, optimizer, training_arguments.max_steps, training_arguments.warmup_ratio)      
    
    model, dataset, scheduler = Accelerator().prepare(model, dataset, scheduler)
    
    for i in range(6):
        clear_memory_cache()

    # wand init accelerate com base nos novos hiperparams
    kwargs = {
            "notes": 'Experimentos SFT',
            "group": f"Sprint 16 - {str(date.today())}",
            "name": f"SFT-{training_arguments.folder_name}",
            "entity":"weni",
            "job_type": "train"
        }
        
    accelerator.init_trackers(
            project_name = 'WeniGPT',
            config = training_arguments,
            init_kwargs={"wandb": kwargs}
        )
        
    results = main(training_arguments.hub_model_id, model, tokenizer, dataset, training_arguments.hub_token,
        dir_model_name, peft_config, optimizer, scheduler, training_arguments)
    
    accelerator.end_training()



Cleared memory: 1495
Cleared memory: 0
Cleared memory: 0
Cleared memory: 0
Cleared memory: 0
Cleared memory: 0


Loading checkpoint shards:   0%|          | 0/8 [00:00<?, ?it/s]

Execution time for load_model_and_tokenizer: 12.878 seconds


Detected kernel version 5.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.


Execution time for configure_and_prepare_model: 0.264 seconds
Execution time for choose_optimizer: 0.004 seconds
Execution time for choose_scheduler: 0.0 seconds
Cleared memory: 41
Cleared memory: 0
Cleared memory: 0
Cleared memory: 0
Cleared memory: 0
Cleared memory: 0


[34m[1mwandb[0m: Currently logged in as: [33mbeatriz-maia[0m ([33mweni[0m). Use [1m`wandb login --relogin`[0m to force relogin




Cleared memory: 209
Cleared memory: 0
Cleared memory: 0
Token will not been saved to git credential helper. Pass `add_to_git_credential=True` if you want to set the git credential as well.
Token is valid (permission: write).
Your token has been saved to /root/.cache/huggingface/token
Login successful
Execution time for login_hugging_face_hub: 0.939 seconds




Generating train split: 0 examples [00:00, ? examples/s]

Generating train split: 0 examples [00:00, ? examples/s]

Detected kernel version 5.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.


Step,Training Loss,Validation Loss
50,0.5823,0.461868
100,0.4406,0.425017
150,0.4172,0.410482
200,0.4063,0.402087
250,0.3934,0.396053
300,0.3868,0.390326
350,0.3762,0.387024
400,0.3715,0.382124
450,0.3587,0.379463
500,0.3539,0.374863




In [None]:
1+1

In [19]:
accelerator.end_training()

VBox(children=(Label(value='7.765 MB of 7.765 MB uploaded\r'), FloatProgress(value=1.0, max=1.0)))

0,1
eval/loss,█▅▄▄▃▃▃▂▂▂▂▂▂▁▁▁▁▁▁▁▁
eval/runtime,▁▁▄▂▁▁▂▂▃▂▇▂▁▂▂█▂▂▁▂▂
eval/samples_per_second,██▄▇██▇█▇▇▂▇█▇▇▁▇▇█▇▇
eval/steps_per_second,██▆█████▆█▃████▁█████
train/epoch,▁▁▁▁▂▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▅▅▅▅▅▆▆▆▆▆▆▇▇▇▇▇▇███
train/global_step,▁▁▁▁▂▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▅▅▅▅▅▆▆▆▆▆▆▇▇▇▇▇▇███
train/learning_rate,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
train/loss,█▅▄▄▄▄▄▃▃▃▃▃▂▂▂▂▂▂▁▁▁

0,1
eval/loss,0.3678
eval/runtime,573.6553
eval/samples_per_second,0.317
eval/steps_per_second,0.159
train/epoch,10.23
train/global_step,1050.0
train/learning_rate,0.0002
train/loss,0.2589
