## Task 1 (10 points)

Fine-tune a language model on an instruction dataset using LoRA. Verify that the fine-tuned model differs from the original: generate continuations for the same prompts and compare the results.

You can base your work on the code from the PEFT seminar, replacing the quotes dataset with an instruction dataset (you can simply copy it from the General_instruct_fine-tuning seminar).
You can use alpaca_dataset, the Dolly 2 dataset, or the translated dataset (or all of them together).
It is important to use a model with more parameters than the one from the General instruct fine-tuning seminar.
The model must have at least 3 billion parameters.  
**You must use a model other than the ones covered in the seminar (OPT-2.7b, OPT-6.7b). Find a new model on the Hugging Face Hub.**



Goal: build a model that rewrites the text of economic reports in language that is easier for a general audience to understand.

In [1]:
!pip install -q bitsandbytes datasets accelerate loralib
!pip install -q git+https://github.com/huggingface/transformers.git@main git+https://github.com/huggingface/peft.git

  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done


In [2]:
import os
os.environ["CUDA_VISIBLE_DEVICES"]="0"
import torch
import torch.nn as nn
import bitsandbytes as bnb
import transformers
from transformers import AutoTokenizer, AutoConfig, AutoModelForCausalLM, BitsAndBytesConfig

from datasets import load_dataset
from typing import Optional, Dict, Sequence

import pandas as pd
import numpy as np
import json

import copy
import logging
from dataclasses import dataclass, field
from torch.utils.data import Dataset
from transformers import Trainer
import accelerate

In [3]:
# Use 4-bit quantization; anything higher overloads the GPU during training
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True
)
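
The choice of 4 bits can be sanity-checked with a back-of-the-envelope estimate of weight memory. The sketch below is only an approximation: the parameter count is an assumption (roughly the size of a Llama-3-8B-class model), and it ignores activations, gradients, optimizer state, and any layers that stay unquantized.

```python
# Rough estimate of the memory taken by model weights alone at several precisions.
# ASSUMPTION: ~8.03e9 parameters (approximate size of a Llama-3-8B-class model).
N_PARAMS = 8.03e9


def weight_memory_gib(n_params: float, bits_per_param: int) -> float:
    """Return the size of the weights in GiB (weights only, nothing else)."""
    return n_params * bits_per_param / 8 / 2**30


for bits in (16, 8, 4):
    print(f"{bits:>2}-bit weights: ~{weight_memory_gib(N_PARAMS, bits):.1f} GiB")
```

Real usage during training will be higher than these figures, which is consistent with the observation that anything above 4 bits did not fit.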

We'll use this model: https://huggingface.co/lightblue/suzume-llama-3-8B-multilingual

In [4]:
model = AutoModelForCausalLM.from_pretrained(
    "lightblue/suzume-llama-3-8B-multilingual",
    quantization_config=quantization_config,
    cache_dir='./models'
)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
`low_cpu_mem_usage` was None, now set to True since model is quantized.


In [5]:
# truncation and padding are arguments of the tokenizer call, not of from_pretrained
tokenizer = AutoTokenizer.from_pretrained("lightblue/suzume-llama-3-8B-multilingual")

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [6]:
# The text to rewrite, taken from: https://www.imf.org/en/Publications/WP/Issues/2024/05/17/New-Perspectives-on-Quantitative-Easing-and-Central-Bank-Capital-Policies-549168
text_to_rewrite = """
It is useful to assess the effects of QE in a liquidity trap generated by economic conditions of varying severity.
In one case, QE is implemented in a very deep liquidity trap associated with a severe recession in which the output gap is deeply negative,
and inflation well below target. Roughly speaking, this scenario would be similar to that prevailing in the aftermath of the GFC
"""

In [7]:
batch = tokenizer(f"Rewrite the following text making it more understandable for the general audience. Explain every macroeconimical term you find. Text: {text_to_rewrite}", return_tensors='pt').to('cuda')
output_tokens = model.generate(**batch, max_new_tokens=200, temperature=0.1, do_sample=True, no_repeat_ngram_size=2)
print(tokenizer.decode(output_tokens[0], skip_special_tokens=True))

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Rewrite the following text making it more understandable for the general audience. Explain every macroeconimical term you find. Text: 
It is useful to assess the effects of QE in a liquidity trap generated by economic conditions of varying severity.
In one case, QE is implemented in a very deep liquidity trap associated with a severe recession in which the output gap is deeply negative,
and inflation well below target. Roughly speaking, this scenario would be similar to that prevailing in the aftermath of the GFC
(2007-2008 financial crisis). In this context, the central bank's primary goal is to stabilize the economy and prevent a further decline in output.
The central banks' actions in this situation are aimed at increasing the money supply and lowering interest rates to stimulate economic activity.
However, in another scenario, where the liquidity is not as severely constrained, but still below normal levels, central bankers may adopt a more cautious approach.
They may choose to use

This text is hardly ready for a general audience. The model also does not seem to have paid attention to the request to explain terms. I, for one, still don't know what a "liquidity trap" is :/

In [8]:
for param in model.parameters():
  param.requires_grad = False
  if param.ndim == 1:
    # layer norm parameters hold very small values, so they are kept in fp32
    param.data = param.data.to(torch.float32)

model.gradient_checkpointing_enable()
model.enable_input_require_grads()

In [10]:
def print_trainable_parameters(model):
    """
    Prints the number of trainable parameters in the model.
    """
    trainable_params = 0
    all_param = 0
    for _, param in model.named_parameters():
        all_param += param.numel()
        if param.requires_grad:
            trainable_params += param.numel()
    print(
        f"trainable params: {trainable_params} || all params: {all_param} || trainable%: {100 * trainable_params / all_param}"
    )

In [11]:
for name, module in model.named_modules():
    print(name)


base_model
base_model.model
base_model.model.model
base_model.model.model.embed_tokens
base_model.model.model.layers
base_model.model.model.layers.0
base_model.model.model.layers.0.self_attn
base_model.model.model.layers.0.self_attn.q_proj
base_model.model.model.layers.0.self_attn.q_proj.base_layer
base_model.model.model.layers.0.self_attn.q_proj.lora_dropout
base_model.model.model.layers.0.self_attn.q_proj.lora_dropout.default
base_model.model.model.layers.0.self_attn.q_proj.lora_A
base_model.model.model.layers.0.self_attn.q_proj.lora_A.default
base_model.model.model.layers.0.self_attn.q_proj.lora_B
base_model.model.model.layers.0.self_attn.q_proj.lora_B.default
base_model.model.model.layers.0.self_attn.q_proj.lora_embedding_A
base_model.model.model.layers.0.self_attn.q_proj.lora_embedding_B
base_model.model.model.layers.0.self_attn.k_proj
base_model.model.model.layers.0.self_attn.k_proj.base_layer
base_model.model.model.layers.0.self_attn.k_proj.lora_dropout
base_model.model.model.l

In [12]:
from peft import LoraConfig, get_peft_model

config = LoraConfig(
    r=32, # inner dimension (rank) of the adapter, the main parameter
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"], # which layers get adapters (see the module list above)
    lora_alpha=32,

    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

model = get_peft_model(model, config)
print_trainable_parameters(model)

trainable params: 27262976 || all params: 4567863296 || trainable%: 0.5968430803057028
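
The numbers printed above can be reproduced by hand. The sketch below assumes Llama-3-8B-style dimensions (hidden size 4096, key/value projection width 1024, MLP width 14336, vocabulary 128256, 32 layers); these are not stated in the notebook. It also assumes that each 4-bit weight matrix is stored packed, two values per element, while the embeddings, output head, and norms are left unquantized, which would explain why "all params" is about 4.57B for an 8B model.

```python
# ASSUMPTIONS: Llama-3-8B-style dimensions (not stated in the notebook itself).
HIDDEN, KV_DIM, MLP_DIM = 4096, 1024, 14336
VOCAB, N_LAYERS = 128256, 32
R, LORA_ALPHA = 32, 32  # values from the LoraConfig above


def lora_params(in_features: int, out_features: int, r: int) -> int:
    """LoRA adds A (r x in_features) and B (out_features x r) per adapted layer."""
    return r * (in_features + out_features)


per_layer = (
    lora_params(HIDDEN, HIDDEN, R)    # q_proj
    + lora_params(HIDDEN, KV_DIM, R)  # k_proj
    + lora_params(HIDDEN, KV_DIM, R)  # v_proj
    + lora_params(HIDDEN, HIDDEN, R)  # o_proj
)
trainable = per_layer * N_LAYERS

# Frozen base weights as counted by numel(): linear layers packed 2 values per element.
linear = N_LAYERS * (2 * HIDDEN * HIDDEN + 2 * HIDDEN * KV_DIM + 3 * HIDDEN * MLP_DIM)
unquantized = 2 * VOCAB * HIDDEN + (2 * N_LAYERS + 1) * HIDDEN  # embeddings, head, norms
all_params = linear // 2 + unquantized + trainable

scaling = LORA_ALPHA / R  # factor applied to the low-rank update B @ A

print(trainable)                     # 27262976
print(all_params)                    # 4567863296
print(100 * trainable / all_params)  # about 0.597
print(scaling)                       # 1.0
```

Relative to the full 8B weights, the trainable share is therefore closer to 0.34% than to the 0.6% reported against the packed count.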


**Now let's fine-tune the model on an instruction dataset.**


In [13]:
dolly = load_dataset("databricks/databricks-dolly-15k", split='train')
dolly = dolly.map(lambda samples: tokenizer(samples['instruction']), batched=True)
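
Note that the cell above tokenizes only the `instruction` field, so the model never sees the `context` or `response` columns during training. One possible alternative, shown purely as a sketch, is to join the fields into a single training string first. The template and the `format_dolly` helper are hypothetical and were not used in this notebook; the example record is made up for illustration.

```python
# ASSUMPTION: this prompt template is illustrative, not the one used in the notebook.
def format_dolly(sample: dict) -> str:
    """Join instruction, optional context, and response into one training string."""
    parts = [f"### Instruction:\n{sample['instruction']}"]
    if sample.get("context"):  # many Dolly records have an empty context
        parts.append(f"### Context:\n{sample['context']}")
    parts.append(f"### Response:\n{sample['response']}")
    return "\n\n".join(parts)


# Hypothetical record with the same field names as the Dolly dataset.
example = {
    "instruction": "What is a liquidity trap?",
    "context": "",
    "response": "A situation in which very low interest rates fail to stimulate borrowing.",
}
print(format_dolly(example))
```

With such a helper, the mapping step could become `dolly.map(lambda s: tokenizer(format_dolly(s), truncation=True, max_length=512))` without `batched=True`; the `max_length` value here is also an assumption.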

In [14]:
trainer = transformers.Trainer(
    model=model,
    train_dataset=dolly,
    args=transformers.TrainingArguments(
        per_device_train_batch_size=4,
        gradient_accumulation_steps=4,
        warmup_steps=100,
        max_steps=400,
        learning_rate=2e-4,
        fp16=True,
        logging_steps=1,
        output_dir='outputs'
    ),
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False, )
)

max_steps is given, it will override any value given in num_train_epochs
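
To see what these training arguments imply, the sketch below computes the effective batch size and the learning rate at a few steps. It assumes a single GPU and the default linear scheduler of `TrainingArguments` (linear warmup, then linear decay to zero); neither is stated explicitly in the notebook.

```python
# Values taken from the TrainingArguments above.
PER_DEVICE_BATCH = 4
GRAD_ACCUM = 4
WARMUP_STEPS = 100
MAX_STEPS = 400
BASE_LR = 2e-4
N_GPUS = 1  # ASSUMPTION: a single GPU, as CUDA_VISIBLE_DEVICES="0" suggests

effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM * N_GPUS
samples_seen = effective_batch * MAX_STEPS


def linear_schedule_lr(step: int) -> float:
    """ASSUMPTION: default 'linear' scheduler, warmup to BASE_LR then decay to 0."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    return BASE_LR * max(0.0, (MAX_STEPS - step) / (MAX_STEPS - WARMUP_STEPS))


print(effective_batch, samples_seen)  # 16 6400
for step in (1, 50, 100, 250, 400):
    print(step, linear_schedule_lr(step))
```

With 6400 samples seen and a dataset of about 15k records, the run covers less than half of one epoch.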


In [15]:
model.config.use_cache = False  # silence the warnings. Please re-enable for inference!
trainer.train()
model.save_pretrained('suzume_lora')

Step,Training Loss
1,5.2011
2,3.7368
3,3.9521
4,4.5698
5,4.4538
6,4.3897
7,5.1367
8,3.9246
9,4.2282
10,4.2188


In [1]:
import torch
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

peft_model_id = "suzume_lora"

quantization_config = BitsAndBytesConfig(
        load_in_8bit=True
    )
model = AutoModelForCausalLM.from_pretrained(pretrained_model_name_or_path="lightblue/suzume-llama-3-8B-multilingual",
                                             return_dict=True,
                                             quantization_config=quantization_config,
                                             device_map='auto'
                                            )
tokenizer = AutoTokenizer.from_pretrained("lightblue/suzume-llama-3-8B-multilingual")

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [3]:
# repeat the text
text_to_rewrite = """
It is useful to assess the effects of QE in a liquidity trap generated by economic conditions of varying severity.
In one case, QE is implemented in a very deep liquidity trap associated with a severe recession in which the output gap is deeply negative,
and inflation well below target. Roughly speaking, this scenario would be similar to that prevailing in the aftermath of the GFC
"""

**Let's try the base model once more.**

In [4]:
batch = tokenizer(f"Rewrite the following text making it more understandable for the general audience. Explain every macroeconimical term you find. Text: {text_to_rewrite}", return_tensors='pt').to('cuda')
output_tokens = model.generate(**batch, max_new_tokens=200, temperature=0.1, do_sample=True, no_repeat_ngram_size=2)
print(tokenizer.decode(output_tokens[0], skip_special_tokens=True))

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Rewrite the following text making it more understandable for the general audience. Explain every macroeconimical term you find. Text: 
It is useful to assess the effects of QE in a liquidity trap generated by economic conditions of varying severity.
In one case, QE is implemented in a very deep liquidity trap associated with a severe recession in which the output gap is deeply negative,
and inflation well below target. Roughly speaking, this scenario would be similar to that prevailing in the aftermath of the GFC
(2008-2009). In another case,
QE is applied in an economy experiencing a mild liquidity crisis, where the central bank has already taken steps to stabilize the financial system.
The key difference between these two scenarios is the severity of economic downturn and the effectiveness of monetary policy in addressing it.
Liquidity trap: A situation in where interest rates are so low that people and businesses are unwilling to borrow money, even at very low interest rate. This is

Here the model seems to have gone overboard with explaining terms. Not bad, of course, but we would hardly run it in a newspaper.

**Now with PEFT**

In [5]:
model = PeftModel.from_pretrained(model, peft_model_id)

In [6]:
batch = tokenizer(f"Rewrite the following text making it more understandable for the general audience. Explain every macroeconimical term you find. Text: {text_to_rewrite}", return_tensors='pt').to('cuda')
output_tokens = model.generate(**batch, max_new_tokens=200, temperature=0.1, do_sample=True, no_repeat_ngram_size=2)
print(tokenizer.decode(output_tokens[0], skip_special_tokens=True))

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Rewrite the following text making it more understandable for the general audience. Explain every macroeconimical term you find. Text: 
It is useful to assess the effects of QE in a liquidity trap generated by economic conditions of varying severity.
In one case, QE is implemented in a very deep liquidity trap associated with a severe recession in which the output gap is deeply negative,
and inflation well below target. Roughly speaking, this scenario would be similar to that prevailing in the aftermath of the GFC
(2008-2009). In another case,
QE is conducted in an economy experiencing a mild liquidity crisis, where the central bank has some room to maneuver and the
output gap remains positive, albeit smaller than in normal times. This scenario could be compared to the economic situation in
the early 2000s, before the housing bubble burst.

In both cases, the effectiveness of quantitative easing can be influenced by the severity of economic downturns and by
monetary policy transmission 

Now the model has produced something that reads more like a rewritten text, and it also explained the "liquidity trap" separately. This is closer to something we could give to a newspaper :)
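
The comparison above is done by eye. A rough way to make it more systematic is to strip the echoed prompt and score how similar the two continuations are. The sketch below uses only the standard library; the `continuation` and `similarity` helpers and the short sample strings are hypothetical stand-ins, not outputs from this notebook. Because generation uses `do_sample=True`, some difference between runs is expected even for the same model (the two base-model outputs above already differ), so a low score alone does not prove that the adapter changed the behavior.

```python
import difflib


def continuation(decoded: str, prompt: str) -> str:
    """Drop the echoed prompt so only the newly generated text remains."""
    return decoded[len(prompt):] if decoded.startswith(prompt) else decoded


def similarity(a: str, b: str) -> float:
    """Word-level similarity ratio in [0, 1]; 1.0 means identical texts."""
    return difflib.SequenceMatcher(None, a.split(), b.split()).ratio()


# Hypothetical stand-ins for the decoded outputs of the two models.
prompt = "Text: QE in a liquidity trap"
base_out = prompt + " is applied in an economy experiencing a mild liquidity crisis"
tuned_out = prompt + " is conducted in an economy experiencing a mild liquidity crisis"

score = similarity(continuation(base_out, prompt), continuation(tuned_out, prompt))
print(f"similarity: {score:.2f}")
```

Running the same comparison over several prompts, ideally with `do_sample=False` for reproducible generations, would give a firmer basis for the conclusion than a single sampled example.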