<a href="https://colab.research.google.com/github/tangfy97/llm-tolkien/blob/main/training.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
!pip install transformers
!pip install peft
!pip install accelerate
!pip install bitsandbytes
!pip install datasets
!pip install huggingface_hub

Collecting transformers
  Downloading transformers-4.35.0-py3-none-any.whl (7.9 MB)
Collecting huggingface-hub<1.0,>=0.16.4 (from transformers)
  Downloading huggingface_hub-0.18.0-py3-none-any.whl (301 kB)
Collecting tokenizers<0.15,>=0.14 (from transformers)
  Downloading tokenizers-0.14.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.8 MB)
Collecting safetensors>=0.3.1 (from transformers)
  Downloading safetensors-0.4.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)
Col

In [None]:
import sys
sys.path.append('/content/llm')
import config
from training_utils import prepare_model, print_trainable_parameters, compute_perplexity
sys.argv=['']
del sys
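
The helper `print_trainable_parameters` comes from the repository's `training_utils` module, which is not reproduced here. As a rough guide to the figure such a helper reports, LoRA adds `r * (d_in + d_out)` trainable parameters for each adapted weight matrix of shape `d_out x d_in`. A minimal sketch of that arithmetic follows; the function name and the dimensions are illustrative assumptions, not values read from `config`.

```python
def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix.

    LoRA learns two low-rank factors, A (r x d_in) and B (d_out x r),
    so the count is r * d_in + d_out * r.
    """
    return r * (d_in + d_out)


# Hypothetical dimensions, chosen only for illustration.
per_matrix = lora_param_count(d_in=2560, d_out=7680, r=16)
full_matrix = 2560 * 7680
print(per_matrix, full_matrix, f"{per_matrix / full_matrix:.2%}")
```

The ratio printed at the end shows why only a small fraction of the weights is trained: the low-rank factors are far smaller than the matrix they adapt.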

In [None]:
!huggingface-cli login


    _|    _|  _|    _|    _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|_|_|_|    _|_|      _|_|_|  _|_|_|_|
    _|    _|  _|    _|  _|        _|          _|    _|_|    _|  _|            _|        _|    _|  _|        _|
    _|_|_|_|  _|    _|  _|  _|_|  _|  _|_|    _|    _|  _|  _|  _|  _|_|      _|_|_|    _|_|_|_|  _|        _|_|_|
    _|    _|  _|    _|  _|    _|  _|    _|    _|    _|    _|_|  _|    _|      _|        _|    _|  _|        _|
    _|    _|    _|_|      _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|        _|    _|    _|_|_|  _|_|_|_|
    
    To login, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .
Token: 
Add token as git credential? (Y/n) y
Token is valid (permission: write).
Cannot authenticate through git-credential as no helper is defined on your machine.
You might have to re-authenticate when pushing to the Hugging Face Hub.
Run the following command in your terminal in case you want to set the '

In [None]:
import os
import logging
import argparse
from typing import Mapping, Any

from torch import cuda
from datasets import load_dataset
from peft import LoraConfig, PeftModel, PeftConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer, TrainingArguments, DataCollatorForLanguageModeling

import config
from training_utils import prepare_model, print_trainable_parameters, compute_perplexity


LOGGER = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)


class LLMTolkien():

    def __init__(self, model_name: str) -> None:
        self.model_name = model_name
        self.device = 'cuda' if cuda.is_available() else 'cpu'

    def train(
            self,
            hf_repo: str,
            lora_config: Mapping[str, Any],
            trainer_config: Mapping[str, Any],
            mlm: bool,
        ) -> None:
        tokenizer = AutoTokenizer.from_pretrained(self.model_name)
        model = AutoModelForCausalLM.from_pretrained(self.model_name, device_map="auto", load_in_8bit=True)
        model = prepare_model(model)
        model = get_peft_model(model, LoraConfig(**lora_config))
        LOGGER.info(f"Model trainable parameters:\n {print_trainable_parameters(model)}")
        dataset = load_dataset(hf_repo)
        LOGGER.info(f"Train dataset downloaded:\n {dataset['train']}")
        LOGGER.info(f"Number of tokens for the training: {dataset['train'].num_rows*len(dataset['train']['input_ids'][0])}")
        trainer = Trainer(
            model=model,
            train_dataset=dataset['train'],
            args=TrainingArguments(**trainer_config),
            data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=mlm)
        )
        model.config.use_cache = False  # silence warnings
        trainer.train()
        model.config.use_cache = True
        model.push_to_hub(repo_id=hf_repo)
        tokenizer.push_to_hub(repo_id=hf_repo)

    def evaluate(self) -> None:
        # Placeholder: evaluation is not implemented.
        pass

    def generate(self, prompt: str, hf_repo: str, max_new_tokens: int, temperature: float, do_sample: bool) -> None:
        # Read the adapter config to find the base model it was trained on.
        # Named peft_config so it does not shadow the imported `config` module.
        peft_config = PeftConfig.from_pretrained(hf_repo)
        model = AutoModelForCausalLM.from_pretrained(peft_config.base_model_name_or_path, return_dict=True, load_in_8bit=True, device_map='auto')
        tokenizer = AutoTokenizer.from_pretrained(peft_config.base_model_name_or_path)
        # Load the LoRA adapter on top of the base model
        self.model = PeftModel.from_pretrained(model, hf_repo)

        # Generate text; the inputs must be on the same device as the model
        inputs = tokenizer(prompt, return_tensors="pt").to(self.model.device)
        tokens = self.model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            temperature=temperature,
            do_sample=do_sample,
        )
        print(tokenizer.decode(tokens[0]))


if __name__ == "__main__":

    parser = argparse.ArgumentParser(description="Train an LLM with LoRA.")
    parser.add_argument("--model-name", type=str, default=config.model_name, help="The name of the model to train.")
    parser.add_argument("--hf-repo", type=str, default=config.hf_repo, help="The name of the HuggingFace repo to push the model to.")
    # Boolean options use BooleanOptionalAction (Python 3.9+), which provides --flag / --no-flag.
    # With type=bool, any non-empty string (including "False") would be parsed as True.
    parser.add_argument("--mlm", action=argparse.BooleanOptionalAction, default=config.mlm, help="Whether to use MLM or not for the training.")
    parser.add_argument('--lora-r', type=int, default=config.lora_r, help="The LoRA rank r of the low-rank update matrices.")
    parser.add_argument('--lora-alpha', type=float, default=config.lora_alpha, help="The LoRA scaling parameter alpha.")
    parser.add_argument('--lora-dropout', type=float, default=config.lora_dropout, help="LoRA dropout.")
    parser.add_argument('--lora-bias', type=str, default=config.lora_bias, help="LoRA bias.")
    parser.add_argument('--lora-task-type', type=str, default=config.lora_task_type, help="LoRA task type.")
    parser.add_argument('--per-device-train-batch-size', type=int, default=config.per_device_train_batch_size, help="The batch size per device for the training.")
    parser.add_argument('--gradient-accumulation-steps', type=int, default=config.gradient_accumulation_steps, help="The number of gradient accumulation steps.")
    parser.add_argument('--warmup-steps', type=int, default=config.warmup_steps, help="The number of warmup steps.")
    parser.add_argument('--weight-decay', type=float, default=config.weight_decay, help="The weight decay.")
    parser.add_argument('--num-train-epochs', type=float, default=config.num_train_epochs, help="The number of training epochs.")
    parser.add_argument('--learning-rate', type=float, default=config.learning_rate, help="The learning rate.")
    parser.add_argument('--fp16', action=argparse.BooleanOptionalAction, default=config.fp16, help="Whether to use fp16 or not.")
    parser.add_argument('--logging-steps', type=int, default=config.logging_steps, help="Log every this many update steps.")
    parser.add_argument('--output-dir', type=str, default=config.hf_repo, help="The output directory.")
    parser.add_argument('--overwrite-output-dir', action=argparse.BooleanOptionalAction, default=config.overwrite_output_dir, help="Whether to overwrite the output directory.")
    parser.add_argument('--save-strategy', type=str, default=config.save_strategy, help="The saving strategy.")
    parser.add_argument('--evaluation-strategy', type=str, default=config.evaluation_strategy, help="The evaluation strategy.")
    parser.add_argument('--push-to-hub', action=argparse.BooleanOptionalAction, default=config.push_to_hub, help="Whether to push the model to the HuggingFace Hub.")
    args = parser.parse_args()


    lora_config = {
        "r": args.lora_r,
        "lora_alpha": args.lora_alpha,
        "lora_dropout": args.lora_dropout,
        'bias': args.lora_bias,
        "task_type": args.lora_task_type,
    }

    trainer_config = {
        "per_device_train_batch_size": args.per_device_train_batch_size,
        "gradient_accumulation_steps": args.gradient_accumulation_steps,
        "warmup_steps": args.warmup_steps,
        "weight_decay": args.weight_decay,
        "num_train_epochs": args.num_train_epochs,
        "learning_rate": args.learning_rate,
        "fp16": args.fp16,
        "logging_steps": args.logging_steps,
        "output_dir": args.output_dir,
        "overwrite_output_dir": args.overwrite_output_dir,
        "evaluation_strategy": args.evaluation_strategy,
        "save_strategy": args.save_strategy,
        "push_to_hub": args.push_to_hub
    }

    model = LLMTolkien(args.model_name)
    model.train(
        hf_repo=args.hf_repo,
        lora_config=lora_config,
        trainer_config=trainer_config,
        mlm=args.mlm
    )
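
One argparse detail worth knowing when passing boolean options such as `--fp16` on the command line: `type=bool` calls `bool()` on the raw string, and every non-empty string, including `"False"`, is truthy. The sketch below demonstrates the pitfall and one alternative, `argparse.BooleanOptionalAction` (Python 3.9+), which generates paired `--fp16` / `--no-fp16` flags. The two parsers are throwaway examples, not the script's parser.

```python
import argparse

# Pitfall: bool("False") is True, so the flag cannot be switched off this way.
naive = argparse.ArgumentParser()
naive.add_argument("--fp16", type=bool, default=True)
naive_args = naive.parse_args(["--fp16", "False"])

# Alternative: BooleanOptionalAction adds both --fp16 and --no-fp16.
paired = argparse.ArgumentParser()
paired.add_argument("--fp16", action=argparse.BooleanOptionalAction, default=True)
paired_off = paired.parse_args(["--no-fp16"])
paired_default = paired.parse_args([])

print(naive_args.fp16, paired_off.fp16, paired_default.fp16)  # True False True
```

In the notebook itself `sys.argv` is reset to `['']`, so only the defaults from `config` are used and the pitfall does not surface; it matters once the script is run from a shell.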


Generating train split:   0%|          | 0/289 [00:00<?, ? examples/s]

Generating test split:   0%|          | 0/33 [00:00<?, ? examples/s]

You're using a BloomTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


Step,Training Loss
1,3.3961
2,2.7543
3,2.9244
4,2.2245
5,2.7627
6,2.9123
7,2.8016
8,2.939
9,3.0789
10,2.6558
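
Assuming the logged training loss is the usual mean cross-entropy in nats per token, it converts to perplexity as `exp(loss)`. The sketch below applies that to the ten logged values; it is a back-of-the-envelope reading of the training log, not the repository's `compute_perplexity`, whose implementation is not shown here.

```python
import math

# Training losses copied from the log above (steps 1-10).
losses = [3.3961, 2.7543, 2.9244, 2.2245, 2.7627,
          2.9123, 2.8016, 2.9390, 3.0789, 2.6558]


def perplexity(mean_nll: float) -> float:
    """Perplexity for a mean negative log-likelihood given in nats."""
    return math.exp(mean_nll)


mean_loss = sum(losses) / len(losses)
print(f"mean loss: {mean_loss:.4f}")
print(f"perplexity at mean loss: {perplexity(mean_loss):.2f}")
print(f"perplexity at last step: {perplexity(losses[-1]):.2f}")
```

With only ten steps of batch-level training loss these figures are noisy; a meaningful perplexity would need a held-out evaluation set.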


In [2]:
!pip install accelerate
!pip install bitsandbytes



In [3]:
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftConfig, PeftModel

# Read the adapter config to find the base model (bigscience/bloom-3b in the original cell)
peft_config = PeftConfig.from_pretrained("ftang97/sw-consultancy-book")
model = AutoModelForCausalLM.from_pretrained(peft_config.base_model_name_or_path, load_in_8bit=True, device_map='auto')
tokenizer = AutoTokenizer.from_pretrained(peft_config.base_model_name_or_path)
# Load the LoRA adapter on top of the base model
model = PeftModel.from_pretrained(model, "ftang97/sw-consultancy-book")

# The hobbits were so suprised seeing their friend again that they did not
# speak. Aragorn looked at them, and then he turned to the others.</s>



The hobbits were so suprised seeing their friend, Frodo, that they did not even greet him.</s>


In [6]:
prompt = "The system has lots of duplications and component entanglements, which could cause problems, such as"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)  # keep inputs on the model's device
generated_sequences = model.generate(
    **inputs,
    max_new_tokens=200,  # Upper bound on the number of newly generated tokens
    temperature=0.8,     # Only used when do_sample=True; ignored by the default greedy decoding
    min_length=150,      # Minimum total length (prompt + generated tokens)
    no_repeat_ngram_size=2,  # No 2-gram may appear twice in the output
    eos_token_id=tokenizer.eos_token_id,  # End-of-sequence token
    early_stopping=False,  # Only affects beam search; no effect with a single beam
    num_return_sequences=1  # Number of sequences to return
)

# Decode the generated sequences to get the text
generated_text = tokenizer.decode(generated_sequences[0], skip_special_tokens=True)
print(generated_text)



The system has lots of duplications and component entanglements, which could cause problems, such as component dependencies not being resolved correctly. The system is not modular, and the components are not tightly coupled. This is a typical situation in which the system architecture is hard to improve. It is also a situation where the architecture of the codebase is difficult to analyze. In this case, the analysis of component interactions is the first step in the improvement process. See Chapter 7 for more on this topic.CHAPTER 8 Guiding Improvement Processes The goal of this chapter is to help you understand the process of improving software. We will discuss the four steps of improvement: analysis, planning, implementation,and evaluation. Each step is discussed in turn.
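
For reference on the `temperature` argument used in the generation call: when sampling is enabled, the logits are divided by the temperature before the softmax, so values below 1 sharpen the next-token distribution and values above 1 flatten it. A minimal pure-Python sketch with made-up logits (the function name and the numbers are illustrative, not taken from the model):

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Softmax over logits / temperature, computed stably."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max to avoid overflow in exp
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # hypothetical next-token scores
for t in (0.5, 0.8, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

The probability of the top-scoring token shrinks as the temperature grows, which is why lower temperatures give more conservative text.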
