In [None]:
from google.colab import drive

drive.mount("/content/drive")

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


# Using a HuggingFace model with Transformers

In [None]:
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2")
# GPT-2 has no padding token; reuse its end-of-sequence token (id 50256).
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained("openai-community/gpt2")

In [None]:
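# Interactive inference: type a prompt, or press Enter on an empty line to stop.
while True:
    prompt = input("Your prompt > ")
    if not prompt.strip():
        break
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True).to(model.device)
    # pad_token_id is a generate() argument, not a decode() argument.
    # max_new_tokens counts only generated tokens; adjust as needed.
    outputs = model.generate(**inputs, max_new_tokens=50, pad_token_id=tokenizer.eos_token_id)
    generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
    print(generated_text)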

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


t.co/1QXqYqYqE — The Daily Caller (@TheDC) September 24, 2017

The Daily Caller reported that the FBI is investigating the Trump campaign's ties to Russia.

"The FBI is


# OpenChat Finetuning with 4bit Quantization

We recommend using a GPU runtime for this example. In the Colab menu bar, choose Runtime > Change Runtime Type and choose GPU under Hardware Accelerator.

## Install Ludwig

We'll use the latest version of Ludwig, which includes support for quantized fine-tuning.

In [None]:
!pip uninstall -y tensorflow --quiet
!pip install "ludwig[llm]" --quiet


## Set up HuggingFace API Token

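Obtain a [HuggingFace API Token](https://huggingface.co/docs/hub/security-tokens) with write permission; you will need it to upload the fine-tuned model at the end of this notebook. The base model used here, openchat_3.5, is not gated, but if you switch to a gated model such as [Llama2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf), request access before proceeding.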

In [None]:
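import os
from getpass import getpass

# Paste your HuggingFace token when prompted rather than hard-coding it in the notebook.
os.environ["HUGGING_FACE_HUB_TOKEN"] = getpass("HuggingFace token: ")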

## Finetuning our model with Ludwig

The Ludwig [configuration](https://ludwig.ai/latest/configuration/) specifies the components of the training job including:

- Model type (LLM) and base pretrained model name from HuggingFace ([openchat/openchat_3.5](https://huggingface.co/openchat/openchat_3.5))
- Input and output features from the training dataset
- Quantization (4bit) and parameter-efficient fine-tuning (LoRA)
- Training hyperparameters (learning rate, batch size, etc.)
- Preprocessing (e.g., sampling to speed up training)
- Backend for execution (local, but could also be Ray)
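Why the 4-bit quantization in the list above matters: it shrinks the frozen base weights to roughly a quarter of their 16-bit size. A back-of-the-envelope sketch of weight memory alone (it ignores activations, the LoRA weights, optimizer state and quantization metadata), assuming a round ~7e9 parameters for openchat_3.5:

```python
def weight_memory_gb(num_params: float, bits: int) -> float:
    """Approximate memory for the model weights alone, in GB (1 GB = 10**9 bytes)."""
    return num_params * bits / 8 / 1e9


# ~7e9 parameters is an assumed round figure for a 7B-class base model such as openchat_3.5.
for bits in (16, 8, 4):
    print(f"{bits:>2}-bit weights: ~{weight_memory_gb(7e9, bits):.1f} GB")
```

Real usage on the GPU will be higher than these figures, so treat them as a lower bound when choosing a runtime.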

In [None]:
import yaml

config_str = """
model_type: llm
base_model: openchat/openchat_3.5

quantization:
  bits: 4

adapter:
  type: lora

prompt:
  template: |
    ### Instruction:
    {instruction}

    ### Input:
    {input}

    ### Response:

input_features:
  - name: prompt
    type: text
    preprocessing:
      max_sequence_length: 256

output_features:
  - name: output
    type: text
    preprocessing:
      max_sequence_length: 256

trainer:
  type: finetune
  learning_rate: 0.0001
  batch_size: 1
  gradient_accumulation_steps: 16
  epochs: 1
  learning_rate_scheduler:
    warmup_fraction: 0.01

preprocessing:
  sample_ratio: 0.1
"""

config = yaml.safe_load(config_str)
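Ludwig fills the `{instruction}` and `{input}` placeholders in `prompt.template` from dataset columns with the same names, so every placeholder has to exist as a column (the Alpaca dataset provides `instruction`, `input` and `output`). A minimal sketch of that substitution with Python's `str.format`; it approximates the behavior and is not Ludwig's internal code, and the sample record is made up for illustration.

```python
from string import Formatter

# Same text as prompt.template in the Ludwig config above.
TEMPLATE = (
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n"
)


def template_fields(template: str) -> set:
    """Return the placeholder names used in a str.format-style template."""
    return {name for _, name, _, _ in Formatter().parse(template) if name}


def render(template: str, record: dict) -> str:
    """Fill the template from one dataset row, failing if a referenced column is missing."""
    missing = template_fields(template) - set(record)
    if missing:
        raise KeyError(f"dataset is missing columns: {sorted(missing)}")
    return template.format(**record)


# Hypothetical Alpaca-style row; the "output" column is the target, not part of the prompt.
row = {"instruction": "Translate to French.", "input": "Good morning", "output": "Bonjour"}
print(render(TEMPLATE, row))
```

Running `template_fields` against your own dataset's column names before training is a cheap way to catch a template that references a column you don't have.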

## Train!

Start training on your local GPU and monitor progress (including metrics) inline.

In this example, we'll be training on the [Alpaca](https://crfm.stanford.edu/2023/03/13/alpaca.html) dataset to turn openchat_3.5 into a rudimentary instruction-following chatbot, but you can use any dataset to fine-tune for other tasks.
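With `batch_size: 1` and `gradient_accumulation_steps: 16` in the config, each optimizer update sees 16 examples. The sketch below estimates how many updates (and warmup steps) one epoch amounts to after `sample_ratio: 0.1`. Two inputs are assumptions: 52,002 rows is the commonly cited size of Alpaca, and `train_fraction=0.7` assumes Ludwig's default split holds out 30% of rows for validation and test.

```python
import math


def training_schedule(num_rows: int, sample_ratio: float, batch_size: int,
                      grad_accum: int, epochs: int = 1, warmup_fraction: float = 0.01,
                      train_fraction: float = 1.0) -> dict:
    """Rough estimate of optimizer steps for a fine-tuning run with gradient accumulation."""
    rows = int(num_rows * sample_ratio * train_fraction)  # rows actually trained on
    effective_batch = batch_size * grad_accum  # examples per optimizer update
    steps = math.ceil(rows / effective_batch) * epochs  # optimizer updates over all epochs
    return {
        "rows": rows,
        "effective_batch": effective_batch,
        "steps": steps,
        "warmup_steps": round(steps * warmup_fraction),
    }


print(training_schedule(52_002, 0.1, batch_size=1, grad_accum=16))
print(training_schedule(52_002, 0.1, batch_size=1, grad_accum=16, train_fraction=0.7))
```

Either way the run is a few hundred updates with only a handful of warmup steps, which is why the small `warmup_fraction` is reasonable here.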

In [None]:
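import logging
from ludwig.api import LudwigModel


model = LudwigModel(config=config, logging_level=logging.INFO)
# train() returns (training statistics, preprocessed data, output directory).
train_stats, preprocessed_data, output_directory = model.train(dataset="ludwig://alpaca")
# This directory (e.g. /content/results/api_experiment_run_5) is what gets uploaded below.
print(output_directory)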

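## Upload the fine-tuned model to HuggingFace

Log in with a token that has write permission, then push the trained model directory to a HuggingFace Hub repository with `ludwig upload hf_hub`.
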
In [None]:
!huggingface-cli login



    A token is already saved on your machine. Run `huggingface-cli whoami` to get more information or `huggingface-cli logout` if you want to log out.
    Setting a new token will erase the existing one.
    To login, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .
Token: 
Add token as git credential? (Y/n) y
Token is valid (permission: write).
[1m[31mCannot authenticate through 

In [None]:
!ludwig upload hf_hub --repo_id pnotaro/finetuning-test --model_path /content/results/api_experiment_run_5

adapter_model.safetensors: 100% 13.6M/13.6M [00:02<00:00, 5.17MB/s]
Model uploaded to `https://huggingface.co/pnotaro/finetuning-test/tree/main/` with repository name `pnotaro/finetuning-test`
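The upload log reports a 13.6M `adapter_model.safetensors`: only the LoRA matrices are saved, not the 4-bit base weights. A rough parameter count is consistent with that size, but it rests on assumptions we have not checked against this run: rank 8 (which we believe is Ludwig's default), LoRA applied to `q_proj` and `v_proj` only (PEFT's usual default for Mistral-style models), Mistral-7B dimensions for openchat_3.5, and adapter weights stored as fp32.

```python
def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight: A is r x d_in, B is d_out x r."""
    return r * (d_in + d_out)


# Assumed Mistral-7B-style shapes and an assumed LoRA setup (rank 8 on q_proj and v_proj).
LAYERS, HIDDEN, KV_DIM, RANK = 32, 4096, 8 * 128, 8  # KV_DIM: 8 key/value heads of size 128

per_layer = lora_params(HIDDEN, HIDDEN, RANK) + lora_params(HIDDEN, KV_DIM, RANK)  # q_proj + v_proj
total = LAYERS * per_layer
print(f"{total:,} trainable parameters, ~{total * 4 / 1e6:.1f} MB stored as fp32")
```

The count scales linearly with the rank and the number of targeted modules, so raising `r` or adding more target modules grows the uploaded file proportionally.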


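## Query your model through an OpenAI-compatible client

The `easyllm` package provides a `huggingface.ChatCompletion` client that mirrors the OpenAI `ChatCompletion` interface and sends requests to the HuggingFace Inference API.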

In [None]:
!pip install easyllm

In [None]:
from easyllm.clients import huggingface
from easyllm.prompt_utils import build_llama2_prompt

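# Format chat messages with the Llama 2 chat template.
# Note: the adapter was trained on the Alpaca-style template from the Ludwig config,
# so a prompt builder that reproduces that template may match training more closely.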
huggingface.prompt_builder = build_llama2_prompt

response = huggingface.ChatCompletion.create(
    model="pnotaro/finetuning-test",
    messages=[
        {
            "role": "system",
            "content": "\nYou are a helpful assistant speaking like a pirate. argh!",
        },
        {"role": "user", "content": "What is the sun?"},
    ],
    temperature=0.9,
    top_p=0.6,
    max_tokens=256,
)

print(response)
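The cell above formats messages with `build_llama2_prompt`, while the adapter was fine-tuned on the Alpaca-style template from the Ludwig config. Below is a hypothetical prompt builder that reproduces the training template instead. It assumes easyllm accepts any function that takes the message list and returns a string as `huggingface.prompt_builder`; mapping the last user message to `### Instruction` and the system message to `### Input` is our own choice, not something the training data prescribes.

```python
def build_alpaca_prompt(messages: list) -> str:
    """Hypothetical easyllm prompt builder mirroring the Ludwig training template."""
    # Join any system messages and use them as the "input" (context) section.
    system = " ".join(m["content"].strip() for m in messages if m["role"] == "system")
    # Use the most recent user message as the instruction.
    users = [m["content"].strip() for m in messages if m["role"] == "user"]
    instruction = users[-1] if users else ""
    return f"### Instruction:\n{instruction}\n\n### Input:\n{system}\n\n### Response:\n"


messages = [
    {"role": "system", "content": "\nYou are a helpful assistant speaking like a pirate. argh!"},
    {"role": "user", "content": "What is the sun?"},
]
print(build_alpaca_prompt(messages))
```

To try it, set `huggingface.prompt_builder = build_alpaca_prompt` before calling `ChatCompletion.create`. Assuming the response follows OpenAI's chat completion layout, the generated text is at `response["choices"][0]["message"]["content"]`.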