# Basic inference with the quantized version of the Alpaca7B model from the Hugging Face Hub

### Install required libraries

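This notebook pins a fork of the Transformers library (`zphang/transformers@c3dc391`) because it provides the LLaMA-related classes imported below, `LLaMATokenizer` and `LLaMAForCausalLM`.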

In [None]:
!pip install -q datasets loralib sentencepiece
!pip install -q git+https://github.com/zphang/transformers@c3dc391
!pip -q install git+https://github.com/huggingface/peft.git
!pip -q install bitsandbytes


### Load the model and the tokenizer

The quantized Alpaca7B model takes up approximately 8 GB of GPU RAM.
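
As a rough sanity check on that figure, the weight memory can be estimated from the parameter count. The sketch below assumes exactly 7e9 parameters (the real count differs slightly) and counts weights only. The rest of the footprint presumably comes from layers kept in higher precision, activations, and CUDA overhead.

```python
# Rough weight-memory estimate; NUM_PARAMS is an assumption, not a measured value.
NUM_PARAMS = 7_000_000_000  # "7B" taken as exactly 7e9 parameters
BYTES_PER_PARAM_INT8 = 1    # 8-bit quantized weights
BYTES_PER_PARAM_FP16 = 2    # half-precision weights, for comparison


def weights_gib(num_params: int, bytes_per_param: int) -> float:
    """Return the size of the weights alone, in GiB."""
    return num_params * bytes_per_param / 1024**3


int8_gib = weights_gib(NUM_PARAMS, BYTES_PER_PARAM_INT8)
fp16_gib = weights_gib(NUM_PARAMS, BYTES_PER_PARAM_FP16)
print(f"int8 weights: {int8_gib:.1f} GiB")
print(f"fp16 weights: {fp16_gib:.1f} GiB")
```

Under these assumptions, 8-bit loading roughly halves the weight memory compared with float16.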

In [None]:
import torch
from transformers import LLaMATokenizer, LLaMAForCausalLM, GenerationConfig

In [None]:
tokenizer = LLaMATokenizer.from_pretrained("chainyo/alpaca-lora-7b")

The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization. 
The tokenizer class you load from this checkpoint is 'LlamaTokenizer'. 
The class this function is called from is 'LLaMATokenizer'.


In [None]:
model = LLaMAForCausalLM.from_pretrained(
    "chainyo/alpaca-lora-7b",
    load_in_8bit=True,
    torch_dtype=torch.float16,
    device_map="auto",
)


Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

 and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
bin /opt/conda/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda116_nocublaslt.so
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so.11.0
CUDA SETUP: Highest compute capability among GPUs detected: 6.0
CUDA SETUP: Detected CUDA version 116
CUDA SETUP: Loading binary /opt/conda/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda116_nocublaslt.so...


  warn(msg)
  warn(msg)
  warn(msg)
  warn(msg)
The model weights are not tied. Please use the `tie_weights` method before using the `infer_auto_device` function.


Loading checkpoint shards:   0%|          | 0/39 [00:00<?, ?it/s]

The bitsandbytes bug-report banner and the warnings above can mostly be ignored.

In [None]:
model.eval()
if torch.__version__ >= "2":
    model = torch.compile(model)

### Define the standard Alpaca prompt

In [None]:
def generate_prompt(instruction: str, input_ctxt: str = None) -> str:
    if input_ctxt:
        return f"""Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Input:
{input_ctxt}

### Response:"""
    else:
        return f"""Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Response:"""

### Inference

In [None]:
def generate_response(instruction: str, input_ctxt, generation_config) -> str:
    prompt = generate_prompt(instruction, input_ctxt)
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    input_ids = input_ids.to(model.device)

    with torch.no_grad():
        outputs = model.generate(
            input_ids=input_ids,
            generation_config=generation_config,
            return_dict_in_generate=True,
            output_scores=True,
        )

    response = tokenizer.decode(outputs.sequences[0], skip_special_tokens=True)
    return response
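
Because `generate_response` decodes the whole output sequence, the returned string begins with the prompt itself. The following is a minimal sketch of one way to keep only the model's answer, assuming the `### Response:` marker from the prompt template first appears where the answer starts. `extract_response` is a hypothetical helper and not part of the original notebook.

```python
RESPONSE_MARKER = "### Response:"  # marker taken from the Alpaca prompt template


def extract_response(decoded: str, marker: str = RESPONSE_MARKER) -> str:
    """Return the text after the first `marker` (hypothetical helper)."""
    _, found, tail = decoded.partition(marker)
    # If the marker is missing, fall back to the full decoded text.
    return tail.strip() if found else decoded.strip()


# Illustrative input shaped like the decoded output of the model.
example = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nWrite a poem about Berlin\n\n"
    "### Response:\nThe streets of Berlin are filled with life,"
)
print(extract_response(example))
```

Splitting on the first occurrence keeps any later `### Response:` text the model might generate. Splitting on the last occurrence would be the alternative if instructions could contain the marker.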

In [None]:
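generation_config = GenerationConfig(
    # temperature, top_p and top_k only affect sampling. Without do_sample=True,
    # generation is deterministic beam search and these values should be ignored.
    temperature=0,
    top_p=0.75,
    top_k=40,
    num_beams=4,
    max_new_tokens=256,
)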

instruction = "Write a poem about Berlin"
input_ctxt = None

response = generate_response(instruction, input_ctxt, generation_config)
print(response)

 Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
Write a poem about Berlin

### Response:
Verse 1

The streets of Berlin are filled with life,
A city of culture and of strife.
From the Brandenburg Gate to the Reichstag,
Berlin is a place of great delight.

Verse 2

The Tiergarten is a place of beauty,
A place of peace and tranquility.
From the Victory Column to the Holocaust Memorial,
Berlin is a city of many wonders.

Verse 3

The Berlin Wall is a reminder of the past,
A reminder of a divided city.
From Checkpoint Charlie to the East Side Gallery,
Berlin is a city of many stories.
