# Installing required packages

Version 4.34.0 of the transformers library works well for the use case we have in mind.

In [1]:
!pip install transformers==4.34.0



Next, we install the remaining libraries: bitsandbytes, accelerate, xformers and einops. It is also important to pin torch to the version shown below, since some later versions can cause dependency issues.

In [2]:
!pip -q install bitsandbytes accelerate xformers einops torch==2.1.0
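Since the rest of the notebook depends on these pinned versions, it can help to confirm what actually got installed before going further. The following is a minimal sketch using only the standard library; the helper name `installed_version` is illustrative and not part of any of these packages.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Package names taken from the install cells in this notebook.
for name in ["transformers", "torch", "bitsandbytes", "accelerate", "xformers", "einops"]:
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

If transformers or torch report a different version than the one pinned here, restarting the runtime after installation is usually worth trying first.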


# Importing libraries

In [3]:
import torch
import transformers

In [4]:
from transformers import AutoTokenizer, AutoModelForCausalLM

# Model Initialization

In [5]:
model_id = "mistralai/Mistral-7B-Instruct-v0.1"

Next, we provide a configuration for the bitsandbytes package. bitsandbytes is a lightweight wrapper around custom CUDA functions, in particular 8-bit optimizers, matrix multiplication (LLM.int8()), and quantization functions. We can set this configuration through the transformers library. Here we use 4-bit quantization with the NF4 quantization type, double quantization, and bfloat16 as the compute dtype.

In [6]:
bnb_config = transformers.BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
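To see why 4-bit loading matters, a rough back-of-the-envelope estimate of the memory needed for the weights alone is useful. The sketch below is plain arithmetic, not a bitsandbytes API. The parameter count of roughly 7.2 billion is an assumption for a "7B" model, and the estimate ignores quantization constants, activations and the KV cache.

```python
def weight_memory_gb(n_params, bits_per_param):
    """Approximate memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


# Assumption: roughly 7.2 billion parameters for a "7B" model.
N_PARAMS = 7_200_000_000

for label, bits in [("float32", 32), ("float16/bfloat16", 16), ("int8", 8), ("4-bit", 4)]:
    print(f"{label:>17}: {weight_memory_gb(N_PARAMS, bits):.1f} GB")
```

Under these assumptions the weights shrink to about a quarter of their 16-bit size, which is what makes a model of this size practical on a single consumer or free-tier GPU.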

Next, we initialize the model. The transformers library offers a high level of abstraction that lets you work with a wide variety of large language models. In many cases, the architecture you want to use can be guessed from the name or the path of the pretrained model you are supplying to the from_pretrained method.

AutoClasses are here to do this job for you so that you automatically retrieve the relevant model given the name/path to the pretrained weights/config/vocabulary:

Instantiating one of AutoModel, AutoConfig and AutoTokenizer will directly create a class of the relevant architecture (for example, model = AutoModel.from_pretrained('bert-base-cased') will create an instance of BertModel).

AutoModel is a generic model class that will be instantiated as one of the base model classes of the huggingface library when created with the AutoModel.from_pretrained(pretrained_model_name_or_path) or the AutoModel.from_config(config) class methods. When used with the from_pretrained method, it instantiates one of the base model classes of the library from a pre-trained model configuration.

The from_pretrained() method takes care of returning the correct model class instance based on the model_type property of the config object, or when it's missing, falling back to using pattern matching on the pretrained_model_name_or_path string.
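The dispatch idea described above can be sketched with a small registry. This is a toy illustration only, not the library's implementation: the `Toy*` classes, `MODEL_REGISTRY`, and the plain dict standing in for a config object are all hypothetical names invented for this example.

```python
class ToyBertModel:
    def __init__(self, config):
        self.config = config


class ToyMistralModel:
    def __init__(self, config):
        self.config = config


# Maps a config's model_type value to a concrete class (hypothetical registry).
MODEL_REGISTRY = {"bert": ToyBertModel, "mistral": ToyMistralModel}


class ToyAutoModel:
    @classmethod
    def from_config(cls, config, name_or_path=""):
        # Preferred route: use the explicit model_type stored in the config.
        model_type = config.get("model_type")
        if model_type in MODEL_REGISTRY:
            return MODEL_REGISTRY[model_type](config)
        # Fallback: look for a known architecture name inside the path string.
        for key, model_cls in MODEL_REGISTRY.items():
            if key in name_or_path.lower():
                return model_cls(config)
        raise ValueError(f"Cannot infer architecture for {name_or_path!r}")


toy_a = ToyAutoModel.from_config({"model_type": "mistral"})
toy_b = ToyAutoModel.from_config({}, name_or_path="bert-base-cased")
print(type(toy_a).__name__, type(toy_b).__name__)
```

The caller never names the concrete class; it only supplies a config or a path, and the generic class picks the matching architecture.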

In [7]:
model = transformers.AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    quantization_config=bnb_config,
    device_map='auto',
)


Similar to the AutoModel class, the AutoTokenizer is a generic tokenizer class that will be instantiated as one of the tokenizer classes of the library when created with the AutoTokenizer.from_pretrained(pretrained_model_name_or_path) class method.

The from_pretrained() method takes care of returning the correct tokenizer class instance based on the model_type property of the config object, or when it's missing, falling back to using pattern matching on the pretrained_model_name_or_path string.

In [8]:
tokenizer = transformers.AutoTokenizer.from_pretrained(
    model_id,
)


Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


Next, we define the prompt that we will use to query the model. For Mistral 7B Instruct models, the prompt has to be enclosed in specific format tags: [INST] and [/INST].

In [10]:
text = "[INST] Do you have mayonnaise recipes? [/INST]"
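When several questions need to be sent to the model, wrapping them by hand gets error-prone. A small helper can apply the tags consistently. This is a minimal sketch for a single-turn prompt only; `build_instruct_prompt` is a hypothetical name, not a transformers function.

```python
def build_instruct_prompt(user_message):
    """Wrap a single user message in the [INST] ... [/INST] tags."""
    return f"[INST] {user_message.strip()} [/INST]"


prompt = build_instruct_prompt("Do you have mayonnaise recipes?")
print(prompt)
```

The result is the same string as the hand-written prompt in the cell above, so it can be passed to the tokenizer in the same way.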

To use any large language model, we need to tokenize the text and convert it into a format that the model understands. We can do this by passing the text to the tokenizer and setting a few other parameters to configure the output format. Here, return_tensors="pt" asks for PyTorch tensors.

In [11]:
encodeds = tokenizer(text, return_tensors="pt", add_special_tokens=False)
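To make the idea of tokenization concrete, here is a deliberately simplified sketch that maps whitespace-separated words to integer ids and back. It is a toy: the real tokenizer uses a learned subword vocabulary and returns tensors, and `ToyWhitespaceTokenizer` is a hypothetical class written only for this illustration.

```python
class ToyWhitespaceTokenizer:
    """Assigns each distinct whitespace-separated token the next free integer id."""

    def __init__(self):
        self.token_to_id = {}
        self.id_to_token = []

    def encode(self, text):
        ids = []
        for token in text.split():
            if token not in self.token_to_id:
                # First time we see this token: give it a new id.
                self.token_to_id[token] = len(self.id_to_token)
                self.id_to_token.append(token)
            ids.append(self.token_to_id[token])
        return ids

    def decode(self, ids):
        return " ".join(self.id_to_token[i] for i in ids)


toy_tokenizer = ToyWhitespaceTokenizer()
toy_ids = toy_tokenizer.encode("[INST] Do you have mayonnaise recipes? [/INST]")
print(toy_ids)
print(toy_tokenizer.decode(toy_ids))
```

The model only ever sees the integer ids; the mapping back to text is what the decoding step later in the notebook performs.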

In [12]:
model_inputs = encodeds.to(model.device)  # move the input tensors to the model's device

In [13]:
generated_ids = model.generate(**model_inputs, max_new_tokens=200, do_sample=True)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
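Passing do_sample=True means the next token is drawn at random from the model's predicted distribution instead of always taking the most likely one, so repeated runs can give different answers. The sketch below contrasts the two strategies on made-up logits using only the standard library. It is an illustration of the idea, not the code path used by `generate`, which supports further controls that are omitted here.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Convert raw scores into probabilities that sum to 1."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_token(logits):
    """Always pick the index with the highest score."""
    return max(range(len(logits)), key=lambda i: logits[i])


def sample_token(logits, temperature=1.0, rng=random):
    """Draw an index at random, weighted by its softmax probability."""
    probs = softmax(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


toy_logits = [2.0, 1.0, 0.1]  # made-up scores for a three-token vocabulary
rng = random.Random(0)        # seeded so the run is repeatable
print("greedy: ", greedy_token(toy_logits))
print("sampled:", [sample_token(toy_logits, rng=rng) for _ in range(10)])
```

Greedy selection returns the same index every time, while sampling mostly returns the top index but occasionally picks the others.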


Once we have the outputs from the model, we need to convert them back to text. This can be done by passing the outputs to the tokenizer's batch_decode method.

In [14]:
decoded = tokenizer.batch_decode(generated_ids)
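For a decoder-only model, the generated sequence starts with the prompt tokens and continues with the new ones, which is why the decoded text repeats the prompt. If only the answer is wanted, the prompt tokens can be dropped before decoding. The sketch below shows the idea with plain Python lists standing in for token-id tensors; `strip_prompt` and the id values are hypothetical.

```python
def strip_prompt(generated_ids, prompt_ids):
    """Drop the leading prompt tokens if the sequence starts with them."""
    n = len(prompt_ids)
    if generated_ids[:n] == prompt_ids:
        return generated_ids[n:]
    return generated_ids


toy_prompt_ids = [5, 6, 7]               # made-up ids for the prompt
toy_generated_ids = [5, 6, 7, 42, 43, 2]  # prompt ids followed by new ids
print(strip_prompt(toy_generated_ids, toy_prompt_ids))
```

With real tensors, the same effect is commonly achieved by slicing off the first `input_ids.shape[1]` positions along the sequence dimension before calling batch_decode.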

In [15]:
print(decoded[0])

[INST] Do you have mayonnaise recipes? [/INST] Yes, (3 of them): 

1. Classic Mayonnaise:

Ingredients:
- 2 large egg yolks
- 1 tablespoon Dijon mustard
- 1 tablespoon white wine vinegar
- 2 tablespoons olive oil
- Salt and black pepper to taste

Instructions:
[1. Place the egg yolks in a medium-sized mixing bowl.]
[2. In a small saucepan, combine the mustard, vinegar, and 2 tablespoons of water. Heat over low heat, stirring until the ingredients are well combined.]
[3. Slowly pour the hot mustard mixture into the egg yolks, whisking constantly until the sauce thickens.]
[4. Once the sauce has thickened, gradually whisk in the olive oil.]
[5. Season the mayonnaise with salt and black pepper to taste.]
[
