In [1]:
from transformers import BitsAndBytesConfig, AutoModelForCausalLM, AutoTokenizer
import torch

device = 'cuda:0'



Note: for development, Mistral-7B is more than sufficient. For quality runs, consider Mixtral 8x7B (takes ~16 min to load on any GPU) or even Mixtral 8x22B (requires at least an A100 and takes ~72 min to load).

In [2]:
model_name = 'mistralai/Mistral-7B-Instruct-v0.2'
model = AutoModelForCausalLM.from_pretrained(model_name, device_map=device)


# model_name = 'mistralai/Mixtral-8x7B-Instruct-v0.1'
# model_name = 'mistralai/Mixtral-8x22B-Instruct-v0.1'

# bnb_config = BitsAndBytesConfig(
#     load_in_4bit=True,
#     bnb_4bit_use_double_quant=True,
#     bnb_4bit_quant_type="nf4",
#     bnb_4bit_compute_dtype=torch.bfloat16,
# )

# model = AutoModelForCausalLM.from_pretrained(model_name, low_cpu_mem_usage=True, quantization_config=bnb_config, device_map=device)



In [3]:
tokenizer = AutoTokenizer.from_pretrained(model_name)

messages = [
    {"role": "user", "content": "What is your favourite condiment?"},
    {"role": "assistant", "content": "Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!"},
    {"role": "user", "content": "Do you have mayonnaise recipes?"}
]

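# Render the chat with the model's template and tokenize it. Only the input ids are
# returned here, so generate() receives no attention mask (hence the warning below).
encodeds = tokenizer.apply_chat_template(messages, return_tensors="pt")

model_inputs = encodeds.to(device)

# Sample up to 1000 new tokens; the returned ids start with the prompt tokens.
generated_ids = model.generate(model_inputs, max_new_tokens=1000, do_sample=True)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])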

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


<s> [INST] What is your favourite condiment? [/INST]Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!</s> [INST] Do you have mayonnaise recipes? [/INST] Certainly! Here's a simple and classic mayonnaise recipe:

Ingredients:
- 1 egg yolk
- 1 tbsp Dijon mustard
- 1 tbsp white wine vinegar
- 1 cup (200 ml) vegetable oil
- Salt, to taste

Instructions:
1. In a large bowl, whisk together the egg yolk, mustard, and vinegar until well combined. The mixture should be thick and pale in color.
2. Drizzle in the vegetable oil, very slowly, in a thin, steady stream, whisking constantly. You may need to adjust the whisking speed slightly depending on the thickness of the oil.
3. Once all of the oil has been incorporated and the mayonnaise has thickened, add salt to taste.
4. Cover the bowl with plastic wrap, pressing it directly on the surface of the mayonnaise to prevent a skin from forming.
5.
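
The decoded text above echoes the whole conversation because the generated ids contain the prompt tokens followed by the new ones. A minimal sketch of dropping the prompt before decoding, assuming every row starts with exactly `prompt_len` input tokens. `strip_prompt` is a hypothetical helper, and plain lists stand in for the tensors used in the notebook:

```python
def strip_prompt(generated_ids, prompt_len):
    """Return only the newly generated token ids for each sequence."""
    return [list(seq[prompt_len:]) for seq in generated_ids]

# Toy ids standing in for model output: 4 prompt tokens followed by 3 new ones.
prompt_ids = [1, 733, 16289, 28793]
generated = [prompt_ids + [100, 200, 2]]

new_tokens = strip_prompt(generated, len(prompt_ids))
print(new_tokens)
```

With the tensors from the cell above, the equivalent slice would be `generated_ids[:, model_inputs.shape[1]:]`, and passing `skip_special_tokens=True` to `tokenizer.batch_decode` would also hide markers such as `<s>` and `</s>`.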

In [4]:
# Problem: Consider a function f(x) defined as f(x) = x^2 + bx + c, where b and c are constants. The graph of f(x) passes through the point (3, 10). Additionally, it is known that the vertex of the parabola described by f(x) is located at (5, d), where d is a constant.
#
#     Determine the values of b, c, and d.
#     Find the range of the function based on the determined vertex and its direction.

messages = [
    {"role": "user", "content": "Consider a function f(x) defined as f(x)=x^2+bx+c where b and c are constants. The graph of f(x) passes through the point (3,10). Additionally, it is known that the vertex of the parabola described by f(x) is located at (5,d), where d is a constant. Determine the values of b, c, and d. Find the range of the function based on the determined vertex and its direction."}
]

encodeds = tokenizer.apply_chat_template(messages, return_tensors="pt")

model_inputs = encodeds.to(device)

generated_ids = model.generate(model_inputs, max_new_tokens=1000, do_sample=True)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


<s> [INST] Consider a function f(x) defined as f(x)=x^2+bx+c where b and c are constants. The graph of f(x) passes through the point (3,10). Additionally, it is known that the vertex of the parabola described by f(x) is located at (5,d), where d is a constant. Determine the values of b, c, and d. Find the range of the function based on the determined vertex and its direction. [/INST] To find the values of b, c, and d, we can use the given information about the points where the graph passes through and the vertex of the parabola.

First, let's use the point-slope form of a parabola to find the value of b:

x - 3 = (b/2)(x - 5)

Setting x = 3 in this equation, we have:

0 = (b/2)(2)

This implies that b = -2.

Now, we can use the vertex form of a parabola to find the values of c and d:

f(x) = a(x-h)² + k

where (h,k) is the vertex and a is the coefficient of x² (which we do not know yet).

Setting x = 5 and f(x) = 10 in this equation, we have:

10 = a(5-3)² + k
10 = a(2)² + k

To find a
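
The sampled answer above goes wrong early (it derives b = -2), so a reference solution is handy for comparison. The check below is derived from the problem statement alone: the vertex of x^2 + bx + c sits at x = -b/2, which fixes b; the point (3, 10) then fixes c; and d is f(5). The name `solve_quadratic_problem` is made up for illustration.

```python
def solve_quadratic_problem(point=(3, 10), vertex_x=5):
    """Solve for b, c, d in f(x) = x^2 + b*x + c with vertex (vertex_x, d)."""
    x0, y0 = point
    b = -2 * vertex_x                   # vertex at x = -b/2
    c = y0 - x0**2 - b * x0             # f(x0) = y0
    d = vertex_x**2 + b * vertex_x + c  # d = f(vertex_x)
    return b, c, d

b, c, d = solve_quadratic_problem()
print(f"b = {b}, c = {c}, d = {d}")  # b = -10, c = 31, d = 6
print(f"range: [{d}, inf)")          # parabola opens upward, so the vertex is the minimum
```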

In [11]:
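# Combine the prompt and the generation into a single chat.
prompt = messages[0]['content']
# Note: decoded[0] is the full decoded sequence (special tokens and the echoed
# prompt included), not only the new text, so the prompt appears twice below.
completion = decoded[0]

combined_messages = [
    {"role": "user", "content": prompt},
    {"role": "assistant", "content": completion}
]

combined_encodeds = tokenizer.apply_chat_template(combined_messages, return_tensors="pt")

model_inputs = combined_encodeds.to(device)

# Single forward pass; logits has shape (batch, seq_len, vocab_size).
logits = model(model_inputs).logits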

In [13]:
logits.shape

torch.Size([1, 940, 32000])

In [19]:
logits

tensor([[[ -5.5489,  -5.5728,  -0.2506,  ...,  -4.1927,  -3.2591,  -3.8910],
         [ -7.9946,  -7.7933,  -3.1488,  ...,  -3.3826,  -3.5585,  -2.0075],
         [ -4.3011,  -4.8040, -11.4249,  ...,  -0.6552,  -0.6087,   0.5439],
         ...,
         [ -7.6381,  -7.2220,  17.2601,  ...,  -6.3987,  -5.2656,  -1.9171],
         [ -4.5062,  -0.6695,   6.5561,  ...,  -3.6560,  -0.4891,   0.6460],
         [ -4.2621,  -0.0362,   5.0418,  ...,  -4.0466,  -0.4695,   0.6488]]],
       device='cuda:0', grad_fn=<UnsafeViewBackward0>)
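
The logits at position i score the token at position i + 1, so scoring a sequence means shifting by one before comparing against the input ids. A minimal pure-Python sketch of that alignment on toy numbers follows. `token_log_probs` is a hypothetical helper; on real tensors the same computation would typically use `torch.log_softmax` followed by `gather`.

```python
import math

def log_softmax(row):
    """Numerically stable log-softmax over one vocabulary row."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - log_z for v in row]

def token_log_probs(logits, input_ids):
    """logits: [seq_len][vocab]; input_ids: [seq_len].

    Returns seq_len - 1 values: the log-probability assigned to each token
    given the tokens before it (the first token has no prediction).
    """
    out = []
    for pos in range(len(input_ids) - 1):
        next_id = input_ids[pos + 1]  # logits[pos] predict the token at pos + 1
        out.append(log_softmax(logits[pos])[next_id])
    return out

# Toy example: 3 positions, vocabulary of 3 tokens.
toy_logits = [[2.0, 0.0, 0.0], [0.0, 3.0, 0.0], [0.0, 0.0, 1.0]]
toy_ids = [0, 1, 2]
scores = token_log_probs(toy_logits, toy_ids)
print(scores)
```

The last row of logits is unused here because it predicts a token that is not in the sequence.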

In [18]:
# check the shape of the tokenized combined prompt
combined_encodeds.shape

torch.Size([1, 940])

In [20]:
combined_encodeds

tensor([[    1,   733, 16289, 28793, 11772,   264,   908,   285, 28732, 28744,
         28731,  4057,   390,   285, 28732, 28744,  6706, 28744, 28815, 28750,
         28806, 28726, 28744, 28806, 28717,   970,   287,   304,   277,   460,
         19648, 28723,   415,  5246,   302,   285, 28732, 28744, 28731, 15167,
          1059,   272,  1305,   325, 28770, 28725, 28740, 28734,   609, 16569,
         28725,   378,   349,  2651,   369,   272, 14183,   302,   272,   940,
           375,  4362,  5397,   486,   285, 28732, 28744, 28731,   349,  5651,
           438,   325, 28782, 28725, 28715,   557,   970,   281,   349,   264,
          5957, 28723,  5158, 21824,   272,  3069,   302,   287, 28725,   277,
         28725,   304,   281, 28723,  8769,   272,  2819,   302,   272,   908,
          2818,   356,   272,  7798, 14183,   304,   871,  5007, 28723,   733,
         28748, 16289, 28793,     1, 28705,   733, 16289, 28793, 11772,   264,
           908,   285, 28732, 28744, 28731,  4057,  

In [23]:
tokenizer.batch_decode(combined_encodeds[:,:10]), tokenizer.batch_decode(combined_encodeds[:,-10:])

(['<s> [INST] Consider a function f(x'], ['is [10, ∞).</s></s>'])
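
The doubled `</s></s>` and the repeated prompt in the token dump suggest the chat template was applied to text that already contained its markers. One quick sanity check is to count how often the opening `[INST]` marker occurs in the ids. This is a sketch: `count_subsequence` is a hypothetical helper, the id triple for `[INST]` is read off the tensor printed above, and the excerpt is abridged by hand.

```python
def count_subsequence(ids, pattern):
    """Count (possibly overlapping) occurrences of `pattern` inside `ids`."""
    n = len(pattern)
    return sum(1 for i in range(len(ids) - n + 1) if ids[i:i + n] == pattern)

inst_open = [733, 16289, 28793]  # assumed ids for "[INST]", taken from the dump above

# Abridged excerpt: start of the prompt, the closing [/INST], then the echoed prompt.
excerpt = [1, 733, 16289, 28793, 11772, 264,
           733, 28748, 16289, 28793,
           1, 28705, 733, 16289, 28793, 11772, 264]

print(count_subsequence(excerpt, inst_open))  # 2: a single-turn chat should have 1
```

On the real tensor the same check would run on `combined_encodeds[0].tolist()`.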

In [27]:
tokenizer.batch_decode(torch.argmax(logits,axis=-1))

["Questionpackageer:ing simple f(x) = as:(x)x^2+2x+c, b and c are constants. Find function of this(x) passes through the point (1,12). How, the is known that the vertex of the parabola lies by f(x) lies located at the1,- -). where d is a constant.\nermine the values of b and c, and d.\n the equation of x function. on the given values. the x.\nTo basedFirst To Question1To0 To the function f(x) defined as f(x)x^2+bx+c. b and c are constants. We graph of f(x) passes through the point (3,10). We, it is known that the vertex of the parabola described by f(x) is located at (5,d). where d is a constant.\nerm the values of b, c, and d.\n the range of the function based on the determined vertex and its direction.\n\n\nFirst To find the values of b, c, and d, we can use the given information about the point where the par passes through and the location. the parabola.\n\nFirst, we's find the point (vertexlope form of a parabola to find the value of b:\n\nThe - 3 = (x/2)(x - 5)\n\nSetting x = 3 in