In [1]:
# Set up the environment
#!pip install --upgrade huggingface_hub transformers
!pip install bitsandbytes
from huggingface_hub import login
from kaggle_secrets import UserSecretsClient
access_token_read = UserSecretsClient().get_secret("HUGGINGFACE_TOKEN")
login(token = access_token_read)
#!pip install git+https://github.com/huggingface/transformers -U
#!pip install accelerate
#!pip install -i https://pypi.org/simple/ bitsandbytes

Collecting bitsandbytes
  Downloading bitsandbytes-0.43.1-py3-none-manylinux_2_24_x86_64.whl.metadata (2.2 kB)
Downloading bitsandbytes-0.43.1-py3-none-manylinux_2_24_x86_64.whl (119.8 MB)
Installing collected packages: bitsandbytes
Successfully installed bitsandbytes-0.43.1


BackendError: Unexpected response from the service. Response: {'errors': ['No user secrets exist for kernel id 54693505 and label HUGGINGFACE_TOKEN.'], 'error': {'code': 5, 'details': []}, 'wasSuccessful': False}.

In [2]:
import torch, os
from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer
# Work around a bug in the version of PyTorch and GPU hardware currently on Kaggle. On other hardware, removing these lines may lead to a speed-up.
torch.backends.cuda.enable_mem_efficient_sdp(False)
torch.backends.cuda.enable_flash_sdp(False)

# Load the model
USE_INSTRUCTION_TUNED = True 
if USE_INSTRUCTION_TUNED:
    model_name = '/kaggle/input/gemma/transformers/1.1-2b-it/1'
    if not os.path.exists(model_name):
        print("Warning: loading model weights from the Internet. This might take a bit of extra time.")
        model_name = "google/gemma-1.1-2b-it"
else:
    model_name = "/kaggle/input/gemma/transformers/2b/2"
    if not os.path.exists(model_name):
        print("Warning: loading model weights from the Internet. This might take a bit of extra time.")
        model_name = "google/gemma-2b"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map='auto',
    torch_dtype=torch.bfloat16)
streamer = TextStreamer(tokenizer)
# Silence a warning.
tokenizer.decode([tokenizer.eos_token_id]);

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

2024-04-19 00:20:54.728723: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-04-19 00:20:54.728825: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-04-19 00:20:54.861344: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered


In [3]:
# Check where the whole model is loaded and what data type it's using.
model.device, model.dtype

(device(type='cuda', index=0), torch.bfloat16)

In [4]:
# Check where parameters are loaded. If this is anything other than {'': 0}
# then probably some parts of the model got offloaded onto CPU and so will run slow.
model.hf_device_map

{'': 0}
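
The check described in the comment above can be made explicit. This is a minimal sketch, assuming the device map is a plain dictionary whose values are either a GPU index or the strings `"cpu"` / `"disk"`; the helper name `offloaded_modules` and the second example map are hypothetical and not part of the notebook.

```python
def offloaded_modules(device_map):
    """Return the entries of a device map that are placed on CPU or disk."""
    return {
        name: device
        for name, device in device_map.items()
        if device in ("cpu", "disk")
    }

# The map printed above: the whole model sits on GPU 0, so nothing is offloaded.
print(offloaded_modules({"": 0}))

# A made-up map in which one module was pushed to the CPU.
print(offloaded_modules({"model.embed_tokens": 0, "lm_head": "cpu"}))
```

In the notebook this would be called as `offloaded_modules(model.hf_device_map)`; an empty result matches the `{'': 0}` case.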

## Warm-Up

In [5]:
%%time
doc = '''I went to the market and bought 10 apples. I gave 2 apples to the neighbor and 2 to the repairman. I then went and bought 5 more apples and ate 1. Let's think step by step. After I bought the apples, I had'''
model_out = model.generate(
    **tokenizer(doc, return_tensors='pt').to(model.device),
    max_new_tokens=128,
    do_sample=False,
    streamer=streamer)

<bos>I went to the market and bought 10 apples. I gave 2 apples to the neighbor and 2 to the repairman. I then went and bought 5 more apples and ate 1. Let's think step by step. After I bought the apples, I had 10 apples in total. After I distributed the apples, I had 8 apples left. Then I bought 5 more apples and ate 1, leaving me with 7 apples.<eos>
CPU times: user 2.71 s, sys: 177 ms, total: 2.88 s
Wall time: 3.45 s


### Gemma Warm-Up Summary:
1. Gemma performed well on all of the tasks. For the capital of France, it listed several options, then chose the correct one and explained it further. The Spanish translation about statistical models seemed slightly off, but it did better when I gave it simpler examples. The math question was answered correctly, and the programming task also ran well and came with an explanation.
2. You have to think about how to word your statement so that it can be completed. Most instructions could probably be written this way, but it takes more thought.
3. There is a larger number of options for the first token, but once an initial choice has been made, the range of likely next options shrinks, so it can choose faster.

#### Few-Shot Learning:
When I set USE_INSTRUCTION_TUNED to False, the output started repeating information more. It also responded differently depending on whether the prompt ended with a trailing space or with punctuation. It seemed that the information was mostly correct, but it was also a bit vague.

#### Chain of Thought:
Using the first sentence about apples, Gemma did well, giving the logic and arithmetic it went through to find the answer. When I changed the ending, however, the output just started repeating the input, as if in a loop.
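
As a sanity check on the chain of thought, the quantities in the warm-up prompt can be tallied directly. This is only a sketch of the arithmetic stated in the prompt; the variable name `apples` is mine.

```python
apples = 10   # bought at the market
apples -= 2   # given to the neighbor
apples -= 2   # given to the repairman
apples += 5   # second purchase
apples -= 1   # eaten
print(apples)
```

The tally comes to 10, while the generation recorded in the warm-up cell ends with 7, so it is worth checking each intermediate step of the model's reasoning and not only whether it shows its work.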

## Chat Templating

In [6]:
role = """You are a student in an African history class"""
task = """Write a one-page essay based on the book Things Fall Apart by Chinua Achebe that discusses the relationship between Okonkwo and Ikemefuna."""

messages = [
    {
        "role": "user",
        "content": f"{role}\n\n{task}",
    },
 ]
tokenized_chat = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt")
print(tokenizer.batch_decode(tokenized_chat)[0])
print(model.generate(tokenized_chat.to(model.device), max_new_tokens=500,
    do_sample=False,
    streamer=streamer)[0])


<bos><start_of_turn>user
You are a student in an African history class

Write a one-page essay based on the book Things Fall Apart by Chinua Achebe that discusses the relationship between Okonkwo and Ikemefuna.<end_of_turn>
<start_of_turn>model

<bos><start_of_turn>user
You are a student in an African history class

Write a one-page essay based on the book Things Fall Apart by Chinua Achebe that discusses the relationship between Okonkwo and Ikemefuna.<end_of_turn>
<start_of_turn>model
**The Complex Relationship between Okonkwo and Ikemefuna in Things Fall Apart**

Chinua Achebe's seminal novel, "Things Fall Apart," explores the intricate and often fraught relationship between Okonkwo, the powerful leader of the Igbo society, and Ikemefuna, a respected elder. Their connection transcends mere kinship, evolving into a complex interplay of respect, rivalry, and ultimately, tragic conflict.

Okonkwo's authority stems from his wisdom, strength, and connection to the earth. He is a symbol of

### Conversations as Documents
1. The user has to supply both a role (the viewpoint the answer should be written from) and a task for the model to do.
2. The model would look for examples and data that reflect the way a helpful 2nd grade teacher would think. Then, when it computes probabilities for the next token, it compares them against those 2nd grade teacher examples.
3. The model predicts a correct answer about the color of the sky.
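
To make the "conversation as a document" idea concrete, the prompt layout printed by the chat cells in this notebook can be approximated by hand. This is a sketch inferred from those printed prompts, not the tokenizer's actual template; the function name `render_chat` is hypothetical, and details such as whitespace handling are assumptions based on the decoded output.

```python
def render_chat(messages, add_generation_prompt=True):
    """Approximate the flat prompt text that the chat cells print."""
    parts = ["<bos>"]
    for message in messages:
        # The printed prompts label assistant turns as "model".
        role = "model" if message["role"] == "assistant" else message["role"]
        content = message["content"].strip()
        parts.append(f"<start_of_turn>{role}\n{content}<end_of_turn>\n")
    if add_generation_prompt:
        # Open a model turn so the next predicted tokens are the reply.
        parts.append("<start_of_turn>model\n")
    return "".join(parts)

print(render_chat([{"role": "user", "content": "What color is the sky?"}]))
```

Seen this way, the model is still just continuing a document: the document happens to end with an open model turn, so the most likely continuation is an answer.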

## Retrieval-Augmented Generation

In [7]:
import inspect
docstrings = {}
for name, obj in inspect.getmembers(torch.nn):
    if inspect.isfunction(obj) or inspect.isclass(obj):
        docstrings[name] = inspect.getdoc(obj)


In [8]:
docstrings.keys()

dict_keys(['AdaptiveAvgPool1d', 'AdaptiveAvgPool2d', 'AdaptiveAvgPool3d', 'AdaptiveLogSoftmaxWithLoss', 'AdaptiveMaxPool1d', 'AdaptiveMaxPool2d', 'AdaptiveMaxPool3d', 'AlphaDropout', 'AvgPool1d', 'AvgPool2d', 'AvgPool3d', 'BCELoss', 'BCEWithLogitsLoss', 'BatchNorm1d', 'BatchNorm2d', 'BatchNorm3d', 'Bilinear', 'CELU', 'CTCLoss', 'ChannelShuffle', 'CircularPad1d', 'CircularPad2d', 'CircularPad3d', 'ConstantPad1d', 'ConstantPad2d', 'ConstantPad3d', 'Container', 'Conv1d', 'Conv2d', 'Conv3d', 'ConvTranspose1d', 'ConvTranspose2d', 'ConvTranspose3d', 'CosineEmbeddingLoss', 'CosineSimilarity', 'CrossEntropyLoss', 'CrossMapLRN2d', 'DataParallel', 'Dropout', 'Dropout1d', 'Dropout2d', 'Dropout3d', 'ELU', 'Embedding', 'EmbeddingBag', 'FeatureAlphaDropout', 'Flatten', 'Fold', 'FractionalMaxPool2d', 'FractionalMaxPool3d', 'GELU', 'GLU', 'GRU', 'GRUCell', 'GaussianNLLLoss', 'GroupNorm', 'Hardshrink', 'Hardsigmoid', 'Hardswish', 'Hardtanh', 'HingeEmbeddingLoss', 'HuberLoss', 'Identity', 'InstanceNorm1

In [9]:
role = """You are an expert in PyTorch"""
task = """Explain the options for TripletMarginLoss in PyTorch."""
context = docstrings['CrossEntropyLoss']
messages = [
    {
         "role": "user",
        "content": f"{role}\n\n{task}",

        "Answer using the following context": f"{context}\n\n"
    },
 ]
tokenized_chat = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt")
print(tokenizer.batch_decode(tokenized_chat)[0])
print(model.generate(tokenized_chat.to(model.device), max_new_tokens=128,
    do_sample=False,
    streamer=streamer)[0])


<bos><start_of_turn>user
You are an expert in PyTorch

Explain the options for TripletMarginLoss in PyTorch.<end_of_turn>
<start_of_turn>model

<bos><start_of_turn>user
You are an expert in PyTorch

Explain the options for TripletMarginLoss in PyTorch.<end_of_turn>
<start_of_turn>model
**TripletMarginLoss in PyTorch**

TripletMarginLoss is a loss function that measures the distance between a triplet of data points. It encourages the model to learn embeddings that are close to each other while pushing them apart from distant pairs.

**Options for TripletMarginLoss:**

**1. Margin Parameter:**
- `margin`: The distance between the closest and farthest points in the triplet.
- Valid range: `(0, 1)`.

**2. Temperature:**
- Controls the spread of the margin distribution.
- Higher temperature leads to a wider margin but may increase variance.

**
tensor([     2,    106,   1645,    108,   2045,    708,    671,  13865,    575,
          8745, 111226,    109,  74198,    573,   4561,    604,   62

### Retrieval-Augmented Generation Questions
1. Without context, the model produces output that is not accurate. With the context added, it still said the same thing.
2. It is using an imported (pretrained) model, and it is using instruction tuning.
3. Both versions still give the same answer.
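
One possible reason the answers did not change: the prompt printed by the retrieval cell contains only the role and the task, which suggests that the extra `"Answer using the following context"` key in the message dictionary never reached the model. The cell also retrieves the `CrossEntropyLoss` docstring while asking about `TripletMarginLoss`. A minimal sketch of folding the context into `content` follows; the helper name `build_rag_messages` and the placeholder context string are hypothetical.

```python
def build_rag_messages(role, task, context):
    """Put role, task, and retrieved context into a single user message."""
    content = (
        f"{role}\n\n{task}\n\n"
        f"Answer using the following context:\n\n{context}"
    )
    return [{"role": "user", "content": content}]

messages = build_rag_messages(
    role="You are an expert in PyTorch",
    task="Explain the options for TripletMarginLoss in PyTorch.",
    context="(retrieved docstring text goes here)",  # placeholder, not real data
)
print(messages[0]["content"])
```

In the notebook, the context would presumably be `docstrings['TripletMarginLoss']` (assuming that key is present in the truncated list above), and `max_new_tokens` may need to be raised to leave room for the answer after a long prompt.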

In [10]:
# Few-shot tool-use prompt: one worked example of calling Python, then a new question.
messages = [
    {
        "role": "user",
        "content": f"What is the sum of the odd numbers less than 20?",
    },
    {
        "role": "assistant",
        "content": f"""
Run Python code: print(sum(x for x in range(20) if x % 2 == 1))
Code output: 100

The result is 100."""
    },
    {
        "role": "user",
        "content": f"What is the sum of the even numbers less than 40?",
    },
 ]
tokenized_chat = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt")
print(tokenizer.batch_decode(tokenized_chat)[0])
print(model.generate(tokenized_chat.to(model.device), max_new_tokens=128,
    do_sample=False,
    streamer=streamer)[0])


<bos><start_of_turn>user
What is the sum of the odd numbers less than 20?<end_of_turn>
<start_of_turn>model
Run Python code: print(sum(x for x in range(20) if x % 2 == 1))
Code output: 100

The result is 100.<end_of_turn>
<start_of_turn>user
What is the sum of the even numbers less than 40?<end_of_turn>
<start_of_turn>model

<bos><start_of_turn>user
What is the sum of the odd numbers less than 20?<end_of_turn>
<start_of_turn>model
Run Python code: print(sum(x for x in range(20) if x % 2 == 1))
Code output: 100

The result is 100.<end_of_turn>
<start_of_turn>user
What is the sum of the even numbers less than 40?<end_of_turn>
<start_of_turn>model
```python
sum = 0
for x in range(40):
    if x % 

KeyboardInterrupt: 

### Tool Use
1. It can give more logical answers instead of trying to write out paragraphs explaining the arithmetic it needs to do.
2. You have to teach the model to use the tool only in certain cases. The model might try to use the tool in ways it was not designed for, leading to inaccurate output.
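
The few-shot prompt above only imitates tool use: the `Code output:` line was written by hand. A rough sketch of the missing piece, detecting a tool call in the model's reply and actually running it, might look like the following. The helper names are hypothetical, the reply string is a stand-in for real model output, and `exec` is used without any sandboxing, which would not be safe outside a throwaway notebook.

```python
import contextlib
import io
import re

TOOL_PREFIX = "Run Python code:"  # marker used in the few-shot example

def extract_tool_call(model_text):
    """Return the code after the tool marker, or None if the tool was not used."""
    pattern = rf"^{re.escape(TOOL_PREFIX)}\s*(.+)$"
    match = re.search(pattern, model_text, flags=re.MULTILINE)
    return match.group(1).strip() if match else None

def run_tool(code):
    """Execute one line of Python and capture what it prints (no sandboxing)."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(code, {})
    return buffer.getvalue().strip()

# Stand-in for a model reply that follows the few-shot format.
reply = "Run Python code: print(sum(x for x in range(40) if x % 2 == 0))"
code = extract_tool_call(reply)
if code is not None:
    print(f"Code output: {run_tool(code)}")
```

The captured output would then be appended to the conversation so the model can state the result, and a reply without the marker is simply passed through unchanged.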

### Calvin Chatbot:
1. How do I find my classrooms? What Intramurals are there? Who do I need to talk to if I need to add/drop a class?
2. I would make the role "a helpful chatbot that answers questions from incoming freshmen and transfer students". The task would be the list of questions that incoming students might have. I would also use the Calvin website as context for the model to draw its information from.