<a href="https://colab.research.google.com/github/yxxue/Hands-On-Large-Language-Models/blob/main/chapter01/Chapter%201%20-%20Introduction%20to%20Language%20Models.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<h1>Chapter 1 - Introduction to Language Models</h1>
<i>Exploring the exciting field of Language AI</i>


<a href="https://www.amazon.com/Hands-Large-Language-Models-Understanding/dp/1098150961"><img src="https://img.shields.io/badge/Buy%20the%20Book!-grey?logo=amazon"></a>
<a href="https://www.oreilly.com/library/view/hands-on-large-language/9781098150952/"><img src="https://img.shields.io/badge/O'Reilly-white.svg?logo=data:image/svg%2bxml;base64,PHN2ZyB3aWR0aD0iMzQiIGhlaWdodD0iMjciIHZpZXdCb3g9IjAgMCAzNCAyNyIgZmlsbD0ibm9uZSIgeG1sbnM9Imh0dHA6Ly93d3cudzMub3JnLzIwMDAvc3ZnIj4KPGNpcmNsZSBjeD0iMTMiIGN5PSIxNCIgcj0iMTEiIHN0cm9rZT0iI0Q0MDEwMSIgc3Ryb2tlLXdpZHRoPSI0Ii8+CjxjaXJjbGUgY3g9IjMwLjUiIGN5PSIzLjUiIHI9IjMuNSIgZmlsbD0iI0Q0MDEwMSIvPgo8L3N2Zz4K"></a>
<a href="https://github.com/HandsOnLLM/Hands-On-Large-Language-Models"><img src="https://img.shields.io/badge/GitHub%20Repository-black?logo=github"></a>
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/HandsOnLLM/Hands-On-Large-Language-Models/blob/main/chapter01/Chapter%201%20-%20Introduction%20to%20Language%20Models.ipynb)

---

This notebook is for Chapter 1 of the [Hands-On Large Language Models](https://www.amazon.com/Hands-Large-Language-Models-Understanding/dp/1098150961) book by [Jay Alammar](https://www.linkedin.com/in/jalammar) and [Maarten Grootendorst](https://www.linkedin.com/in/mgrootendorst/).

---

<a href="https://www.amazon.com/Hands-Large-Language-Models-Understanding/dp/1098150961">
<img src="https://raw.githubusercontent.com/HandsOnLLM/Hands-On-Large-Language-Models/main/images/book_cover.png" width="350"/></a>


### [OPTIONAL] - Installing Packages on <img src="https://colab.google/static/images/icons/colab.png" width=100>

If you are viewing this notebook on Google Colab (or any other cloud vendor), you need to **uncomment and run** the following codeblock to install the dependencies for this chapter:

---

💡 **NOTE**: We will want to use a GPU to run the examples in this notebook. In Google Colab, go to
**Runtime > Change runtime type > Hardware accelerator > GPU > GPU type > T4**.

---

In [1]:
# %%capture
# Quote the version specifiers so the shell does not treat `>` as output redirection
!pip install "transformers>=4.40.1" "accelerate>=0.27.2"

# Phi-3

The first step is to load our model onto the GPU for faster inference. Note that we load the model and tokenizer separately (although that isn't always necessary).

A causal language model predicts the next token in a sequence based *only* on the tokens that precede it. It is commonly used for generating text, completing sentences, or writing content.

Other model types include:
- Masked Language Models (MLMs): These models predict missing words in a sentence based on the context of the words around them (like filling in the blanks). BERT is a famous example of an MLM.

- Sequence-to-Sequence Models: These models are designed to transform one sequence of text into another, and are often used for tasks like machine translation or text summarization.
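The difference between causal and masked modeling comes down to which tokens the model may look at when predicting a target position. The following is a minimal sketch in plain Python, not how any library implements it; the function name `visible_context` and the `[MASK]` placeholder are assumptions made for illustration.

```python
def visible_context(tokens, position, mode):
    """Return what a model may look at when predicting tokens[position]."""
    if mode == "causal":
        # Causal LM: only the tokens strictly to the left of the target
        return tokens[:position]
    if mode == "masked":
        # Masked LM: the target is hidden, but both sides remain visible
        return tokens[:position] + ["[MASK]"] + tokens[position + 1:]
    raise ValueError(f"unknown mode: {mode}")


tokens = ["The", "cat", "sat", "on", "the", "mat"]

# Predicting the word at index 2 ("sat") under each objective
causal_view = visible_context(tokens, 2, "causal")
masked_view = visible_context(tokens, 2, "masked")

print(causal_view)
print(masked_view)
```

The causal view stops before the target, which is why such models suit left-to-right text generation, while the masked view keeps the words on both sides.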

In [2]:
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",
    device_map="cuda",
    torch_dtype="auto",
    trust_remote_code=False,
)
tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


A pipeline is a high-level helper that makes it easy to use a pre-trained model for a specific task without writing a lot of code. You can think of it as a complete system for that task: it handles preparing the input, running the model, and turning the output back into text.

Although we can now use the model and tokenizer directly, it's much easier to wrap them in a `pipeline` object:

In [3]:
from transformers import pipeline

# Create a pipeline
generator = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    return_full_text=False,
    max_new_tokens=500,
    do_sample=False
)

Device set to use cuda
The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
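As a rough mental model, a text-generation pipeline chains three stages: encode the text into token IDs, let the model extend those IDs, and decode only the new IDs back into text. The sketch below uses toy stand-ins throughout; `ToyTextPipeline`, the three-word vocabulary, and the fake "model" are all hypothetical and are not part of the `transformers` library.

```python
class ToyTextPipeline:
    """Hypothetical stand-in for the stages a text pipeline chains together."""

    def __init__(self, encode, generate, decode):
        self.encode = encode
        self.generate = generate
        self.decode = decode

    def __call__(self, text):
        input_ids = self.encode(text)           # 1. text -> token IDs
        output_ids = self.generate(input_ids)   # 2. model returns prompt + new IDs
        new_ids = output_ids[len(input_ids):]   # 3. keep only the newly generated IDs
        return self.decode(new_ids)             # 4. token IDs -> text


VOCAB = ["hello", "world", "!"]


def encode(text):
    # Toy tokenizer: one ID per whitespace-separated word
    return [VOCAB.index(word) for word in text.split()]


def generate(input_ids):
    # Toy "model": always appends the ID of "!" to the prompt
    return input_ids + [2]


def decode(ids):
    return " ".join(VOCAB[i] for i in ids)


toy = ToyTextPipeline(encode, generate, decode)
result = toy("hello world")
print(result)
```

Dropping the prompt IDs in step 3 plays the same role as `return_full_text=False` in the pipeline created above.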


Finally, we create our prompt as a user and give it to the model:

In [7]:
# The prompt (user input / query)
messages = [
    {"role": "user", "content": "do you know which LLM model you are using?"}
]

# Generate output
output = generator(messages)
print(output[0]["generated_text"])

The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


 Yes, I'm using Microsoft's GPT (Generative Pre-trained Transformer) model.


The same generation can be done without the `pipeline` helper by applying the chat template, tokenizing, generating, and decoding manually:

In [9]:
# Assumes `model` and `tokenizer` are already loaded from the earlier cell

# The prompt (user input / query)
messages = [
    {"role": "user", "content": "do you know which LLM model you are using?"}
]

# Build the prompt string, applying the chat template if the tokenizer has one
if getattr(tokenizer, "chat_template", None):
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
else:
    # Fallback for tokenizers without a chat template;
    # may need adjusting to the model's expected input format
    prompt = messages[0]["content"]

# Tokenize the prompt and move the tensors to the model's device (the GPU here)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate token IDs; greedy decoding (do_sample=False) matches the pipeline settings
outputs = model.generate(**inputs, max_new_tokens=500, do_sample=False)

# For a decoder-only model, generate() returns the prompt tokens followed by
# the new tokens, so slice off the prompt before decoding
input_len = inputs["input_ids"].shape[1]
generated_text = tokenizer.decode(outputs[0, input_len:], skip_special_tokens=True)

# Print the generated text
print(generated_text)

Yes, I'm using Microsoft's GPT (Generative Pre-trained Transformer) model.
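The chat template step in the manual example turns the list of role/content messages into a single prompt string before tokenization. The hand-written formatter below is only an approximation for illustration: the function name `format_phi3_style` is hypothetical, and the `<|user|>`, `<|end|>`, and `<|assistant|>` tags are assumed from the Phi-3 chat format. The tokenizer's own template is authoritative and may differ, for example by adding a beginning-of-sequence token.

```python
def format_phi3_style(messages, add_generation_prompt=True):
    """Approximate a Phi-3-style chat prompt (assumed tags, for illustration only)."""
    parts = []
    for message in messages:
        # Each turn: role tag, newline, content, end-of-turn tag
        parts.append(f"<|{message['role']}|>\n{message['content']}<|end|>\n")
    if add_generation_prompt:
        # Cue the model that the assistant's reply comes next
        parts.append("<|assistant|>\n")
    return "".join(parts)


messages = [
    {"role": "user", "content": "do you know which LLM model you are using?"}
]
prompt_text = format_phi3_style(messages)
print(prompt_text)
```

Printing the result of `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` shows the exact string the loaded model actually receives.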
