<h1>Chapter 1 - Introduction to Language Models</h1>
<i>Exploring the exciting field of Language AI</i>


<a href="https://www.amazon.com/Hands-Large-Language-Models-Understanding/dp/1098150961"><img src="https://img.shields.io/badge/Buy%20the%20Book!-grey?logo=amazon"></a>
<a href="https://www.oreilly.com/library/view/hands-on-large-language/9781098150952/"><img src="https://img.shields.io/badge/O'Reilly-white.svg?logo=data:image/svg%2bxml;base64,PHN2ZyB3aWR0aD0iMzQiIGhlaWdodD0iMjciIHZpZXdCb3g9IjAgMCAzNCAyNyIgZmlsbD0ibm9uZSIgeG1sbnM9Imh0dHA6Ly93d3cudzMub3JnLzIwMDAvc3ZnIj4KPGNpcmNsZSBjeD0iMTMiIGN5PSIxNCIgcj0iMTEiIHN0cm9rZT0iI0Q0MDEwMSIgc3Ryb2tlLXdpZHRoPSI0Ii8+CjxjaXJjbGUgY3g9IjMwLjUiIGN5PSIzLjUiIHI9IjMuNSIgZmlsbD0iI0Q0MDEwMSIvPgo8L3N2Zz4K"></a>
<a href="https://github.com/HandsOnLLM/Hands-On-Large-Language-Models"><img src="https://img.shields.io/badge/GitHub%20Repository-black?logo=github"></a>
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/HandsOnLLM/Hands-On-Large-Language-Models/blob/main/chapter01/Chapter%201%20-%20Introduction%20to%20Language%20Models.ipynb)

---

This notebook is for Chapter 1 of the [Hands-On Large Language Models](https://www.amazon.com/Hands-Large-Language-Models-Understanding/dp/1098150961) book by [Jay Alammar](https://www.linkedin.com/in/jalammar) and [Maarten Grootendorst](https://www.linkedin.com/in/mgrootendorst/).

---

<a href="https://www.amazon.com/Hands-Large-Language-Models-Understanding/dp/1098150961">
<img src="https://raw.githubusercontent.com/HandsOnLLM/Hands-On-Large-Language-Models/main/images/book_cover.png" width="350"/></a>


### [OPTIONAL] - Installing Packages on <img src="https://colab.google/static/images/icons/colab.png" width=100>

If you are viewing this notebook on Google Colab (or another cloud platform), **uncomment and run** the following code block to install the dependencies for this chapter:

---

💡 **NOTE**: We will want to use a GPU to run the examples in this notebook. In Google Colab, go to
**Runtime > Change runtime type > Hardware accelerator > GPU > GPU type > T4**.

---

In [None]:
# %%capture
# !pip install "transformers>=4.40.1" "accelerate>=0.27.2"
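
After installing, it can help to confirm that the environment meets the minimum versions named in the install command. The snippet below is a minimal sketch that uses only the standard library; the helper names (`as_tuple`, `meets_minimum`, `check_requirements`) are made up for this illustration, and the version parsing is deliberately simplified (the `packaging` library handles edge cases more robustly).

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum versions taken from the pip command above.
MINIMUMS = {"transformers": "4.40.1", "accelerate": "0.27.2"}


def as_tuple(version_string):
    """Turn '4.40.1' into (4, 40, 1), stopping at the first non-numeric part."""
    parts = []
    for piece in version_string.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare two version strings using their numeric parts only."""
    return as_tuple(installed) >= as_tuple(minimum)


def check_requirements(minimums=MINIMUMS):
    """Return a short status string for every required package."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "not installed"
            continue
        status = "ok" if meets_minimum(installed, minimum) else f"older than {minimum}"
        report[name] = f"{installed} ({status})"
    return report


for name, status in check_requirements().items():
    print(f"{name}: {status}")
```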

# Phi-3

The first step is to load our model onto the GPU for faster inference. Note that we load the model and tokenizer separately (although that isn't always necessary).

PyTorch is an open-source machine learning library originally developed by Facebook's AI Research lab (FAIR). It's widely used for:

- Deep learning
- Natural language processing (NLP)
- Computer vision
- Reinforcement learning

It's one of the most popular frameworks alongside TensorFlow, and it's especially favored in research and production for its simplicity and flexibility.

**🔧 What is PyTorch used for?**

PyTorch is used to:

- Build and train neural networks
- Handle automatic differentiation (via autograd)
- Run models on GPU for acceleration
- Work with tensors (multi-dimensional arrays, similar to NumPy arrays)
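
The points about tensors, autograd, and GPU acceleration can be sketched in a few lines. This is an illustrative example rather than part of the chapter's workflow: `pick_device` is a hypothetical helper, and the snippet falls back to the CPU (or skips the tensor demo entirely) when PyTorch or a GPU is not available.

```python
import importlib.util


def pick_device():
    """Return "cuda" when PyTorch can see a GPU, otherwise "cpu"."""
    if importlib.util.find_spec("torch") is None:
        # PyTorch is not installed, so there is nothing to query.
        return "cpu"
    import torch

    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Selected device: {device}")

if importlib.util.find_spec("torch") is not None:
    import torch

    # A tensor is a multi-dimensional array that can live on the CPU or GPU.
    x = torch.tensor([1.0, 2.0, 3.0], device=device, requires_grad=True)

    # Autograd records the operations so gradients can be computed.
    y = (x ** 2).sum()
    y.backward()

    # The gradient of sum(x**2) with respect to x is 2 * x.
    print(x.grad)
```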

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer.
# - AutoModelForCausalLM.from_pretrained(...) fetches the weights and configuration
#   from the Hugging Face Model Hub and selects the matching architecture class
#   for the given model name (in this case, Phi-3).
# - "microsoft/Phi-3-mini-4k-instruct" is the model ID on the Hub: an instruction-tuned
#   Phi-3 model with about 3.8B parameters and a 4K-token context window.
# - device_map="cuda" places the model on a CUDA-enabled GPU. It does not fall back
#   to the CPU, so loading fails if no GPU is available.
# - torch_dtype="auto" uses the data type stored with the checkpoint instead of
#   defaulting to float32, which typically lowers memory usage.
# - trust_remote_code=False allows only architectures built into Transformers.
#   A model that ships custom Python code in its repository would need True,
#   which runs that code locally, so False guards against executing untrusted code.


model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",
    device_map="cuda",
    torch_dtype="auto",
    trust_remote_code=False,
)
# AutoTokenizer.from_pretrained(...) loads the tokenizer configuration and vocabulary
# from the same Hub repository as the model. Using the same model ID for both keeps
# the tokenizer's vocabulary matched to the one the model was trained with.
tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")
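
The data type chosen at load time largely determines how much GPU memory the weights occupy. As a rough back-of-the-envelope sketch (assuming about 3.8 billion parameters for Phi-3-mini and counting only the weights, not activations or the generation cache), the arithmetic looks like this; `weight_memory_gb` is a helper written for this illustration.

```python
def weight_memory_gb(num_parameters, bytes_per_parameter):
    """Approximate memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_parameters * bytes_per_parameter / 1e9


# Assumption: Phi-3-mini has roughly 3.8 billion parameters.
PHI3_MINI_PARAMETERS = 3.8e9

# Storage size of one parameter for common PyTorch data types.
BYTES_PER_PARAMETER = {"float32": 4, "float16": 2, "bfloat16": 2}

for dtype, size in BYTES_PER_PARAMETER.items():
    estimate = weight_memory_gb(PHI3_MINI_PARAMETERS, size)
    print(f"{dtype}: ~{estimate:.1f} GB")
```

On a 16 GB GPU such as the T4, the 16-bit variants leave far more headroom for inference than float32 does.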

Although we can now use the model and tokenizer directly, it's much easier to wrap them in a `pipeline` object:

In [None]:
from transformers import pipeline

# Create a pipeline
generator = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
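    return_full_text=False,  # return only the newly generated text, not the prompt
    max_new_tokens=500,      # upper limit on the number of generated tokens
    do_sample=False          # greedy decoding: always pick the most likely next token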
)
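
Setting `do_sample=False` in the pipeline above selects greedy decoding, where the most likely next token is chosen at every step, so the same prompt yields the same output. The toy example below contrasts that with sampling. The probabilities are invented for illustration and do not come from Phi-3, and `pick_greedy` and `pick_sampled` are hypothetical helpers.

```python
import random

# Hypothetical next-token probabilities, invented for illustration only.
next_token_probs = {"drumsticks": 0.6, "eggs": 0.3, "feathers": 0.1}


def pick_greedy(probs):
    """Greedy decoding: always return the highest-probability token."""
    return max(probs, key=probs.get)


def pick_sampled(probs, rng):
    """Sampling: draw a token at random, weighted by its probability."""
    tokens = list(probs)
    weights = list(probs.values())
    return rng.choices(tokens, weights=weights, k=1)[0]


# Greedy decoding is deterministic.
print(pick_greedy(next_token_probs))

# Sampling can return different tokens from one draw to the next.
rng = random.Random(0)
print([pick_sampled(next_token_probs, rng) for _ in range(5)])
```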

Finally, we create our prompt as a user and give it to the model:

In [None]:
# The prompt (user input / query)
messages = [
    {"role": "user", "content": "Create a funny joke about chickens."}
]

# Generate output
output = generator(messages)
print(output[0]["generated_text"])

 Why did the chicken join the band? Because it had the drumsticks!
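
Before the model sees the `messages` list, the pipeline converts it into a single prompt string using the tokenizer's chat template. The function below is a hand-written approximation of the Phi-3 chat format, meant only to show the idea; it is not the tokenizer's own template, and details such as a leading beginning-of-sequence token are left out. `format_chat` is a hypothetical helper.

```python
def format_chat(messages, add_generation_prompt=True):
    """Approximate how a list of chat messages becomes one prompt string."""
    text = ""
    for message in messages:
        # Each turn is wrapped in a role marker and closed with an end marker.
        text += f"<|{message['role']}|>\n{message['content']}<|end|>\n"
    if add_generation_prompt:
        # Cue the model to start writing the assistant's reply.
        text += "<|assistant|>\n"
    return text


messages = [
    {"role": "user", "content": "Create a funny joke about chickens."}
]
print(format_chat(messages))

# With the real tokenizer loaded, the exact prompt can be inspected with:
# tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
```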
