<a href="https://colab.research.google.com/github/urness/CS167Fall2025/blob/main/Day25_Intro_to_Transformers_Part2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# CS167: Day25
## Intro to Transformers part 2

#### CS167: Machine Learning, Fall 2025



__Credit__:

Much of the code and lecture material is adapted from [The Illustrated Transformer](https://jalammar.github.io/illustrated-transformer/)

Free online course: [How Transformers Work](https://learn.deeplearning.ai/courses/how-transformer-llms-work)


## __Put the Model on Training Device (GPU or CPU)__


You don't need a GPU for this notebook, but it won't hurt: a graphics processing unit (GPU) speeds up running large models. Colab gives you access to a GPU. To enable it, go to _Runtime (or click the down arrow near RAM & Disk in the upper right) --> Change runtime type --> GPU or TPU_.

Professor Urness tested this code with the T4 GPU option.
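Turning on a GPU runtime only makes the GPU available. `from_pretrained` loads the weights onto the CPU by default, so the model has to be moved explicitly. A minimal sketch, assuming PyTorch is installed (it comes preinstalled on Colab), that picks the GPU when one is visible and otherwise falls back to the CPU:

```python
import torch

# Prefer a CUDA GPU (e.g. Colab's T4) when one is visible; otherwise use the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

# Move a small tensor to the chosen device to confirm it works.
x = torch.ones(3).to(device)
print(x.device)
```

Once the model has been loaded you could move it with `model = model.to(device)`. Alternatively, the `pipeline` function accepts a `device` argument (for example `device=0` for the first GPU).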

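In this lesson, you will reinforce your understanding of the transformer architecture by exploring a decoder-only model. The code below loads GPT-2 (`gpt2`); the commented-out lines show how to load the larger decoder-only [model](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) `microsoft/Phi-3-mini-4k-instruct` instead.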

## Setup

We start by setting up the lab: installing the required libraries (`transformers` and `accelerate`) and suppressing warnings.

Some necessary import statements:

In [None]:
# !pip install "transformers>=4.41.2" "accelerate>=0.31.0"  # quotes stop the shell from treating >= as a redirect

# Warning control
import warnings
warnings.filterwarnings('ignore')

## Loading the LLM

Let's first load the model and its tokenizer. For that you will first import the classes: `AutoModelForCausalLM` and `AutoTokenizer`. When you want to process a sentence, you can apply the tokenizer first and then the model in two separate steps. Or you can create a pipeline object that wraps the two steps and then apply the pipeline to the sentence. You'll explore both approaches in this notebook. This is why you'll also import the `pipeline` class.
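To make the two approaches concrete, here is a toy sketch in plain Python. The word-level vocabulary and the functions `toy_tokenize`, `toy_detokenize`, `toy_model` and `toy_pipeline` are hypothetical stand-ins: GPT-2's real tokenizer uses byte-level BPE, and a real model outputs scores over the whole vocabulary. The only point is that a pipeline chains tokenizer → model → decoding for you.

```python
# Hypothetical word-level vocabulary (GPT-2 really uses byte-level BPE with 50,257 tokens).
VOCAB = ["<unk>", "the", "cat", "sat", "on", "mat"]
TOKEN_TO_ID = {tok: i for i, tok in enumerate(VOCAB)}

def toy_tokenize(text: str) -> list[int]:
    """Step 1: text -> token ids (unknown words map to id 0)."""
    return [TOKEN_TO_ID.get(word, 0) for word in text.lower().split()]

def toy_detokenize(ids: list[int]) -> str:
    """Inverse step: token ids -> text."""
    return " ".join(VOCAB[i] for i in ids)

def toy_model(ids: list[int]) -> int:
    """Step 2: stand-in 'model' predicting the next token id.
    It just returns the id after the last one, wrapping around (not a real LM)."""
    return (ids[-1] + 1) % len(VOCAB)

def toy_pipeline(text: str) -> str:
    """What a pipeline wraps: tokenize, run the model, decode the prediction."""
    ids = toy_tokenize(text)
    next_id = toy_model(ids)
    return toy_detokenize([next_id])

# Two separate steps ...
ids = toy_tokenize("The cat")
print(ids)                                # [1, 2]
print(toy_detokenize([toy_model(ids)]))   # sat
# ... or one wrapped call
print(toy_pipeline("The cat"))            # sat
```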

In [None]:
# import the required classes
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline, set_seed

In [None]:
# Load model and tokenizer
model_name = "gpt2"  # 124M parameters
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct", trust_remote_code=True)
# model = AutoModelForCausalLM.from_pretrained("microsoft/Phi-3-mini-4k-instruct", trust_remote_code=True)

Now you can wrap the model and the tokenizer in a [pipeline](https://huggingface.co/docs/transformers/en/main_classes/pipelines#transformers.pipeline) object with "text-generation" as its task.

In [None]:

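# Create a text-generation pipeline that wraps the tokenizer and the model
generator = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    return_full_text=False,  # return only the generated continuation, not the prompt
    max_new_tokens=50,       # generate at most 50 new tokens
    do_sample=False,         # greedy decoding: always pick the most likely next token
)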
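With `do_sample=False` the pipeline decodes greedily: at every step it keeps the single highest-scoring next token, for at most `max_new_tokens` steps, and `return_full_text=False` drops the prompt from the result. Below is a minimal sketch of that loop. `next_token_scores` is a hypothetical stand-in for the model's output scores (logits), and the sketch mimics `return_full_text` on token ids rather than on text; the real loop lives inside `model.generate`, which among other things also stops at end-of-sequence tokens.

```python
def next_token_scores(ids: list[int], vocab_size: int = 5) -> list[float]:
    """Hypothetical stand-in for a model's logits: favours (last id + 1) mod vocab_size."""
    favourite = (ids[-1] + 1) % vocab_size
    return [1.0 if i == favourite else 0.0 for i in range(vocab_size)]

def greedy_generate(prompt_ids, max_new_tokens=50, return_full_text=False):
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        scores = next_token_scores(ids)
        # do_sample=False: take the argmax instead of sampling from softmax(scores)
        best = max(range(len(scores)), key=scores.__getitem__)
        ids.append(best)
    new_ids = ids[len(prompt_ids):]
    return ids if return_full_text else new_ids

print(greedy_generate([0, 1], max_new_tokens=4))                         # [2, 3, 4, 0]
print(greedy_generate([0, 1], max_new_tokens=4, return_full_text=True))  # [0, 1, 2, 3, 4, 0]
```

Because greedy decoding has no randomness, running the same prompt twice gives the same output, which is why the generated email below is reproducible.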

## Generating a Text Response to a Prompt

You'll now use the pipeline object (`generator`) to generate a response of up to 50 new tokens for the given prompt.

In [None]:
prompt = "Write an email apologizing to Sarah for the tragic gardening mishap. Explain how it happened. "
output = generator(prompt)

print(output[0]['generated_text'])

## Exploring the Model's Architecture

You can print the model to take a look at its architecture.


In [None]:
model

The vocabulary size is 50257 tokens, and the size of the vector embedding for each token is 768.

In [None]:
model.transformer.wte

- 50257 is the vocabulary size; GPT-2 has 50,257 unique tokens in its tokenizer
- 768 is the model dimension — each token is represented as a 768-dimensional vector in GPT-2 “small”
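The embedding layer is a lookup table: row `i` of the 50257 × 768 weight matrix is the vector for token id `i`. Selecting that row gives the same result as multiplying a one-hot vector by the matrix, so an embedding can be viewed as a linear layer that skips the expensive multiplication. A toy sketch with made-up sizes (vocabulary 4, dimension 3) instead of GPT-2's:

```python
# Toy embedding table: 4 tokens x 3 dimensions (GPT-2 small uses 50257 x 768).
wte = [
    [0.1, 0.2, 0.3],
    [0.4, 0.5, 0.6],
    [0.7, 0.8, 0.9],
    [1.0, 1.1, 1.2],
]

def embed(token_ids, table):
    """Embedding lookup: pick one row per token id."""
    return [table[i] for i in token_ids]

def one_hot_times_matrix(token_id, table):
    """Same result via a one-hot row vector multiplied by the table."""
    one_hot = [1.0 if i == token_id else 0.0 for i in range(len(table))]
    dim = len(table[0])
    return [sum(one_hot[i] * table[i][j] for i in range(len(table))) for j in range(dim)]

print(embed([2, 0], wte))            # [[0.7, 0.8, 0.9], [0.1, 0.2, 0.3]]
print(one_hot_times_matrix(2, wte))  # [0.7, 0.8, 0.9]

# Number of weights in GPT-2 small's real token-embedding table
print(50257 * 768)                   # 38597376
```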

In [None]:
model.transformer.wpe

- 1024 is the maximum context length: `wpe` stores one learned vector per position, so GPT-2 can “see” at most 1,024 tokens at once
- 768 is, again, the model dimension
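GPT-2 builds each input vector by adding the token's row from `wte` to the row of `wpe` for its position (0, 1, 2, ...). Because `wpe` has only 1024 rows, there is no vector for position 1024 or beyond, which is what makes 1024 the context limit. A toy sketch with tiny made-up tables (2 dimensions, context length 3); the function name `input_embeddings` is hypothetical:

```python
def input_embeddings(token_ids, wte, wpe):
    """Token embedding + learned position embedding, one vector per position."""
    n_positions = len(wpe)
    if len(token_ids) > n_positions:
        raise ValueError(f"{len(token_ids)} tokens exceed context length {n_positions}")
    return [
        [t + p for t, p in zip(wte[tok], wpe[pos])]
        for pos, tok in enumerate(token_ids)
    ]

# Toy tables: 3 tokens x 2 dims for wte, 3 positions x 2 dims for wpe
wte = [[1, 0], [0, 1], [1, 1]]
wpe = [[10, 10], [20, 20], [30, 30]]

print(input_embeddings([2, 2], wte, wpe))  # [[11, 11], [21, 21]]
```

Note that the same token (id 2) gets a different input vector at each position; this is how the model knows word order.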

In [None]:
model.transformer.h

Number of transformer layers: 12 (the length of `model.transformer.h`, one `GPT2Block` per layer)
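The four numbers above (vocabulary size, model dimension, context length, number of layers) are enough to recover the "124M parameters" quoted when the model was loaded. The sketch below assumes the standard GPT-2 block layout: two LayerNorms, a fused query/key/value projection, an attention output projection, and an MLP that expands to 4 × the model dimension, all with biases. It also assumes the output layer `lm_head` shares its weights with `wte`, so it adds no new parameters. The function name `gpt2_param_count` is hypothetical.

```python
def gpt2_param_count(vocab_size: int, d_model: int, n_positions: int, n_layer: int) -> int:
    d = d_model
    embeddings = vocab_size * d + n_positions * d  # wte + wpe
    per_block = (
        2 * d                  # ln_1 (scale + shift)
        + d * 3 * d + 3 * d    # attn.c_attn: fused Q, K, V projection
        + d * d + d            # attn.c_proj: attention output projection
        + 2 * d                # ln_2
        + d * 4 * d + 4 * d    # mlp.c_fc: expand to 4 * d
        + 4 * d * d + d        # mlp.c_proj: project back to d
    )
    final_norm = 2 * d         # ln_f
    return embeddings + n_layer * per_block + final_norm

print(gpt2_param_count(50257, 768, 1024, 12))  # 124439808
```

On the loaded model, `model.num_parameters()` should report the same total. Note that the embeddings alone account for roughly 39M of the 124M parameters.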

# Exercise
Load the model `gpt2-large` and print out its transformer specifications, as above. What is:
1. The vocabulary size
2. The model dimension
3. The maximum context length
4. The number of transformer layers

In [None]:
# Load model and tokenizer
model_name = "gpt2-large"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

In [None]:
model
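To check your answers, you can also read the same specifications from the model's configuration instead of the printed architecture. For GPT-2 models the relevant `model.config` attributes are `vocab_size`, `n_embd`, `n_positions` and `n_layer`. The helper `describe_gpt2_config` below is a hypothetical convenience function that only reads those attributes, so it works on any object that has them:

```python
from types import SimpleNamespace

def describe_gpt2_config(config) -> dict:
    """Collect the four specifications asked for in the exercise from a GPT-2 style config."""
    return {
        "vocab_size": config.vocab_size,    # 1. vocabulary size
        "model_dim": config.n_embd,         # 2. model (embedding) dimension
        "max_context": config.n_positions,  # 3. maximum context length
        "n_layers": config.n_layer,         # 4. number of transformer layers
    }

# Stand-in with GPT-2 small's values; in the notebook, pass model.config instead.
gpt2_small = SimpleNamespace(vocab_size=50257, n_embd=768, n_positions=1024, n_layer=12)
print(describe_gpt2_config(gpt2_small))
# {'vocab_size': 50257, 'model_dim': 768, 'max_context': 1024, 'n_layers': 12}
```

After loading `gpt2-large`, calling `describe_gpt2_config(model.config)` should agree with what you read off the printed architecture.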
