# Understanding Large Language Models: A Step-by-Step Journey

## Our Goal
In this notebook, we follow a language model as it learns to predict the sentence "we love deep learning" one word at a time. Along the way, we'll explore the four fundamental concepts behind how modern language models like ChatGPT are trained:

1. **Forward Pass**: How models generate predictions
2. **Loss Calculation**: How we measure prediction quality
3. **Backpropagation**: How we identify what needs improvement
4. **Gradient Descent**: How we make those improvements
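
To see how the four steps fit together before we look at each one in detail, here is a minimal sketch of a single training loop. It is not the model used in the rest of this notebook: it assumes a toy 3-word vocabulary where the "model" is just one adjustable score (logit) per word, and the names `logits`, `target_index`, and `learning_rate` are chosen for illustration only.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    largest = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy setup (assumption): 3 candidate words, one score per word.
logits = [0.0, 0.0, 0.0]
target_index = 1      # position of the correct next word
learning_rate = 0.5   # how large each update step is

losses = []
for step in range(5):
    # 1. Forward pass: scores -> probabilities
    probs = softmax(logits)

    # 2. Loss calculation: cross-entropy, small when the correct word is likely
    loss = -math.log(probs[target_index])
    losses.append(loss)

    # 3. Backpropagation: for softmax + cross-entropy, the gradient with
    #    respect to each score is (probability - 1) for the target word
    #    and (probability - 0) for every other word
    grads = [p - (1.0 if i == target_index else 0.0) for i, p in enumerate(probs)]

    # 4. Gradient descent: move each score a small step against its gradient
    logits = [w - learning_rate * g for w, g in zip(logits, grads)]

    print(f"step {step}: loss = {loss:.4f}, P(correct word) = {probs[target_index]:.4f}")
```

Each pass through the loop should print a smaller loss and a higher probability for the correct word, which is the whole point of training.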

## Our Vocabulary and Target
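We'll work with a small 12-token vocabulary to keep things clear and manageable. It includes two special markers, `<BOS>` (beginning of sequence) and `<EOS>` (end of sequence), alongside ordinary words. Our model will learn to predict each word in our target sequence, one step at a time.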
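
To make "step by step" concrete, the sketch below lists what the model sees and what it must predict at each step. It assumes that every prediction starts from `<BOS>` followed by the words produced so far; the helper `make_training_pairs` and the `word_to_id` lookup are hypothetical names introduced here, and the vocabulary is repeated so that the block runs on its own.

```python
# Same vocabulary and target as the setup cell in this notebook,
# repeated here so this block is self-contained.
VOCAB = ["<BOS>", "we", "love", "deep", "learning", "<EOS>",
         "the", "is", "great", "model", "hello", "world"]
target_sequence = ["we", "love", "deep", "learning"]

# Models work with numbers, so map each word to its position in VOCAB.
word_to_id = {word: idx for idx, word in enumerate(VOCAB)}

def make_training_pairs(sequence):
    """Return (context, next_word) pairs, one per word in the sequence.

    Assumption: the context always begins with the <BOS> marker.
    """
    tokens = ["<BOS>"] + list(sequence)
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

pairs = make_training_pairs(target_sequence)
for context, next_word in pairs:
    context_ids = [word_to_id[w] for w in context]
    print(f"{context} -> {next_word!r}   (ids: {context_ids} -> {word_to_id[next_word]})")
```

Four target words give four prediction steps, and the context grows by one word each time.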

In [ ]:
# Setup our simple vocabulary and target sequence
VOCAB = ["<BOS>", "we", "love", "deep", "learning", "<EOS>", "the", "is", "great", "model", "hello", "world"]
target_sequence = ["we", "love", "deep", "learning"]

print("VOCABULARY:", VOCAB)
print("TARGET SEQUENCE:", target_sequence)
print("VOCABULARY SIZE:", len(VOCAB))

# Initialize simple model parameters (weights) - these will be updated during training.
# Real models have millions or billions of parameters; here we use three lists of 12 values each.
model_parameters = {
    'layer1_weights': [0.1, -0.3, 0.5, 0.2, -0.1, 0.4, 0.8, -0.2, 0.3, -0.5, 0.7, -0.4],
    'layer2_weights': [0.2, 0.1, -0.4, 0.6, 0.3, -0.2, -0.1, 0.5, -0.3, 0.4, -0.6, 0.1],
    'output_weights': [0.3, -0.1, 0.4, -0.2, 0.5, 0.1, -0.3, 0.2, 0.6, -0.4, 0.1, 0.3]
}

print("\nInitial model parameters (simplified representation):")
print("Layer 1 weights:", len(model_parameters['layer1_weights']), "parameters")
print("Layer 2 weights:", len(model_parameters['layer2_weights']), "parameters")
print("Output weights:", len(model_parameters['output_weights']), "parameters")