#  🧠 Python Basics for LLM Development
Welcome! In this notebook, you'll practice the Python fundamentals—variables, control flow, loops, and functions—that you'll frequently use when building and experimenting with language‑model projects.

## Learning Objectives
By the end of this notebook you will be able to:
1. Work confidently with Python variables and data structures.
2. Write conditional logic and loops for automation.
3. Encapsulate reusable code in functions (with docstrings & type hints).
4. Use popular data libraries (NumPy, Pandas) to prepare model inputs.
5. Make your **first call** to an LLM API (OpenAI) and inspect its response.

Feel free to **run** each cell (Shift‑Enter) and **edit** the code to test your understanding.

## 0. Quick Warm‑Up

In [1]:
print('Hello, LLM world!')

Hello, LLM world!


## 1. Variables & Data Types

Without type hints

In [2]:
# Basic types
name = 'Ada Lovelace'
age = 27
height_m = 1.65
is_data_scientist = True

print(name, age, height_m, is_data_scientist)

Ada Lovelace 27 1.65 True


In [3]:
# You can use type() to get the type of a variable
print(type(name))   # is a string
print(type(height_m))   # is a float
print(type(is_data_scientist))   # is a bool

<class 'str'>
<class 'float'>
<class 'bool'>


Note that these are `class`es. In Python, everything is an object, so every value is an instance of some class.

Python is dynamically typed, but you can add type hints for clarity (handy for larger codebases & editors).

In [4]:
# Basic types
name: str = 'Ada Lovelace'
age: int = 27
height_m: float = 1.65
is_data_scientist: bool = True

print(name, age, height_m, is_data_scientist)

Ada Lovelace 27 1.65 True
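
Type hints are not checked when the code runs; they are read by editors and by external type checkers such as mypy. The sketch below illustrates this with a hypothetical variable `max_tokens` and a hypothetical function `double`, neither of which appears elsewhere in this notebook:

```python
# Hypothetical variable for illustration: annotated as int, assigned a str.
max_tokens: int = "1536"

# Python runs this without complaint; the annotation is only metadata.
print(type(max_tokens).__name__)   # str


def double(x: int) -> int:
    """Return x doubled. The hints document intent but are not enforced."""
    return x * 2


# Passing a string "works" at runtime (string repetition), which is exactly
# the kind of mistake a static type checker would flag before running.
print(double(4))     # 8
print(double("ab"))  # abab

# The annotations are stored on the function and can be inspected.
print(double.__annotations__)
```

In other words, hints help readers and tools, but they do not replace validation of the values themselves.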


#### 🏗️ Activity 1:

1. Create two variables `tokens` (int) and `model_name` (str).
2. Print a sentence like _"The model gpt‑4 will process 1536 tokens."_

Run the cell below and write your answer in the space provided.

In [5]:
# Your code here ➡️

## 2. Collections: Lists, Tuples & Dicts

In [6]:
# List of prompt strings
prompts = ['Translate to French:', 'Summarize:', 'Explain like I am five:']
print(prompts[0])

# Tuple (immutable ordered collection)
dimensions = (768, 1024)
width, height = dimensions
print(f'Width={width}, Height={height}')

# Dict mapping role → content
message = {'role': 'system', 'content': 'You are a helpful assistant.'}
print(message)

Translate to French:
Width=768, Height=1024
{'role': 'system', 'content': 'You are a helpful assistant.'}


#### 🏗️ Activity 2:

*Create a dictionary called `hyperparams` that stores a `learning_rate` of 3e‑4, `batch_size` of 8, and `num_epochs` of 2.*

In [7]:
# Your code here ➡️

## 3. Control Flow

### 3.1 Conditional statements

In [8]:
temperature = 0.7
if temperature > 1.0:
    print('The model will be quite creative!')
elif temperature > 0.0:
    print('A balanced amount of randomness.')
else:
    print('Deterministic output.')

A balanced amount of randomness.


#### 🏗️ Activity 3:

*Write code that prints **"even"** if a number is divisible by 2, otherwise **"odd"**.*

In [9]:
# Your code here ➡️

### 3.2 Loops & Comprehensions

In [10]:
# For‑loop
for i in range(3):
    print(f'Message {i+1}: {prompts[i]} write a haiku')

# While‑loop (basic)
n = 1
while n < 3:
    print('n is still less than 3')
    n += 1

# List comprehension
token_counts = [len(p.split()) for p in prompts]
token_counts

Message 1: Translate to French: write a haiku
Message 2: Summarize: write a haiku
Message 3: Explain like I am five: write a haiku
n is still less than 3
n is still less than 3


[3, 1, 5]
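
A comprehension can also filter items with a trailing `if`. The sketch below redefines `prompts` so that it runs on its own, and the names `long_prompts` and `long_prompts_loop` are chosen only for this illustration:

```python
prompts = ['Translate to French:', 'Summarize:', 'Explain like I am five:']

# Keep only the prompts with more than two whitespace-separated words.
long_prompts = [p for p in prompts if len(p.split()) > 2]
print(long_prompts)  # ['Translate to French:', 'Explain like I am five:']

# The equivalent explicit loop, for comparison.
long_prompts_loop = []
for p in prompts:
    if len(p.split()) > 2:
        long_prompts_loop.append(p)

print(long_prompts == long_prompts_loop)  # True
```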

#### 🏗️ Activity 4:

*Use a list comprehension to generate the squares of numbers 0‑9.*

In [11]:
# Your code here ➡️

### 3.3 Functions

In [12]:
def count_tokens(strings: list[str]) -> int:
    """Return the total number of whitespace-separated tokens in a list of strings."""
    return sum(len(s.split()) for s in strings)

count_tokens(prompts)

9

#### 🏗️ Activity 5:
 
*Write a function `calculate_average(numbers: list[float]) -> float` that returns the average of a list of numbers.*

In [13]:
# Your code here ➡️

### 3.4 Exception handling

Exception handling in Python lets you manage errors that occur during program execution without crashing the program. It uses `try`, `except`, and optionally `finally` blocks to catch and respond to unexpected situations, such as dividing by zero or opening a missing file.

Code that might raise an error is placed inside a `try` block; if an error occurs, the program jumps to the `except` block, where you can define how to handle it.

A `finally` block can be added to run code that must execute no matter what, such as closing files or cleaning up resources. This makes your programs more robust and user-friendly.

🔧 What Do They Do?

* `try`: Block of code where something might go wrong.

* `except`: Catches and handles errors (exceptions) if they occur in the try block.

* `finally`: Code here always runs, whether an exception occurred or not—often used for cleanup (like closing files or releasing resources).

In [14]:
try:
    x = 10 / 0
except ZeroDivisionError:
    print("You can't divide by zero!")
finally:
    print("This runs no matter what.")

You can't divide by zero!
This runs no matter what.


In [15]:
try:
    x = 10 / 5
except ZeroDivisionError:
    print("You can't divide by zero!")
finally:
    print("This runs no matter what.")

This runs no matter what.
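
Two further pieces of the same syntax are often useful: `except ... as exc` gives access to the exception object, and an optional `else` block runs only when no exception was raised. The sketch below uses a hypothetical helper, `parse_temperature`, that is not part of this notebook:

```python
def parse_temperature(raw: str, default: float = 0.7) -> float:
    """Convert raw text to a float, falling back to default on bad input."""
    try:
        value = float(raw)
    except ValueError as exc:
        # `as exc` binds the exception object so its message can be inspected.
        print(f"Could not parse {raw!r}: {exc}")
        return default
    else:
        # Runs only when the try block raised no exception.
        return value
    finally:
        # Runs in both cases, even though the branches above return.
        print("Parsing attempt finished.")


good = parse_temperature("0.2")
bad = parse_temperature("hot")
print(good)  # 0.2
print(bad)   # 0.7
```

Catching a specific exception type (`ValueError` here) rather than using a bare `except:` keeps unrelated errors visible.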


### 3.5 Async

`async` enables **asynchronous programming**, allowing a program to run tasks without waiting for each one to finish before starting the next.

Using the `async` and `await` keywords, you can define and pause asynchronous functions so that other operations can run while waiting—such as reading files, making web requests, or querying databases. This helps improve efficiency, especially in I/O-bound programs, by preventing the application from getting "stuck" during slow operations.

In [1]:
import asyncio, nest_asyncio
import time

nest_asyncio.apply()   # This is needed to run async code in Jupyter notebooks


async def task(name, delay):
    print(f"{name} started")
    await asyncio.sleep(delay)
    print(f"{name} finished after {delay} seconds")


async def main():
    start = time.time()

    # Run tasks concurrently
    await asyncio.gather(
        task("Task 1", 2),
        task("Task 2", 3),
        task("Task 3", 1)
    )

    end = time.time()
    print(f"All tasks completed in {end - start:.2f} seconds")

asyncio.run(main())

Task 1 started
Task 2 started
Task 3 started
Task 3 finished after 1 seconds
Task 1 finished after 2 seconds
Task 2 finished after 3 seconds
All tasks completed in 3.01 seconds


💡 Why This Is Better:

* `asyncio.gather()` runs all tasks concurrently, not one after the other.

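* Even though the delays add up to 6 seconds (2 + 3 + 1), the program finishes in about 3 seconds, the length of the longest task. The tasks ran concurrently: while one was waiting in `asyncio.sleep`, the event loop switched to another. This is not parallel execution on multiple CPU cores; everything still runs on a single thread.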
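
To make the difference concrete, the sketch below times the same waits twice: awaited one after another, and started together with `asyncio.gather()`. The function names (`work`, `run_sequentially`, `run_concurrently`) and the short delays are assumptions chosen for this illustration. It is written as a standalone script; inside a notebook you would need `nest_asyncio.apply()` as in the cell above.

```python
import asyncio
import time


async def work(delay: float) -> float:
    """Simulate an I/O-bound task by sleeping without blocking the event loop."""
    await asyncio.sleep(delay)
    return delay


async def run_sequentially(delays: list[float]) -> float:
    """Await each task in turn; total time is roughly the sum of the delays."""
    start = time.perf_counter()
    for d in delays:
        await work(d)
    return time.perf_counter() - start


async def run_concurrently(delays: list[float]) -> float:
    """Start all tasks together; total time is roughly the longest delay."""
    start = time.perf_counter()
    await asyncio.gather(*(work(d) for d in delays))
    return time.perf_counter() - start


delays = [0.1, 0.2, 0.3]
sequential_time = asyncio.run(run_sequentially(delays))
concurrent_time = asyncio.run(run_concurrently(delays))

print(f"Sequential: {sequential_time:.2f} s")
print(f"Concurrent: {concurrent_time:.2f} s")
```

The exact timings vary from run to run, but the sequential version should take noticeably longer than the concurrent one.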

## 4. Working with Libraries — NumPy & Pandas (very small taste)

In [16]:
import numpy as np
import pandas as pd

# NumPy array
arr = np.array([1, 2, 3])
print('NumPy mean:', arr.mean())

# Pandas DataFrame
df = pd.DataFrame({'prompt': prompts, 'tokens': token_counts})
df

NumPy mean: 2.0


                    prompt  tokens
0     Translate to French:       3
1               Summarize:       1
2  Explain like I am five:       5


#### 🏗️ Activity 6:

1. Add a new column `length` to `df` that contains the character length of each prompt.
2. Display the updated DataFrame.

In [17]:
# Your code here ➡️

## 5. Your First OpenAI Call 🌐
> _Skip if you don't have an API key yet._
>
> _If you do have an API key: Uncomment & set your environment variable._

In [18]:
# # Install the packages first if needed:
# # pip install --quiet openai python-dotenv
#
# import os
# from openai import OpenAI
# from dotenv import load_dotenv
#
# load_dotenv()  # loads OPENAI_API_KEY from a local .env file, if one exists
#
# # OpenAI() reads OPENAI_API_KEY from the environment by default;
# # it is passed explicitly here so a missing key fails immediately with a KeyError.
# client = OpenAI(api_key=os.environ['OPENAI_API_KEY'])
#
# # Make a minimal chat completion call
# response = client.chat.completions.create(
#     model='gpt-4o-mini',
#     messages=[{'role': 'user', 'content': 'Say hi!'}]
# )
# print(response.choices[0].message.content)


Define helper functions

In [19]:
# from IPython.display import display, Markdown
#
# YOUR_PROMPT = "What is the difference between LangChain and LlamaIndex?"
#
# # Direct call without helpers (the result is not stored or displayed here)
# client.chat.completions.create(
#     model="gpt-4o-mini",
#     messages=[{"role" : "user", "content" : YOUR_PROMPT}]
# )
#
# def get_response(client: OpenAI, messages: list[dict], model: str = "gpt-4.1-nano"):
#     """Send the messages to the model and return the full response object."""
#     return client.chat.completions.create(
#         model=model,
#         messages=messages
#     )
#
# def system_prompt(message: str) -> dict:
#     return {"role": "developer", "content": message}
#
# def assistant_prompt(message: str) -> dict:
#     return {"role": "assistant", "content": message}
#
# def user_prompt(message: str) -> dict:
#     return {"role": "user", "content": message}
#
# def pretty_print(response) -> None:
#     """Render the first choice of a chat completion response as Markdown."""
#     display(Markdown(response.choices[0].message.content))
#
# messages = [user_prompt(YOUR_PROMPT)]
#
# chatgpt_response = get_response(client, messages)
#
# pretty_print(chatgpt_response)

#### 🏗️ Activity 7:

Create a basic chat function that asks the OpenAI model a simple question and prints the response.




## 🎉 Congratulations! 
You've covered Python basics you will use daily in LLM workflows. From here, you might explore:
* Prompt engineering patterns
* Tokenization & chunking strategies
* Vector embeddings & similarity search
* Building an end‑to‑end Retrieval‑Augmented Generation (RAG) pipeline

Happy coding 🚀

## 6. Function Deep Dive  🔍

Python functions are first‑class objects—you can pass them around just like variables. Let's explore optional arguments, `*args` / `**kwargs`, lambdas, and higher‑order functions.

In [20]:
from typing import Callable, Any

def greet(name: str, greeting: str = "Hello") -> str:
    """Return a friendly greeting."""
    return f"{greeting}, {name}!"

print(greet("Ada"))
print(greet("Grace", greeting="Howdy"))

# Variable‑length arguments
def avg(*numbers: float) -> float:
    return sum(numbers) / len(numbers)

print("Mean:", avg(1, 2, 3, 4, 5))

# Higher‑order: apply a function to every element
def apply(fn: Callable[[Any], Any], iterable):
    return [fn(x) for x in iterable]

apply(lambda x: x ** 2, range(5))

Hello, Ada!
Howdy, Grace!
Mean: 3.0


[0, 1, 4, 9, 16]
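
The cell above covers `*args`; the keyword counterpart, `**kwargs`, collects any extra named arguments into a dict. The sketch below uses a hypothetical helper, `build_request`, that only assembles a dict and does not call any API:

```python
def build_request(prompt: str, model: str = "gpt-4o-mini", **options) -> dict:
    """Collect a prompt plus arbitrary keyword options into one dict.

    Hypothetical helper for illustration; it does not call any API.
    """
    request = {"model": model, "prompt": prompt}
    request.update(options)  # extra keyword arguments arrive as a dict
    return request


print(build_request("Say hi!"))
print(build_request("Say hi!", temperature=0.2, max_tokens=50))

# A dict can be unpacked into keyword arguments with **.
settings = {"temperature": 0.0, "top_p": 1.0}
print(build_request("Summarize:", **settings))
```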

#### 🏗️ Activity 8:

Write a recursive function `factorial(n)` that returns `n!`. Test it for `n = 5`.

In [21]:
# Your code here ➡️

## 7. Classes & Objects  🏗️

Object‑oriented programming (OOP) lets you bundle data **and** behavior. Here's a minimal example and a modern `@dataclass`.

In [22]:
class PromptManager:
    """Store and manage prompt strings."""
    def __init__(self):
        self.prompts = []

    def add(self, text: str):
        self.prompts.append(text)

    def __len__(self):
        return len(self.prompts)

    def __repr__(self):
        return f"<PromptManager with {len(self)} prompts>"

pm = PromptManager()
pm.add('Explain reinforcement learning.')
pm.add('Translate to Czech.')
pm

<PromptManager with 2 prompts>

In [23]:
# Let's explore what we can do with the PromptManager instance
print(f"Number of prompts: {len(pm)}")
print(f"Prompts stored: {pm.prompts}")

# Add more prompts
pm.add('Generate a Python function')
pm.add('Debug this code snippet')

print(f"\nAfter adding more prompts: {pm}")
print(f"All prompts: {pm.prompts}")


Number of prompts: 2
Prompts stored: ['Explain reinforcement learning.', 'Translate to Czech.']

After adding more prompts: <PromptManager with 4 prompts>
All prompts: ['Explain reinforcement learning.', 'Translate to Czech.', 'Generate a Python function', 'Debug this code snippet']


In [24]:
from dataclasses import dataclass

@dataclass
class HyperParams:
    lr: float
    batch_size: int
    epochs: int = 3  # default

hp = HyperParams(lr=3e-4, batch_size=8)
print(hp)
print(hp.lr)

HyperParams(lr=0.0003, batch_size=8, epochs=3)
0.0003
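
Besides `__init__` and `__repr__`, `@dataclass` also generates `__eq__`, and the `dataclasses` module provides helpers such as `asdict` and `replace`. The sketch below repeats the `HyperParams` definition so that it runs on its own; the variable names `base` and `longer` are chosen only for this illustration:

```python
from dataclasses import dataclass, asdict, replace


@dataclass
class HyperParams:
    lr: float
    batch_size: int
    epochs: int = 3  # default


base = HyperParams(lr=3e-4, batch_size=8)

# __eq__ is generated: instances compare field by field.
print(base == HyperParams(lr=3e-4, batch_size=8))  # True

# replace() returns a copy with some fields changed; base is unchanged.
longer = replace(base, epochs=10)
print(longer)       # HyperParams(lr=0.0003, batch_size=8, epochs=10)
print(base.epochs)  # 3

# asdict() converts the instance to a plain dict, e.g. for logging.
print(asdict(longer))  # {'lr': 0.0003, 'batch_size': 8, 'epochs': 10}
```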


#### 🏗️ Activity 9:
 
Create a class `TokenCounter` with a method `count(text)` that returns the number of whitespace‑separated tokens in `text`. Instantiate it and test on a sample sentence.

In [25]:
# Your code here ➡️

## 8. Looping Patterns & Comprehensions 🔄

In [26]:
items = ['apple', 'banana', 'cherry']
prices = [0.6, 0.3, 0.8]

# enumerate
for idx, item in enumerate(items, start=1):
    print(idx, item)

# zip
for item, price in zip(items, prices):
    print(f'{item} -> ${price:.2f}')

# Dictionary comprehension
price_map = {item: price for item, price in zip(items, prices)}
price_map

1 apple
2 banana
3 cherry
apple -> $0.60
banana -> $0.30
cherry -> $0.80


{'apple': 0.6, 'banana': 0.3, 'cherry': 0.8}

#### 🏗️ Activity 10:
 
Use a **set comprehension** to collect all unique lengths of words in `items`.

In [27]:
# Your code here ➡️

## 9. Decorators

A **decorator** in Python is a special function that modifies or enhances the behavior of another function or method without changing its actual code. It "wraps" the target function, allowing you to add extra functionality before or after the original function runs.

 Decorators are often used for tasks like logging, access control, timing, or caching. You apply a decorator using the `@decorator_name` syntax just above the function definition.
 
 This makes code cleaner, reusable, and easier to manage by separating concerns. You will encounter many decorators during the bootcamp.

In [28]:
def my_decorator(func):
    def wrapper(name):
        print("Before the function runs.")
        func(name)
        print("After the function runs.")
    return wrapper


✅ Explanation:

* The wrapper function takes an argument (name) and passes it to the original func.

* This way, the decorator works with functions that accept arguments.

In [29]:
@my_decorator
def say_hello(name):
    print(f"Hello, {name}!")

The function `say_hello` is decorated with `@my_decorator`.

Once a function is decorated, you can call it and it will run with the added behavior from the decorator.

In the current situation, the function's behavior is wrapped with two prints, one before and one after the function runs.

In [30]:
say_hello("Bob")

Before the function runs.
Hello, Bob!
After the function runs.


💡 Bonus Tip:

If you want your decorator to work with any number of arguments, use `*args` and `**kwargs`:


In [31]:
def my_decorator(func):
    def wrapper(*args, **kwargs):
        print("Before the function runs.")
        result = func(*args, **kwargs)
        print("After the function runs.")
        return result  # pass the wrapped function's return value back to the caller
    return wrapper
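
Timing is one of the uses mentioned earlier. The sketch below is one possible timing decorator; the names `timed` and `count_words` are hypothetical. It also applies `functools.wraps`, which copies the original function's name and docstring onto the wrapper, and it returns the wrapped function's result so the caller still receives it:

```python
import functools
import time


def timed(func):
    """Print how long each call to func takes (illustrative sketch)."""
    @functools.wraps(func)  # keep func's __name__ and docstring on the wrapper
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        print(f"{func.__name__} took {elapsed:.4f} seconds")
        return result  # pass the original return value back to the caller
    return wrapper


@timed
def count_words(text: str) -> int:
    """Return the number of whitespace-separated words in text."""
    return len(text.split())


n_words = count_words("Explain like I am five")
print(n_words)               # 5
print(count_words.__name__)  # count_words
```

Without `functools.wraps`, `count_words.__name__` would report `wrapper`, which makes logs and debugging output harder to read.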

## 10. Mutability and immutability

### 10.1 Definition

In Python, mutability refers to whether an object’s value can be changed after it is created.

**Mutable objects**, such as lists, dictionaries, and sets, can be modified in place: you can change, add, or remove elements without creating a new object.

In contrast, **immutable objects**, such as integers, floats, strings, and tuples, cannot be altered once they are created; any operation that seems to modify them actually creates a new object. Understanding mutability is important because it affects how data is handled in memory and can influence the behavior of functions, especially when passing arguments.

### 10.2 In practice

The cell below shows **mutability**: the list is modified in place, so it remains the same object at the same memory location (compare the `id` values).

In [32]:
my_list = [1, 2, 3]
print("Original ID:", id(my_list))

append_result = my_list.append(4)  # Modify the list in place and returns None
print("Modified List:", my_list)
print("ID After Modification:", id(my_list))  # Same ID
print("append_result is none:", (append_result is None))

Original ID: 4855032768
Modified List: [1, 2, 3, 4]
ID After Modification: 4855032768
append_result is none: True


The cell below shows **immutability**: each operation creates a new string, and the same variable is rebound to it. The new string is stored elsewhere in memory (compare the `id` values).

In [33]:
my_str = "hello"
print("Original ID:", id(my_str))

my_str = my_str + " world"  # Create a new string
print("Modified String:", my_str)
print("ID After Modification:", id(my_str))  # Different ID

my_str = my_str.upper()
print("Modified String:", my_str)
print("ID After Modification:", id(my_str))  # Different ID

Original ID: 4407153968
Modified String: hello world
ID After Modification: 4862193840
Modified String: HELLO WORLD
ID After Modification: 4862193072


#### ✅ Summary:

**Mutable** → can change in place → same `id`

**Immutable** → changes create new object → different `id`
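
A practical consequence is aliasing: assigning a list to a second name does not copy it, so a change made through one name is visible through the other. The sketch below uses illustrative variable names (`original`, `alias`, `independent`) and contrasts an alias with a shallow copy made by `list.copy()`:

```python
original = [1, 2, 3]
alias = original               # a second name for the same list object
independent = original.copy()  # a new list with the same elements (shallow copy)

alias.append(4)  # modifies the one list that both names refer to

print(original)     # [1, 2, 3, 4]
print(independent)  # [1, 2, 3]

print(alias is original)        # True
print(independent is original)  # False
```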

### 10.3 Consequences for function arguments

When a mutable object is passed as an argument to a function, the function can modify it in place, as with the list below.

In [34]:
from typing import Any

input = [1, 2, 3]

def add_to_list(input: list, new_element: Any):
    input.append(new_element)

add_to_list(input, "hello")

print(input)

[1, 2, 3, 'hello']


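`input` refers to the same object before and after the function call. (The name `input` shadows Python's built-in `input()` function; that is harmless in this short demo, but a different name is safer in real code.)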

When an immutable object is passed as an argument, the function cannot modify it in place, as with the string below. Try to guess what will happen when you run the cell.

In [35]:
input = "Hello"

def add_to_string(input, new_element: str):
    input += new_element

add_to_string(input, " World")

print(input)

Hello


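Inside the function, `input += new_element` creates a new string and rebinds only the local name, so the caller's variable is unchanged. To see the change, return the new string and reassign the variable. Even though the name `input` stays the same, it refers to two different objects before and after the function call.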

In [36]:
input = "Hello"
print("ID: ", id(input))

def add_to_string(input, new_element: str):
    input += new_element
    return input

input = add_to_string(input, " World")

print(input)
print("ID: ", id(input))

ID:  4842806832
Hello World
ID:  4848585328
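
The same "return a new object" pattern also works for mutable arguments when the caller's data should stay untouched. The sketch below contrasts the two styles; the function names `with_added` and `add_in_place` are hypothetical:

```python
from typing import Any


def with_added(items: list, new_element: Any) -> list:
    """Return a new list with new_element appended; items is left untouched."""
    return [*items, new_element]


def add_in_place(items: list, new_element: Any) -> None:
    """Append to the caller's list directly (a visible side effect)."""
    items.append(new_element)


numbers = [1, 2, 3]
extended = with_added(numbers, "hello")
print(numbers)   # [1, 2, 3]
print(extended)  # [1, 2, 3, 'hello']

add_in_place(numbers, "hello")
print(numbers)   # [1, 2, 3, 'hello']
```

Returning `None` from the in-place version mirrors `list.append` and signals to the reader that the argument itself was changed.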


## 11. Pandas DataFrames — A Deeper Look 🐼

In [37]:
import pandas as pd

# Create a DataFrame from a dictionary
data = {
    'name': ['Alice', 'Bob', 'Charlie', 'David'],
    'age': [25, 30, 35, 40],
    'city': ['New York', 'London', 'Paris', 'Tokyo'],
    'score': [85, 92, 78, 95],
    'date': ['2023-01-15', '2023-02-20', '2023-03-10', '2023-04-05']
}

df = pd.DataFrame(data)

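# Saving with specific options
df.to_csv('student_data.csv',
          index=False,  # don't save the index
          date_format='%Y-%m-%d')  # applies to datetime columns only; 'date' holds strings here, so it is written unchanged

# Reading with specific options
df_loaded = pd.read_csv('student_data.csv',
                        index_col='date',
                        parse_dates=True)  # parse the index ('date') as datetimes

df_loaded.head()  # not displayed: only a cell's last expression is shown automatically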

# Display basic info and summary statistics
print('DataFrame Info:')
print(df.info())
print('\nSummary Statistics:')
print(df.describe())

# Group by city and calculate mean score
city_avg = df.groupby('city')['score'].mean()
print('\nAverage Scores by City:')
print(city_avg)

# Create a pivot table
pivot = df.pivot_table(index='city', values=['age', 'score'], aggfunc='mean')
print('\nPivot Table:')
print(pivot)

# Transpose the pivot table
print('\nTransposed Pivot Table:')
print(pivot.T)



DataFrame Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 5 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   name    4 non-null      object
 1   age     4 non-null      int64 
 2   city    4 non-null      object
 3   score   4 non-null      int64 
 4   date    4 non-null      object
dtypes: int64(2), object(3)
memory usage: 292.0+ bytes
None

Summary Statistics:
             age      score
count   4.000000   4.000000
mean   32.500000  87.500000
std     6.454972   7.593857
min    25.000000  78.000000
25%    28.750000  83.250000
50%    32.500000  88.500000
75%    36.250000  92.750000
max    40.000000  95.000000

Average Scores by City:
city
London      92.0
New York    85.0
Paris       78.0
Tokyo       95.0
Name: score, dtype: float64

Pivot Table:
           age  score
city                 
London    30.0   92.0
New York  25.0   85.0
Paris     35.0   78.0
Tokyo     40.0   95.0

Transposed Pivot Table:
ci

#### 🏗️ Activity 11:
 
1. Add a boolean column `passes` set to `True` when `score` ≥ 80.  
2. Filter the DataFrame to show only Alice.

In [38]:
# Your code here ➡️

## 12. Vector Math with NumPy ➗

In [39]:

import numpy as np

# Create two simple vectors
vec1 = np.array([1, 2, 3])
vec2 = np.array([4, 5, 6])

# Basic vector operations
addition = vec1 + vec2
subtraction = vec2 - vec1
scalar_mult = vec1 * 2

print('Vector 1:', vec1)
print('Vector 2:', vec2)
print('Addition:', addition)
print('Subtraction:', subtraction)
print('Scalar multiplication:', scalar_mult)

Vector 1: [1 2 3]
Vector 2: [4 5 6]
Addition: [5 7 9]
Subtraction: [3 3 3]
Scalar multiplication: [2 4 6]


#### 🏗️ Activity 12:

Generate two random 300‑dimensional vectors and check their length (hint: `np.random.randn`).

In [40]:
# Your code here ➡️

## 13. Datetime Essentials 🕒

In [41]:
from datetime import datetime, timedelta

now = datetime.now()
one_week = timedelta(weeks=1)
print('Now:', now)
print('One week from now:', now + one_week)

# Parsing a date string
d = datetime.strptime('2025-05-27 13:00', '%Y-%m-%d %H:%M')
d

Now: 2025-05-28 17:55:48.207400
One week from now: 2025-06-04 17:55:48.207400


datetime.datetime(2025, 5, 27, 13, 0)
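
Two related operations come up often: subtracting one `datetime` from another yields a `timedelta`, and `strftime` turns a `datetime` back into text (the inverse of `strptime`). The sketch below uses made-up dates and illustrative variable names:

```python
from datetime import datetime

start = datetime(2025, 5, 27, 13, 0)
end = datetime(2025, 6, 3, 15, 30)

# Subtracting two datetimes gives a timedelta.
elapsed = end - start
print(elapsed.days)             # 7
print(elapsed.total_seconds())  # 613800.0

# strftime formats a datetime as text, using the same codes as strptime.
text = start.strftime('%Y-%m-%d %H:%M')
print(text)  # 2025-05-27 13:00

# Round trip: parsing the formatted text gives back the same datetime.
print(datetime.strptime(text, '%Y-%m-%d %H:%M') == start)  # True

# ISO 8601 output is available without a format string.
print(start.isoformat())  # 2025-05-27T13:00:00
```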

In [42]:
# Pandas date_range for time‑series
import pandas as pd

rng = pd.date_range('2025-01-01', periods=5, freq='D')
ts = pd.Series(range(5), index=rng)
ts

2025-01-01    0
2025-01-02    1
2025-01-03    2
2025-01-04    3
2025-01-05    4
Freq: D, dtype: int64

#### 🏗️ Activity 13:

1. Find today's date.
2. Find the date 30 days ago.

In [43]:
# Your code here ➡️