<a href="https://colab.research.google.com/github/turna1/CISC801-Topics-in-AI/blob/main/W1_Lab1_CISC801.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Example 1: Simple Classifier

To get our hands dirty with a little bit of code, let's write a simple classifier using the traditional, rule-based approach. This will show us some basic Python syntax and highlight why this method isn't powerful enough for complex problems. Our function will classify a number as "small," "medium," or "large."

In [1]:
# We define a function using the 'def' keyword.
# This function takes one input, 'number'.
def classify_number(number):
    """
    This function takes a number and returns a string label:
    "small", "medium", or "large" based on hand-written rules.
    """
    print(f"Analyzing the number: {number}")

    # The 'if' statement checks a condition.
    # If the number is less than 10, the indented code below runs.
    if number < 10:
        return "small"

    # 'elif' is short for "else if". It checks another condition.
    elif number < 100:
        return "medium"

    # 'else' runs if none of the above conditions were true.
    else:
        return "large"






In [2]:
# Now, let's test our function.
result1 = classify_number(5)
print(f"The result is: {result1}")

result2 = classify_number(50)
print(f"The result is: {result2}")

result3 = classify_number(500)
print(f"The result is: {result3}")

Analyzing the number: 5
The result is: small
Analyzing the number: 50
The result is: medium
Analyzing the number: 500
The result is: large


This works for our simple problem because the rules are clear and unchanging. But imagine trying to write `if` statements for the millions of pixel values in a picture of a cat: nobody could enumerate those rules by hand. This is why we need a model that can learn the rules on its own.
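
The rigidity of hand-written rules shows up at the thresholds. As a small sketch (the probe values are chosen here for illustration and are not part of the lab), we can check what the same rules return on and around 10 and 100. The function is repeated without its `print` call so the block runs on its own.

```python
def classify_number(number):
    """Same hand-written rules as above, without the print call."""
    if number < 10:
        return "small"
    elif number < 100:
        return "medium"
    else:
        return "large"

# Probe values on and around the hard-coded thresholds (illustrative only).
probes = [-3, 9, 10, 99, 100]
labels = {n: classify_number(n) for n in probes}

for n, label in labels.items():
    print(f"{n} -> {label}")
```

Every boundary decision, including what to do with negative numbers, had to be made by whoever wrote the `if` statements.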

# Example 2: Finding Patterns in a Simple List


Let's write a simple Python script that mimics the goal of unsupervised learning. We won't use an ML library, but we'll write code that finds an inherent pattern in a list of words—in this case, we'll group them by their starting letter.

In [3]:
# Here is our unlabeled list of words.
words = ["apple", "banana", "ant", "boat", "car", "cat", "anchor"]

# We will store our groups in a dictionary.
# A dictionary stores key-value pairs, like {"a": ["apple", "ant"]}.
grouped_words = {}

# A 'for' loop lets us check every word in our list.
for word in words:
    # Get the first letter of the current word.
    first_letter = word[0]

    # Check if we have already started a group for this letter.
    if first_letter not in grouped_words:
        # If not, create a new empty list for this letter.
        grouped_words[first_letter] = []

    # Add the current word to the group for its first letter.
    grouped_words[first_letter].append(word)



In [4]:
# Print the organized groups.
print("Found the following groups:")
print(grouped_words)


Found the following groups:
{'a': ['apple', 'ant', 'anchor'], 'b': ['banana', 'boat'], 'c': ['car', 'cat']}
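
The same grouping can be written a little more compactly. The sketch below is one possible alternative, using `collections.defaultdict` from the standard library instead of the explicit membership check; it is not part of the original lab.

```python
from collections import defaultdict

words = ["apple", "banana", "ant", "boat", "car", "cat", "anchor"]

# defaultdict(list) creates the empty list the first time a key is seen,
# so the explicit "if first_letter not in grouped_words" check is not needed.
grouped_words = defaultdict(list)
for word in words:
    grouped_words[word[0]].append(word)

# Convert back to a plain dict for printing.
print(dict(grouped_words))
```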


# Example 3: Reinforcement Learning - Simplified Cart-Pole Balancing

The Cart-Pole problem is a classic Reinforcement Learning task. The goal is to balance a pole on a cart by moving the cart left or right. The agent receives a reward for every timestep the pole remains upright. If the pole falls beyond a certain angle or the cart moves too far off-center, the episode ends.

For simplicity, we'll create a highly abstract version focusing on the core RL loop rather than a full physics simulation.

In [7]:
import numpy as np
import random

# --- Simplified Cart-Pole Environment ---
# State: [pole_angle, cart_position]
# Actions: 0 (move left), 1 (move right)

def initialize_environment():
    # Pole starts slightly off-center, cart at center
    pole_angle = random.uniform(-0.05, 0.05) # radians, small random tilt
    cart_position = 0.0 # meters
    return [pole_angle, cart_position]

def step(state, action):
    pole_angle, cart_position = state

    # Simulate simple physics (highly simplified)
    # Moving cart impacts pole angle and position
    if action == 0: # Move left
        cart_position -= 0.1
        pole_angle += 0.03 # Pole might tilt more if moving fast
    else: # Move right
        cart_position += 0.1
        pole_angle -= 0.03

    # Add some random disturbance to pole angle
    pole_angle += random.uniform(-0.02, 0.02)

    # Define termination conditions
    done = False
    reward = 1.0 # Reward for keeping the pole balanced

    # Pole falls if angle is too extreme
    if abs(pole_angle) > 0.2: # ~11 degrees
        reward = 0.0 # No reward if pole falls
        done = True

    # Cart goes off track
    if abs(cart_position) > 2.0: # meters
        reward = 0.0
        done = True

    next_state = [pole_angle, cart_position]
    return next_state, reward, done

print("Simplified Cart-Pole environment setup.")

Simplified Cart-Pole environment setup.
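
Before adding a learning agent, it can help to see the loop the agent will sit in. The sketch below runs one episode with a purely random policy. It re-declares a compact copy of the environment's `step` function so the block is self-contained; `run_random_episode` and the fixed seed are assumptions made for this illustration, not part of the lab.

```python
import random

def step(state, action):
    """Compact copy of the simplified environment's step function."""
    pole_angle, cart_position = state
    if action == 0:          # move left
        cart_position -= 0.1
        pole_angle += 0.03
    else:                    # move right
        cart_position += 0.1
        pole_angle -= 0.03
    pole_angle += random.uniform(-0.02, 0.02)  # random disturbance
    # Episode ends if the pole tilts too far or the cart leaves the track.
    done = abs(pole_angle) > 0.2 or abs(cart_position) > 2.0
    reward = 0.0 if done else 1.0
    return [pole_angle, cart_position], reward, done

def run_random_episode(max_steps=200):
    """Hypothetical helper: play one episode choosing actions at random."""
    state = [random.uniform(-0.05, 0.05), 0.0]
    total_reward, steps, done = 0.0, 0, False
    while not done and steps < max_steps:
        action = random.randrange(2)               # agent picks an action
        state, reward, done = step(state, action)  # environment responds
        total_reward += reward
        steps += 1
    return steps, total_reward

random.seed(0)  # fixed seed so repeated runs give the same episode
steps, total_reward = run_random_episode()
print(f"Random policy: {steps} steps, total reward {total_reward}")
```

A random policy gives a baseline to compare against: a learning agent is only useful if it keeps the pole up for longer than this.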


### The Agent (Q-Learning with State Discretization)

Since the state space (pole angle and cart position) is continuous, we need to **discretize** it into bins so we can build a Q-table. Our Q-learning agent will then learn the best action for each discretized state.
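
To make discretization concrete, here is a minimal sketch that maps a continuous pole angle to a bin index. The lab's own cell uses `np.digitize`; this version uses `bisect_right` from the standard library, which gives the same index for increasing edges. The helper names `make_edges` and `to_bin` are hypothetical.

```python
from bisect import bisect_right

def make_edges(low, high, count):
    """Evenly spaced edges from low to high inclusive (like np.linspace)."""
    width = (high - low) / (count - 1)
    return [low + i * width for i in range(count)]

def to_bin(value, edges):
    """Index of the interval containing value, clamped to 0..len(edges)-1."""
    # For increasing edges, bisect_right counts how many edges are <= value,
    # which matches np.digitize's default behaviour.
    index = bisect_right(edges, value) - 1
    return max(0, min(index, len(edges) - 1))

angle_edges = make_edges(-0.2, 0.2, 10)

# Two in-range angles and two out-of-range angles (illustrative values).
for angle in (-0.5, 0.0, 0.19, 0.5):
    print(f"angle {angle:+.2f} -> bin {to_bin(angle, angle_edges)}")
```

Out-of-range angles are clamped into the first or last index, so every state the environment can produce has a valid row in the Q-table.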

*   **State discretization**: Divide the continuous ranges of pole angle and cart position into a fixed number of bins.
*   **Q-table**: Stores `Q[angle_bin, position_bin, action]` values.
*   **Epsilon-Greedy**: For balancing exploration and exploitation.
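*   **Learning Rate (alpha)** and **Discount Factor (gamma)**: Alpha sets how far each update moves a Q-value toward its new estimate; gamma sets how much future reward counts relative to immediate reward.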
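
The last three ingredients can be sketched in a few lines of plain Python. The numbers below are illustrative only and do not come from a training run; `choose_action` and `q_update` are hypothetical helper names.

```python
import random

def choose_action(q_values, epsilon):
    """Epsilon-greedy: explore with probability epsilon, otherwise exploit."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))  # explore: random action
    # Exploit: index of the largest Q-value.
    return max(range(len(q_values)), key=lambda a: q_values[a])

def q_update(old_q, reward, next_max_q, alpha, gamma):
    """One Q-learning update: move old_q toward reward + gamma * next_max_q."""
    return old_q + alpha * (reward + gamma * next_max_q - old_q)

# Illustrative numbers only (not taken from a training run).
new_q = q_update(old_q=0.0, reward=1.0, next_max_q=0.5, alpha=0.5, gamma=0.9)
print(f"Updated Q-value: {new_q:.3f}")

# With epsilon = 0 the agent always exploits, so it picks the larger Q-value.
print(choose_action([0.2, 0.7], epsilon=0.0))
```

Here the target is `1.0 + 0.9 * 0.5 = 1.45`, and with alpha at 0.5 the Q-value moves halfway from 0.0 toward it.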

In [9]:
# --- Q-Learning Agent Parameters ---
# Discretization for pole_angle and cart_position
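# np.linspace returns 10 bin edges, i.e. 9 in-range intervals; after the
# "- 1" and clip below, index 9 holds values at or above the upper edge.
angle_bins = np.linspace(-0.2, 0.2, 10) # 10 edges for pole angle
position_bins = np.linspace(-2.0, 2.0, 10) # 10 edges for cart position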

def discretize_state(state):
    angle, position = state
    angle_idx = np.digitize(angle, angle_bins) - 1
    position_idx = np.digitize(position, position_bins) - 1

    # Handle out-of-bounds due to digitize behavior, clamp to valid indices
    angle_idx = np.clip(angle_idx, 0, len(angle_bins) - 1)
    position_idx = np.clip(position_idx, 0, len(position_bins) - 1)

    return int(angle_idx), int(position_idx)

# Q-table dimensions: (num_angle_bins, num_position_bins, num_actions)
q_table = np.zeros((len(angle_bins), len(position_bins), 2))

# Hyperparameters
epsilon = 0.2       # Exploration rate
learning_rate = 0.5 # Alpha
discount_factor = 0.9 # Gamma
num_episodes = 5000  # Number of training episodes
max_steps_per_episode = 200 # Max steps before an episode is forced to end

print("Starting Q-learning training for Cart-Pole...")

for episode in range(num_episodes):
    current_state = initialize_environment()
    total_episode_reward = 0
    done = False
    steps = 0

    while not done and steps < max_steps_per_episode:
        angle_idx, position_idx = discretize_state(current_state)

        # Epsilon-greedy action selection
        if random.uniform(0, 1) < epsilon: # Explore
            action = random.randrange(2) # 0 or 1
        else: # Exploit
            action = np.argmax(q_table[angle_idx, position_idx, :])

        next_state, reward, done = step(current_state, action)
        next_angle_idx, next_position_idx = discretize_state(next_state)

        # Q-learning update rule
        old_q_value = q_table[angle_idx, position_idx, action]
        next_max_q = np.max(q_table[next_angle_idx, next_position_idx, :])

        new_q_value = old_q_value + learning_rate * (reward + discount_factor * next_max_q - old_q_value)
        q_table[angle_idx, position_idx, action] = new_q_value

        current_state = next_state
        total_episode_reward += reward
        steps += 1

    # Decay epsilon to reduce exploration over time
    epsilon = max(0.01, epsilon * 0.995)

    if (episode + 1) % 500 == 0:
        print(f"Episode {episode + 1}, Total Reward: {total_episode_reward}, Epsilon: {epsilon:.2f}")

print("\n--- Cart-Pole Q-Learning Training Complete ---")

# --- Test the trained agent (simulate one episode) ---
print("\nTesting the trained agent for one episode...")
current_state = initialize_environment()
total_test_reward = 0
done = False
test_steps = 0

while not done and test_steps < max_steps_per_episode * 2: # Allow more steps for testing
    angle_idx, position_idx = discretize_state(current_state)
    action = np.argmax(q_table[angle_idx, position_idx, :]) # Always exploit in test
    next_state, reward, done = step(current_state, action)
    total_test_reward += reward
    current_state = next_state
    test_steps += 1

print(f"Test Episode finished in {test_steps} steps with total reward: {total_test_reward}")
if total_test_reward > 0:
    print("The agent successfully kept the pole balanced for some time!")
else:
    print("The agent struggled to balance the pole.")

Starting Q-learning training for Cart-Pole...
Episode 500, Total Reward: 200.0, Epsilon: 0.02
Episode 1000, Total Reward: 200.0, Epsilon: 0.01
Episode 1500, Total Reward: 200.0, Epsilon: 0.01
Episode 2000, Total Reward: 200.0, Epsilon: 0.01
Episode 2500, Total Reward: 200.0, Epsilon: 0.01
Episode 3000, Total Reward: 200.0, Epsilon: 0.01
Episode 3500, Total Reward: 200.0, Epsilon: 0.01
Episode 4000, Total Reward: 200.0, Epsilon: 0.01
Episode 4500, Total Reward: 200.0, Epsilon: 0.01
Episode 5000, Total Reward: 200.0, Epsilon: 0.01

--- Cart-Pole Q-Learning Training Complete ---

Testing the trained agent for one episode...
Test Episode finished in 400 steps with total reward: 400.0
The agent successfully kept the pole balanced for some time!


# Example 4: Representing Data in Python

Recall the word-grouping code in Example 2: it looked at the raw, unlabeled data and automatically found a way to structure it into groups. Real clustering algorithms use more complex math to do this, but the core idea of finding the underlying structure is the same.

Before we can feed data to a model, we need to represent it in our code. A common way to start is with a list of lists, where each inner list represents a single data point (like one student).


In [5]:
# This dataset represents three students.
# The features are: [Hours Studied, Hours of Sleep]
# The label is the last element: "pass" or "fail"

student_data = [
    [8, 7, "pass"],  # Student 1: 8 hrs study, 7 hrs sleep, passed.
    [4, 5, "fail"],  # Student 2: 4 hrs study, 5 hrs sleep, failed.
    [7, 8, "pass"]   # Student 3: 7 hrs study, 8 hrs sleep, passed.
]

# We can access the data for the first student (at index 0).
first_student = student_data[0]

# Now we can separate the features from the label for that student.
# In Python, slicing with [:-1] means "get everything except the last element".
student_features = first_student[:-1]

# Slicing with [-1] means "get the very last element".
student_label = first_student[-1]



In [6]:
print(f"Data for first student: {first_student}")
print(f"Features: {student_features}")
print(f"Label: {student_label}")


Data for first student: [8, 7, 'pass']
Features: [8, 7]
Label: pass
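
The same slicing extends from one student to the whole dataset. As a minimal sketch (the names `features` and `labels` are chosen here for illustration), a list comprehension can split every row at once:

```python
student_data = [
    [8, 7, "pass"],
    [4, 5, "fail"],
    [7, 8, "pass"],
]

# Apply the same slicing to every row: all but the last element are features,
# the last element is the label.
features = [row[:-1] for row in student_data]
labels = [row[-1] for row in student_data]

print(features)
print(labels)
```

Keeping features and labels in two parallel lists is a common layout, because a model is typically given the features as input and the labels as the answers to learn from.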
