# CNN Direction Classifier for Task 1

This notebook implements a Convolutional Neural Network (CNN) for a line-following robot. The model analyzes images of lines and classifies them into three directions: forward, left, or right. This classification lets the robot make real-time navigation decisions from its camera input as it follows lines on the ground. The images used here were generated from the dataset in the previous notebook.


## Imports



In [None]:
# Essential imports for the CNN model
import os
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision.transforms as transforms
from torch.utils.data import DataLoader, Dataset
from torchvision import datasets
from sklearn.model_selection import train_test_split
from PIL import Image
import matplotlib.pyplot as plt

## Custom Dataset Class

To handle our image data efficiently, we'll create a custom PyTorch Dataset class. This class loads image paths and their corresponding labels (forward, left, right), applies transformations, and prepares them for training.



In [None]:
# Custom Dataset for loading images
class LineDirectionDataset(Dataset):
    def __init__(self, image_paths, labels, transform=None):
        self.image_paths = image_paths
        self.labels = labels
        self.transform = transform

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        img_path = self.image_paths[idx]
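        # Force 3 channels: PNGs may be RGBA or grayscale, which would break the 3-channel Normalize
        image = Image.open(img_path).convert("RGB")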
        label = self.labels[idx]

        if self.transform:
            image = self.transform(image)

        return image, label

## Data Preprocessing

Before feeding images to our neural network, we need to preprocess them to ensure consistent input format. The transforms pipeline:

1. **Resizes** all images to 224×224 pixels (standard size for many CNN architectures)
2. **Converts** images to PyTorch tensors
3. **Normalizes** pixel values using ImageNet mean and standard deviation

This preprocessing standardizes the input data, which helps the network learn more effectively.
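As a worked example of steps 2 and 3, the plain-Python sketch below mimics what happens to a single pixel: ToTensor scales 8-bit values to the range [0, 1] by dividing by 255, and Normalize then computes (value - mean) / std per channel using the ImageNet statistics from the pipeline. The helper name normalize_pixel is purely illustrative.

```python
# ImageNet statistics, as passed to transforms.Normalize in the pipeline
mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]

def normalize_pixel(rgb_8bit):
    """Mimic ToTensor (divide by 255) followed by Normalize for one RGB pixel."""
    scaled = [c / 255.0 for c in rgb_8bit]
    return [(c - m) / s for c, m, s in zip(scaled, mean, std)]

white = normalize_pixel((255, 255, 255))
black = normalize_pixel((0, 0, 0))
print([round(v, 3) for v in white])  # [2.249, 2.429, 2.64]
print([round(v, 3) for v in black])  # [-2.118, -2.036, -1.804]
```

So after preprocessing, inputs are roughly centred on zero and span about -2 to +2.6, rather than 0 to 255.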

Next, we'll load the image dataset from the 'images' directory, which contains three subdirectories (forward, left, right) with images of each line direction captured by the robot's camera.

In [None]:
# Set up image transformation pipeline
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

# Load images from directories
image_dir = 'images'
categories = ['forward', 'left', 'right']

image_paths = []
labels = []

for idx, category in enumerate(categories):
    category_path = os.path.join(image_dir, category)
    # Sort so the file order (and therefore the seeded split below) is reproducible
    for filename in sorted(os.listdir(category_path)):
        if filename.lower().endswith((".jpg", ".png")):
            image_paths.append(os.path.join(category_path, filename))
            labels.append(idx)  # 0=forward, 1=left, 2=right

## Train-Test Split

To properly evaluate our model, we need to separate our data into training and testing sets. We'll use 67% of the data for training and 33% for testing.

The training set is used to teach the model, while the testing set allows us to evaluate how well it generalizes to new, unseen data. This validation is crucial to ensure the robot will make reliable decisions when deployed in the real world with new line patterns.

In [None]:
# Split data into training and testing sets
train_paths, test_paths, train_labels, test_labels = train_test_split(
    image_paths, labels, test_size=0.33, random_state=42
)

# Create dataset objects
train_dataset = LineDirectionDataset(train_paths, train_labels, transform=transform)
test_dataset = LineDirectionDataset(test_paths, test_labels, transform=transform)

# Create data loaders
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=64, shuffle=False)
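If the three direction folders hold different numbers of images, a purely random split can leave one direction under-represented in the test set. scikit-learn's train_test_split accepts a stratify argument that preserves the class proportions in both parts. The sketch below uses hypothetical, deliberately imbalanced labels rather than the real dataset.

```python
from collections import Counter
from sklearn.model_selection import train_test_split

# Hypothetical, imbalanced labels: 0=forward, 1=left, 2=right
toy_paths = [f"img_{i}.png" for i in range(300)]
toy_labels = [0] * 150 + [1] * 90 + [2] * 60

train_p, test_p, train_y, test_y = train_test_split(
    toy_paths, toy_labels, test_size=0.33, random_state=42, stratify=toy_labels
)

# Both splits keep roughly the 50% / 30% / 20% class mix
print(Counter(train_y))
print(Counter(test_y))
```

In the notebook this would amount to adding stratify=labels to the existing call.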

## CNN Model Definition

Now we define the neural network architecture for our line detection model. We're using a simple Convolutional Neural Network (CNN) with:

1. A convolutional layer that extracts 64 feature maps from the input image
2. A pooling layer to reduce spatial dimensions and computational requirements
3. Two fully-connected layers that process these features and output directional predictions

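This architecture is simple, but it is not small: because the 64 feature maps are flattened at 112×112 resolution, the first fully-connected layer alone holds about 103 million weights (roughly 411 MB in float32). That may be too heavy for the robot's onboard computer, so for a production robot consider a more compact architecture such as MobileNet, which is designed to balance accuracy and computational efficiency.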

In [None]:
# Define a simple CNN model for direction classification
class LineDirectionCNN(nn.Module):
    def __init__(self):
        super().__init__()
        # 3 input channels (RGB) -> 64 feature maps; padding=1 keeps 224x224
        self.conv1 = nn.Conv2d(3, 64, kernel_size=3, padding=1)
        # 2x2 max pooling halves the spatial size: 224x224 -> 112x112
        self.pool = nn.MaxPool2d(2, 2)
        self.fc1 = nn.Linear(64 * 112 * 112, 128)
        self.fc2 = nn.Linear(128, 3)  # 3 output classes (forward, left, right)

    def forward(self, x):
        x = self.pool(torch.relu(self.conv1(x)))
        x = torch.flatten(x, 1)  # Flatten all dimensions except the batch dimension
        x = torch.relu(self.fc1(x))
        x = self.fc2(x)  # Raw logits; CrossEntropyLoss applies softmax internally
        return x
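The 64 * 112 * 112 input size of fc1, and the model's overall size, follow from simple arithmetic. The plain-Python sketch below traces the spatial size through conv1 and the pooling layer using the standard output-size formula (size + 2·padding − kernel) // stride + 1, then counts weights and biases per layer. The helper name conv2d_out is illustrative only.

```python
def conv2d_out(size, kernel, padding=0, stride=1):
    """Spatial output size of a convolution or pooling layer (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1

side = 224
side = conv2d_out(side, kernel=3, padding=1)  # conv1 keeps 224
side = conv2d_out(side, kernel=2, stride=2)   # MaxPool2d(2, 2) halves it to 112
flat_features = 64 * side * side              # input size of fc1

conv1_params = 3 * 64 * 3 * 3 + 64   # weights + biases
fc1_params = flat_features * 128 + 128
fc2_params = 128 * 3 + 3
total_params = conv1_params + fc1_params + fc2_params

print(side, flat_features)                              # 112 802816
print(f"{total_params:,}")                              # 102,762,755
print(f"{total_params * 4 / 1e6:.0f} MB as float32")    # 411 MB as float32
```

The same total can be cross-checked on the real model with sum(p.numel() for p in model.parameters()). Almost all of it sits in fc1, which is why adding more pooling before the dense layers, or switching to a compact backbone, shrinks the model dramatically.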

## Model Initialization and Setup

With our architecture defined, we'll initialize the model and set up the training environment:

1. Create an instance of our CNN model
2. Set up GPU acceleration if available (speeds up training significantly)
3. Define our loss function (Cross-Entropy Loss is standard for classification)
4. Configure the Adam optimizer with a learning rate of 0.0015

These values are a reasonable starting point for the direction classifier; the learning rate in particular may need tuning if the training loss plateaus or oscillates.

In [None]:
# Initialize model, loss function, and optimizer
model = LineDirectionCNN()

# Set device to GPU if available, otherwise CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Move model to the selected device
model = model.to(device)

# Define loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.0015)
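nn.CrossEntropyLoss applies log-softmax internally, which is why the model's last layer outputs raw scores (logits) with no softmax, and here it takes labels as integer class indices rather than one-hot vectors. A useful consequence for sanity-checking: if the logits carry no preference between the three classes, the loss is exactly ln 3 ≈ 1.0986, so a freshly initialized model should start with a loss in that neighbourhood. The sketch below uses hand-made logits, not the notebook's model.

```python
import math
import torch
import torch.nn as nn

loss_fn = nn.CrossEntropyLoss()

# Logits for a batch of 4 images; all-equal logits mean "no preference"
uniform_logits = torch.zeros(4, 3)
targets = torch.tensor([0, 1, 2, 1])  # class indices: 0=forward, 1=left, 2=right

loss = loss_fn(uniform_logits, targets)
print(round(loss.item(), 4), round(math.log(3), 4))  # 1.0986 1.0986
```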

## Training Loop

Now we'll train our CNN model on the dataset of line images. During training, the model:

1. Processes batches of 64 images at a time
2. Makes predictions about the direction (forward, left, or right)
3. Compares these predictions to the true labels
4. Updates its internal parameters to minimize the prediction error

We'll train for 15 epochs (complete passes through the training set) and track the loss and accuracy on the training data to monitor progress. As training proceeds, the loss should decrease and the accuracy increase, indicating that the model is learning to classify directions; how well it generalizes to unseen images is checked on the test set afterwards.

In [None]:
# Training the model
num_epochs = 15

for epoch in range(num_epochs):
    model.train()
    running_loss = 0.0
    correct_predictions = 0
    total_predictions = 0

    for inputs, batch_labels in train_loader:
        # Move inputs and labels to the selected device (GPU or CPU).
        # 'batch_labels' avoids overwriting the 'labels' list built earlier.
        inputs, batch_labels = inputs.to(device), batch_labels.to(device)

        optimizer.zero_grad()

        outputs = model(inputs)
        loss = criterion(outputs, batch_labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()
        _, predicted = torch.max(outputs, 1)
        correct_predictions += (predicted == batch_labels).sum().item()
        total_predictions += batch_labels.size(0)

    print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {running_loss/len(train_loader):.4f}, Accuracy: {correct_predictions/total_predictions:.4f}')
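The loop above divides the sum of per-batch mean losses by the number of batches, so a smaller final batch counts as much as a full one. The sketch below wraps one epoch in a hypothetical helper, train_one_epoch, that weights each batch's loss by its size to report an exact per-sample mean. It is exercised on random toy tensors and a single linear layer, both assumptions standing in for the real images and CNN.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def train_one_epoch(model, loader, criterion, optimizer, device):
    """Run one training epoch; return (mean loss per sample, accuracy)."""
    model.train()
    total_loss, correct, seen = 0.0, 0, 0
    for inputs, targets in loader:
        inputs, targets = inputs.to(device), targets.to(device)
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, targets)
        loss.backward()
        optimizer.step()

        batch_size = targets.size(0)
        total_loss += loss.item() * batch_size  # undo the batch mean
        correct += (outputs.argmax(dim=1) == targets).sum().item()
        seen += batch_size
    return total_loss / seen, correct / seen

# Toy stand-in data: 100 samples, 8 features, 3 classes
torch.manual_seed(0)
x = torch.randn(100, 8)
y = torch.randint(0, 3, (100,))
toy_loader = DataLoader(TensorDataset(x, y), batch_size=64, shuffle=True)
toy_model = nn.Linear(8, 3)
toy_optimizer = torch.optim.Adam(toy_model.parameters(), lr=0.0015)

epoch_loss, epoch_acc = train_one_epoch(
    toy_model, toy_loader, nn.CrossEntropyLoss(), toy_optimizer, torch.device("cpu")
)
print(f"Loss: {epoch_loss:.4f}, Accuracy: {epoch_acc:.4f}")
```

With 100 samples and batch_size=64 the last batch has only 36 samples, which is exactly the case where the two averaging schemes differ.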

## Model Evaluation

After training, we need to evaluate our model's performance on the test set - images the model hasn't seen during training. This evaluation provides a realistic estimate of how well our robot will perform in real-world line-following scenarios.

A high test accuracy indicates that the robot should be able to reliably detect line directions when deployed. We'd typically aim for at least 90% accuracy for reliable robot navigation.

In [None]:
# Evaluating the model on test data
model.eval()
correct_predictions = 0
total_predictions = 0
with torch.no_grad():
    for inputs, batch_labels in test_loader:
        # Move inputs and labels to the selected device (GPU or CPU)
        inputs, batch_labels = inputs.to(device), batch_labels.to(device)

        outputs = model(inputs)
        _, predicted = torch.max(outputs, 1)
        correct_predictions += (predicted == batch_labels).sum().item()
        total_predictions += batch_labels.size(0)

print(f'Test Accuracy: {correct_predictions/total_predictions:.4f}')
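A single accuracy figure can hide systematic mistakes, such as confusing left with right, which matters a great deal for steering. A confusion matrix from scikit-learn breaks the results down per class. The labels below are hypothetical; in the notebook they could be gathered inside the evaluation loop by extending two lists with the predicted and true label tensors converted via .cpu().tolist().

```python
from sklearn.metrics import confusion_matrix

classes = ["forward", "left", "right"]

# Hypothetical true and predicted class indices
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]

# Rows = true class, columns = predicted class
cm = confusion_matrix(y_true, y_pred, labels=[0, 1, 2])
per_class_recall = cm.diagonal() / cm.sum(axis=1)

print(cm)
for name, recall in zip(classes, per_class_recall):
    print(f"{name}: {recall:.2f}")  # forward: 1.00, left: 0.50, right: 1.00
```

Here overall accuracy is 5/6, but the matrix shows that every error is a left turn misread as right.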

## Visualization and Prediction Function

To understand how our robot "sees" and interprets line directions, we'll create a visualization function. This function:

1. Takes a sample image from our test set
2. Processes it through our trained CNN
3. Makes a direction prediction
4. Displays the image with the predicted direction

This visualization helps us verify that the model is correctly interpreting visual cues in the images and would steer the robot appropriately when deployed.

In [None]:
# Function to visualize an image and make a prediction
def visualize_and_predict(model, image_path, transform):
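    # Load the image as 3-channel RGB (matching training) and apply the transform
    image = Image.open(image_path).convert("RGB")
    image_tensor = transform(image).unsqueeze(0)  # Add a batch dimension

    # Move the image tensor to whichever device holds the model's weights
    image_tensor = image_tensor.to(next(model.parameters()).device)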

    model.eval()
    
    with torch.no_grad():
        output = model(image_tensor)
        _, predicted_class = torch.max(output, 1)
    
    classes = ['forward', 'left', 'right']
    predicted_label = classes[predicted_class.item()]

    plt.imshow(image)
    plt.title(f"Predicted: {predicted_label}")
    plt.axis("off")
    plt.show()
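The prediction function reports only the winning class. To also show how confident the model is, the logits can be passed through a softmax to obtain class probabilities. The sketch below uses hypothetical logits in place of a real model output.

```python
import torch

# Hypothetical logits for one image, shaped like model(image_tensor)
output = torch.tensor([[0.2, -1.0, 2.5]])
classes = ["forward", "left", "right"]

probabilities = torch.softmax(output, dim=1)  # each row sums to 1
confidence, predicted_class = torch.max(probabilities, 1)
print(f"Predicted: {classes[predicted_class.item()]} ({confidence.item():.1%})")
```

Inside visualize_and_predict, the same two lines could replace the torch.max call on the raw output, with the confidence added to the plot title.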

## Save Model and Test Prediction

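Finally, we'll save our trained model so it can be deployed on the robot's hardware. The model's learned weights (its state_dict) are saved in PyTorch's .pth format, which can be loaded onto the robot's onboard computer; the LineDirectionCNN class definition is needed there to rebuild the model before loading the weights.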

To verify everything works correctly, we'll test the model on a sample image and visualize the prediction. This simulates what would happen when the robot processes an image from its camera during line following.

Once deployed, the robot would:
1. Capture an image from its camera
2. Process it through this CNN model
3. Get a direction prediction (forward, left, or right)
4. Adjust its motors accordingly to follow the line
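Step 4 depends on the robot's drive system, which the notebook does not specify. Assuming a differential-drive robot (one motor per side), a minimal sketch maps each predicted class index to a pair of wheel speeds: turning left means slowing the left wheel relative to the right. The speed values and the steer function are hypothetical placeholders to be tuned on the actual hardware.

```python
# Hypothetical mapping from predicted class index to (left_motor, right_motor)
# speeds in [-1.0, 1.0]; the values are placeholders, not tuned settings.
STEERING = {
    0: (0.6, 0.6),  # forward: both wheels at the same speed
    1: (0.2, 0.6),  # left: slow the left wheel so the robot turns left
    2: (0.6, 0.2),  # right: slow the right wheel so the robot turns right
}

def steer(predicted_class: int) -> tuple:
    """Return hypothetical (left, right) motor speeds for a predicted direction."""
    return STEERING[predicted_class]

print(steer(1))  # (0.2, 0.6)
```

Keeping the mapping in a table makes it easy to adjust turn sharpness without touching the model code.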

In [None]:
# Save the model
model_save_path = "follower.pth"
torch.save(model.state_dict(), model_save_path)

# Test the prediction function with a sample image
test_image_path = "images/right/right_318.png"  # Replace with a valid image path
visualize_and_predict(model, test_image_path, transform)
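Because only the state_dict is saved, loading on the robot means rebuilding the architecture first and then loading the weights, typically with map_location="cpu" when the robot has no GPU. The round trip below uses a small stand-in model (an assumption so the sketch runs quickly) and a temporary file; on the robot, LineDirectionCNN() and "follower.pth" would take their places.

```python
import os
import tempfile
import torch
import torch.nn as nn

# Stand-in architecture for illustration; a state_dict stores only weights,
# so the same architecture must be rebuilt before loading.
def make_model():
    return nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 3))

trained = make_model()
path = os.path.join(tempfile.mkdtemp(), "follower.pth")
torch.save(trained.state_dict(), path)

# On the robot: rebuild, load weights onto the CPU, switch to eval mode
restored = make_model()
restored.load_state_dict(torch.load(path, map_location="cpu"))
restored.eval()

x = torch.randn(1, 3, 8, 8)
with torch.no_grad():
    same = torch.allclose(trained(x), restored(x))
print(same)  # True
```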