# Transfer Learning

# Introduction

- Transfer Learning is a machine learning technique where a model developed for one task is reused as the starting point for a model on a second, related task.
- Instead of training a model from scratch, which requires a large amount of labeled data and computational resources, transfer learning allows us to take advantage of knowledge gained from a pre-trained model on a large dataset (like ImageNet) and fine-tune it for a new, specific task.

# Key Concepts in Transfer Learning:

* Pre-Trained Models:
 - These models are trained on a large dataset (e.g., ImageNet) for a generic task like image classification. The learned features (e.g., edges, shapes) are transferable to different but related tasks.
* Fine-Tuning:
 - Involves adjusting the pre-trained model's weights on a new dataset, where the earlier layers (closer to the input) remain largely unchanged and the later layers are updated to adapt to the new task.
* Feature Extraction:
 - A technique in which only the final layers of the pre-trained model are replaced with new layers suited to the specific task, while the remaining layers are frozen so that their weights are not updated.
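
The difference between the two approaches comes down to which parameters keep `requires_grad=True`. The sketch below illustrates this on a small toy network; `build_toy_model`, `adapt`, and `count_trainable` are hypothetical helpers introduced only for this illustration, and the layer sizes are arbitrary.

```python
import torch.nn as nn


def build_toy_model() -> nn.Sequential:
    """Hypothetical stand-in for a pre-trained network: index 0 is the backbone, index 1 the head."""
    backbone = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 16), nn.ReLU())
    head = nn.Linear(16, 10)  # head for the original task (10 classes, arbitrary)
    return nn.Sequential(backbone, head)


def adapt(toy: nn.Sequential, num_classes: int, mode: str) -> nn.Sequential:
    """Replace the head for the new task and decide whether the backbone stays trainable."""
    if mode not in ("feature_extraction", "fine_tuning"):
        raise ValueError(f"unknown mode: {mode}")
    if mode == "feature_extraction":
        # Freeze the backbone: its weights receive no gradient updates
        for param in toy[0].parameters():
            param.requires_grad = False
    # In both modes the head is replaced by a fresh layer sized for the new task
    toy[1] = nn.Linear(16, num_classes)
    return toy


def count_trainable(module: nn.Module) -> int:
    """Number of scalar parameters that will be updated during training."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)


extractor = adapt(build_toy_model(), num_classes=2, mode="feature_extraction")
fine_tuned = adapt(build_toy_model(), num_classes=2, mode="fine_tuning")
print("feature extraction:", count_trainable(extractor), "trainable parameters")
print("fine-tuning:", count_trainable(fine_tuned), "trainable parameters")
```

In feature extraction only the new head is trained. In fine-tuning the backbone is trained as well, usually with a small learning rate so the pre-trained features are not destroyed.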

# Why Use Transfer Learning?
* Faster Training:
 - Significantly reduces training time since a large portion of the model is already pre-trained.
* Less Data Requirement:
 - It is effective when you have a small dataset because the pre-trained model has already learned useful features.
* Better Performance:
 - Pre-trained models often generalize better, especially on complex tasks, because they were trained on large and diverse datasets.

In [7]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from torchvision import datasets, models
import torchvision.transforms as T

# Data Processing

In [8]:
# Define data augmentation and normalization for training
transform_train = T.Compose([
    T.Resize(256),  # Resize so the shorter side is 256 pixels (aspect ratio preserved)
    T.RandomResizedCrop(227),  # Random crop (random scale and aspect ratio), resized to 227x227
    T.RandomHorizontalFlip(),  # Data Augmentation (Horizontal Flip)
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

# Test/Validation transformation
transform_test = T.Compose([
    T.Resize(256),
    T.CenterCrop(227),  # Center crop to 227x227
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

# Load dataset (assuming Cats vs Dogs is in 'cats_vs_dogs/', with train/ and test/ subfolders)
train_dataset = datasets.ImageFolder(root='cats_vs_dogs/train', transform=transform_train)
test_dataset = datasets.ImageFolder(root='cats_vs_dogs/test', transform=transform_test)

# Create DataLoader
train_loader = DataLoader(dataset=train_dataset, batch_size=64, shuffle=True, num_workers=4)
test_loader = DataLoader(dataset=test_dataset, batch_size=64, shuffle=False, num_workers=4)
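
To make the preprocessing above concrete, here is a minimal pure-Python sketch of the arithmetic behind `T.Resize(256)` (with an integer size, the shorter side is scaled to 256) and `T.Normalize` (per-channel `(x - mean) / std`). The helper names and the 375x500 example image are assumptions made for illustration; this mimics the arithmetic only and is not torchvision's implementation.

```python
def resize_shorter_side(height: int, width: int, size: int) -> tuple:
    """Output (height, width) when the shorter side is scaled to `size`, keeping the aspect ratio."""
    if height <= width:
        return size, int(size * width / height)
    return int(size * height / width), size


def normalize_pixel(rgb, mean, std):
    """Per-channel normalization of one RGB pixel with values in [0, 1]."""
    return [(c - m) / s for c, m, s in zip(rgb, mean, std)]


# ImageNet channel statistics, as used in the transforms above
IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]

# A hypothetical 375x500 (height x width) image
print(resize_shorter_side(375, 500, 256))

# A pixel equal to the channel means maps to zero; a white pixel maps to positive values
print(normalize_pixel(IMAGENET_MEAN, IMAGENET_MEAN, IMAGENET_STD))
print(normalize_pixel([1.0, 1.0, 1.0], IMAGENET_MEAN, IMAGENET_STD))
```

The same mean and standard deviation are used for training and testing because the pre-trained weights expect inputs normalized with the ImageNet statistics.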



# Transfer Learning with a Pre-Trained AlexNet

In [10]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
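# Load AlexNet with ImageNet weights (torchvision >= 0.13; older versions use pretrained=True)
model = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)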
model = model.to(device)

In [11]:
print(model)

AlexNet(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(11, 11), stride=(4, 4), padding=(2, 2))
    (1): ReLU(inplace=True)
    (2): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (3): Conv2d(64, 192, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (4): ReLU(inplace=True)
    (5): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (6): Conv2d(192, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (7): ReLU(inplace=True)
    (8): Conv2d(384, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (9): ReLU(inplace=True)
    (10): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace=True)
    (12): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(6, 6))
  (classifier): Sequential(
    (0): Dropout(p=0.5, inplace=False)
    (1): Linear(in_features=9216, out_features=4096, bias=True)
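
The `in_features=9216` of the first linear layer can be checked by hand from the kernel sizes, strides, and paddings printed above, using the usual output-size formula `floor((n + 2p - k) / s) + 1`. The sketch below is plain Python arithmetic; `conv_out` and `feature_map_side` are hypothetical helper names.

```python
def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of a conv or max-pool layer (floor mode, dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


# (kernel, stride, padding) of every conv / max-pool layer in model.features, in order
FEATURE_LAYERS = [
    (11, 4, 2),  # (0) Conv2d
    (3, 2, 0),   # (2) MaxPool2d
    (5, 1, 2),   # (3) Conv2d
    (3, 2, 0),   # (5) MaxPool2d
    (3, 1, 1),   # (6) Conv2d
    (3, 1, 1),   # (8) Conv2d
    (3, 1, 1),   # (10) Conv2d
    (3, 2, 0),   # (12) MaxPool2d
]


def feature_map_side(input_side: int) -> int:
    """Side length of the square feature map after model.features."""
    side = input_side
    for kernel, stride, padding in FEATURE_LAYERS:
        side = conv_out(side, kernel, stride, padding)
    return side


for input_side in (224, 227):
    side = feature_map_side(input_side)
    # The last conv layer has 256 output channels
    print(input_side, "->", side, "x", side, "->", 256 * side * side, "features")
```

Both 224x224 and 227x227 inputs end in a 6x6 map with 256 channels, which is why the 227-pixel crops used in this notebook work with the unmodified classifier.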
 

In [12]:
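# Freeze the convolutional feature extractor so its weights are not updated
for param in model.features.parameters():
    param.requires_grad = False

# Replace the final classifier layer (1000 ImageNet classes) with a 2-class output (cats vs. dogs)
model.classifier[6] = nn.Linear(4096, 2)
model = model.to(device)  # move the newly created layer to the same device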
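
It can help to see how many parameters are actually frozen by the cell above. The following is a rough count in plain Python, based on the layer shapes printed earlier. The printed classifier is cut off after its first linear layer, so the middle `Linear(4096, 4096)` layer is an assumption taken from torchvision's standard AlexNet definition; `conv_params` and `linear_params` are hypothetical helpers.

```python
def conv_params(in_ch: int, out_ch: int, kernel: int) -> int:
    """Weights plus biases of a square-kernel Conv2d layer."""
    return in_ch * out_ch * kernel * kernel + out_ch


def linear_params(in_features: int, out_features: int) -> int:
    """Weights plus biases of a Linear layer."""
    return in_features * out_features + out_features


# Frozen: the five conv layers of model.features
frozen = sum(
    conv_params(*shape)
    for shape in [(3, 64, 11), (64, 192, 5), (192, 384, 3), (384, 256, 3), (256, 256, 3)]
)

# Still trainable: the classifier, with its last layer replaced by Linear(4096, 2)
# (the 4096 -> 4096 layer is assumed from torchvision's standard AlexNet)
trainable = linear_params(9216, 4096) + linear_params(4096, 4096) + linear_params(4096, 2)

print(f"frozen:    {frozen:,}")
print(f"trainable: {trainable:,}")
print(f"new head:  {linear_params(4096, 2):,}")
```

Under these assumptions most of AlexNet's parameters sit in the classifier, so freezing only `model.features` still leaves the two large fully connected layers trainable.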

In [13]:
# Define loss function and optimizer (using SGD with momentum)
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=5e-4)


# Training the model
def train_model(model, train_loader, criterion, optimizer, num_epochs=25):
    model.train()
    for epoch in range(num_epochs):
        running_loss = 0.0
        correct = 0
        total = 0

        for i, (inputs, labels) in enumerate(train_loader):
            inputs, labels = inputs.to(device), labels.to(device)

            # Zero the parameter gradients
            optimizer.zero_grad()

            # Forward pass
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step() # update weights

            # Calculate accuracy
            _, predicted = outputs.max(1)  # max over dim 1 returns (values, indices); the indices are the predicted classes
            total += labels.size(0)
            correct += predicted.eq(labels).sum().item()  # .item() converts the one-element tensor to a Python int

            # Accumulate the batch loss for the epoch summary
            running_loss += loss.item()
            # Optional per-batch logging, every 10 mini-batches (i = 9, 19, ...).
            # Note: enabling it resets running_loss, which also changes the epoch summary below.
            # if i % 10 == 9:
            #     print(f'Epoch [{epoch + 1}/{num_epochs}], Step [{i + 1}/{len(train_loader)}], '
            #           f'Loss: {running_loss / 10:.4f}, Accuracy: {100.0 * correct / total:.2f}%')
            #     running_loss = 0.0

        # Epoch summary. NOTE: the summed batch loss is divided by a fixed 100, not by
        # len(train_loader), so the printed loss is a scaled sum rather than the mean batch loss.
        print(f'Epoch [{epoch + 1}/{num_epochs}], '
              f'Loss: {running_loss / 100:.4f}, Accuracy: {100.0 * correct / total:.2f}%')


# Test the model
def test_model(model, test_loader):
    model.eval()  # Set the model to evaluation mode
    correct = 0
    total = 0

    with torch.no_grad():
        for inputs, labels in test_loader:
            inputs, labels = inputs.to(device), labels.to(device)

            outputs = model(inputs)
            _, predicted = outputs.max(1)
            total += labels.size(0)
            correct += predicted.eq(labels).sum().item()

    print(f'Test Accuracy: {100.0 * correct / total:.2f}%')
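
The optimizer above receives `model.parameters()`, which includes the frozen feature layers. That works because SGD skips parameters that have no gradient, but a common alternative is to pass only the trainable parameters. The sketch below shows that variant on a tiny stand-in network; `TinyNet`, its layer sizes, and the random batch are hypothetical and exist only to keep the example self-contained.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)


class TinyNet(nn.Module):
    """Hypothetical stand-in with the same features / classifier split as AlexNet."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(nn.Linear(4, 8), nn.ReLU())
        self.classifier = nn.Sequential(nn.Linear(8, 2))

    def forward(self, x):
        return self.classifier(self.features(x))


tiny = TinyNet()
for param in tiny.features.parameters():
    param.requires_grad = False  # freeze the feature extractor

# Hand the optimizer only the parameters that should be updated
trainable_params = [p for p in tiny.parameters() if p.requires_grad]
tiny_optimizer = optim.SGD(trainable_params, lr=0.01, momentum=0.9, weight_decay=5e-4)

frozen_before = tiny.features[0].weight.clone()
head_before = tiny.classifier[0].weight.clone()

# One training step on a random batch (16 samples, 2 classes)
inputs = torch.randn(16, 4)
labels = torch.randint(0, 2, (16,))
loss = nn.CrossEntropyLoss()(tiny(inputs), labels)
tiny_optimizer.zero_grad()
loss.backward()
tiny_optimizer.step()

frozen_unchanged = torch.equal(frozen_before, tiny.features[0].weight)
head_changed = not torch.equal(head_before, tiny.classifier[0].weight)
print("frozen layer unchanged:", frozen_unchanged)
print("head updated:", head_changed)
```

Filtering makes the intent explicit and guarantees that weight decay and momentum state are never applied to the frozen layers.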


In [14]:
# Train the model
train_model(model, train_loader, criterion, optimizer, num_epochs=2)



Epoch [1/2], Loss: 0.0147, Accuracy: 94.00%
Epoch [2/2], Loss: 0.0094, Accuracy: 98.88%
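
A note on reading the loss values in this log: `train_model` prints `running_loss / 100`, that is, the sum of the batch losses divided by a fixed constant. This equals the mean batch loss only if an epoch has exactly 100 batches. The sketch below contrasts the two quantities on made-up numbers; `summarize_epoch` and the loss values are hypothetical and are not taken from the run above.

```python
def summarize_epoch(batch_losses, fixed_divisor=100):
    """Compare the fixed-divisor value with the true mean batch loss."""
    total = sum(batch_losses)
    return {
        "scaled_sum": total / fixed_divisor,               # what running_loss / 100 reports
        "mean_batch_loss": total / len(batch_losses),      # average loss per batch
    }


# Hypothetical epoch: 20 batches, each with loss 0.5
summary = summarize_epoch([0.5] * 20)
print(summary)
```

Dividing by `len(train_loader)` instead of 100 would make the printed value comparable across datasets and batch sizes.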


In [15]:
test_model(model, test_loader)

Test Accuracy: 98.50%
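
`test_model` calls `model.eval()` before measuring accuracy. One reason this matters for AlexNet is the `Dropout(p=0.5)` layers in its classifier: they randomly zero activations in training mode and do nothing in evaluation mode. The following minimal sketch shows this on a standalone dropout layer; the input tensor is arbitrary.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

dropout = nn.Dropout(p=0.5)
x = torch.ones(1, 1000)

dropout.train()
train_out = dropout(x)  # elements are zeroed at random; survivors are scaled by 1 / (1 - p)

dropout.eval()
eval_out = dropout(x)   # dropout is the identity in evaluation mode

print("distinct values in train mode:", sorted(train_out.unique().tolist()))
print("eval output equals input:", torch.equal(eval_out, x))
```

If evaluation ran in training mode, predictions would vary from call to call and the reported test accuracy would be noisier.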
