# Homework 1

1. [2pts] Consider a sequence of values $\{x_1, x_2,\dots, x_n\}$ of some variable $x$, and suppose
we compute an exponentially weighted moving average using the formula
$$\mu_n = \beta \mu_{n-1} + (1-\beta) x_n,$$
where $0<\beta<1$. By making use of the following result for the sum of a finite
geometric series
$$\sum_{k=1}^{n} \beta^{k-1} = \frac{1-\beta^n}{1-\beta},$$
show that if the sequence of averages is initialized using $\mu_0 = 0$, then the estimators
are biased and that the bias can be corrected using
$$\hat \mu_n=\frac{\mu_n}{1-\beta^n}$$
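
The effect of the correction can be seen numerically before attempting the proof. The following is a minimal sketch, not part of the required solution: it assumes a constant sequence (so the true mean is known), and the names `ema` and `bias_corrected` as well as the value $\beta = 0.9$ are illustrative choices.

```python
def ema(values, beta):
    """Exponentially weighted moving average started from mu_0 = 0.

    Returns the list of raw averages mu_1, ..., mu_n.
    """
    mu = 0.0
    averages = []
    for x in values:
        mu = beta * mu + (1.0 - beta) * x
        averages.append(mu)
    return averages


def bias_corrected(averages, beta):
    """Divide each mu_n by (1 - beta**n), with n counted from 1."""
    return [mu / (1.0 - beta ** n) for n, mu in enumerate(averages, start=1)]


beta = 0.9                   # illustrative value, any 0 < beta < 1 works
values = [5.0] * 10          # constant sequence, so the true mean is 5
raw = ema(values, beta)
corrected = bias_corrected(raw, beta)

for n, (m, c) in enumerate(zip(raw, corrected), start=1):
    print(f"n={n:2d}  raw={m:.4f}  corrected={c:.4f}")
```

The raw averages start well below 5 and approach it only slowly, while the corrected values stay at 5 (up to floating-point rounding) from the first step.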

2. [3pts] Prove the shift-invariance property of convolution:
$$\mathcal{G}(f(\cdot-a))(x)=\mathcal{G}(f(\cdot))(x-a),$$
where $\mathcal{G}$ is the convolution operator $\mathcal{G}(f(\cdot))(x)=\int_{-\infty}^{+\infty} f(\tau)h(x-\tau)\,d\tau$ for a fixed kernel $h$.
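
A discrete analogue of this identity can be checked numerically as a sanity test. The sketch below is only an illustration under stated assumptions and does not replace the proof: it uses finite sequences with zero padding instead of functions on the real line, and the helpers `convolve` and `shift` and the sample values are hypothetical.

```python
def convolve(f, h):
    """Full discrete convolution: y[n] = sum_k f[k] * h[n - k]."""
    out = [0.0] * (len(f) + len(h) - 1)
    for i, fi in enumerate(f):
        for j, hj in enumerate(h):
            out[i + j] += fi * hj
    return out


def shift(signal, a):
    """Delay a finite signal by a samples by prepending a zeros."""
    return [0.0] * a + list(signal)


f = [1.0, 2.0, 0.0, -1.0]    # sample input (illustrative)
h = [0.5, 0.25, 0.25]        # sample kernel (illustrative)
a = 3                        # shift in samples

lhs = convolve(shift(f, a), h)   # convolve the shifted input
rhs = shift(convolve(f, h), a)   # shift the convolved output
max_gap = max(abs(l - r) for l, r in zip(lhs, rhs))
print(f"largest difference between the two sides: {max_gap}")
```

Both orders of operation give the same sequence, which mirrors the continuous statement with the integral replaced by a finite sum.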

3. Neural Network Architectures  
(1) [3pts] Derive the number of trainable parameters in a single convolutional layer with input size $H \times W \times C_{\mathrm{in}}$, kernel size $k \times k$, and $C_{\mathrm{out}}$ output channels (assume bias), and compare it with a fully connected (dense) layer of the same input and output size.  
(2) [2pts] Explain why positional information is necessary in Transformers and describe one method to inject positional information.

4. [15pts] Training a Convolutional Neural Network (CNN) on EuroSAT for Image Classification

In this assignment, you will train a deep learning model from scratch to classify images from the EuroSAT dataset. EuroSAT is a dataset of 27,000 RGB satellite images (64×64 pixels) across 10 land cover classes, derived from Sentinel-2 satellite data for remote sensing classification tasks.

You are required to complete the following code by **filling in your own architecture and training function.** In the sections that specify **“To be implemented by students”**, you should replace `pass` with your own implementation.

After completing the implementation, answer the following questions and submit a report in Markdown or PDF format.
- Estimate the number of parameters and the feature map sizes at each layer.
- Report the training accuracy and loss over epochs, and the accuracy on the test set.
- Compare training and test error with and without Batch Normalization and Dropout layers.
- Please also submit the Jupyter Notebook (.ipynb) with your complete, executable code. (You can edit this notebook file directly.)
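
For the feature map estimate, the standard output-size rule for a convolution or pooling layer without dilation, $\lfloor (n + 2p - k)/s \rfloor + 1$, can be applied layer by layer. The sketch below assumes a hypothetical stack of layers on a 64×64 input. It is not a suggested architecture, and the helper name `conv_output_size` is an illustrative choice.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial size after a convolution or pooling layer (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


# Hypothetical stack for a 64x64 input: (name, kernel, stride, padding)
layers = [
    ("conv 3x3, pad 1", 3, 1, 1),
    ("maxpool 2x2", 2, 2, 0),
    ("conv 3x3, pad 1", 3, 1, 1),
    ("maxpool 2x2", 2, 2, 0),
]

size = 64
sizes = []
for name, k, s, p in layers:
    size = conv_output_size(size, k, s, p)
    sizes.append(size)
    print(f"{name:16s} -> {size} x {size}")
```

Replacing the entries of `layers` with the layers of your own model gives the spatial sizes to report. The channel count of each feature map is set by the number of filters in the layer.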


### Notes:
- **Google Colab or AutoDL is recommended for training if you don’t have a local GPU.**
- **Submission Deadline: November 2, 11:59 PM**


### 1. Setup: Load Dataset & Preprocessing

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision.transforms as transforms
import torchvision.datasets as datasets
from torch.utils.data import DataLoader, random_split
import time

# Use the GPU if one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

# Data transformations
transform = transforms.Compose([
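    transforms.Resize((64, 64)),  # EuroSAT images are already 64x64; this guards against other sizes
    transforms.ToTensor(),  # Convert the PIL image to a float tensor in [0, 1]
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])  # Standardize with ImageNet channel statistics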
])

# Load EuroSAT dataset
dataset = datasets.EuroSAT(root="./data", transform=transform, download=True)

# Split dataset into 80% training and 20% testing
train_size = int(0.8 * len(dataset))
test_size = len(dataset) - train_size
train_dataset, test_dataset = random_split(dataset, [train_size, test_size])

# Data loaders
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True, num_workers=4, pin_memory=True)
test_loader = DataLoader(test_dataset, batch_size=64, shuffle=False, num_workers=4, pin_memory=True)

### 2. Define the Neural Network (**To be implemented by students**)

In [None]:
class MyCNN(nn.Module):
    def __init__(self):
        super(MyCNN, self).__init__()
        # TODO: Define your CNN model architecture
        # - Experiment with different layers, number of filters, kernel sizes
        # - Try using BatchNorm, Dropout, and deeper architectures
        pass

    def forward(self, x):
        # TODO: Implement forward pass
        pass

### 3. Define the Training Function (**To be implemented by students**)

In [None]:
def train_model(model, train_loader, criterion, optimizer, device, epochs=10):
    """
    Train the model and measure performance.
    - Record training time per epoch
    - Report training loss and accuracy
    - Measure training time per model architecture
    """
    # TODO: Implement the training loop
    pass

### 4. Model Training

In [None]:
# Instantiate the model and move to device
model = MyCNN().to(device)

# Define loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Train the model
train_model(model, train_loader, criterion, optimizer, device, epochs=10)

### 5. Model Evaluation

In [None]:
def evaluate_model(model, test_loader, device):
    """Print the average cross-entropy loss and the accuracy on the test set."""
    model.eval()
    correct = 0
    total = 0
    test_loss = 0.0
    criterion = nn.CrossEntropyLoss()

    with torch.no_grad():
        for images, labels in test_loader:
            images, labels = images.to(device), labels.to(device)

            outputs = model(images)
            loss = criterion(outputs, labels)

            # loss is the batch mean, so weight it by the batch size before summing
            test_loss += loss.item() * images.size(0)
            _, predicted = outputs.max(1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    test_loss = test_loss / total
    test_acc = correct / total * 100
    print(f"Test Loss: {test_loss:.4f} | Test Acc: {test_acc:.2f}%")

# Evaluate the trained model
evaluate_model(model, test_loader, device)