# Goal

1. Homework 2 (HW 2) Help: We will focus on Transfer Learning, especially for Problems 4 and 5.


2. Final Project Kick-off: We will help you start the project. The most important "first step" is building the "Basic Logic Bot". This bot is needed to create your training data.

# Homework 2: Transfer Learning

In Homework 2, you have Problems 3, 4, and 5.

1. Problem 3: You build a CNN to classify CIFAR images. This is your "baseline" model.


2. Problem 4: You build a model to detect if a CIFAR image is rotated (e.g., by 90 degrees).


3. Problem 5: You use the trained model from Problem 4 to improve your classification model from Problem 3.

**Why do we do Problem 4 (the rotation task)?**

The rotation task itself is not the main goal. The goal is to pre-train your model. By learning to spot rotation, your model learns basic "visual features", such as edges, textures, and shapes.

Then, in Problem 5, you "transfer" these learned features (the model weights) to your new classification task. This is Transfer Learning.
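To make "transferring the weights" concrete, here is a minimal PyTorch sketch. The `SmallCNN` class and its `features` / `head` split are hypothetical and are not the architecture required by the homework. The only point is that both models share the same feature layers, so the feature weights can be copied while the 10-class output layer keeps its fresh initialization.

```python
import torch
from torch import nn

class SmallCNN(nn.Module):
    """Hypothetical architecture: shared `features`, task-specific `head`."""

    def __init__(self, output_units):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        # 32x32 CIFAR input -> two 2x2 poolings -> 8x8 feature maps with 32 channels
        self.head = nn.Linear(32 * 8 * 8, output_units)

    def forward(self, x):
        return self.head(self.features(x).flatten(start_dim=1))

rotation_model = SmallCNN(output_units=2)         # Problem 4 (would be trained first)
classification_model = SmallCNN(output_units=10)  # Problem 5

# Copy only the feature-extractor weights; the classification head is left untouched.
classification_model.features.load_state_dict(rotation_model.features.state_dict())
```

In your own code, the rotation model would be trained before the copy, and the same idea works with weights saved to and loaded from a file.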



**You should see that your new model (from Problem 5):**

1. Trains faster (needs fewer steps to get a good result).

2. May get higher accuracy (it's not starting from zero).


**The homework also asks you to test two ideas: "freezing" vs. "fine-tuning".**


1. Freezing: You "lock" the transferred layers and only train the final, new classification layer.


2. Fine-Tuning: You "unlock" all layers and let them all train, but usually with a smaller learning rate.
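The two options can be sketched in PyTorch as follows. The `features` and `head` modules are small hypothetical stand-ins for your transferred layers and your new output layer, and the learning rates are placeholder values, not recommendations from the homework.

```python
import torch
from torch import nn

# Hypothetical stand-ins for the transferred layers and the new output layer.
features = nn.Sequential(nn.Conv2d(3, 8, kernel_size=3, padding=1), nn.ReLU(), nn.Flatten())
head = nn.Linear(8 * 32 * 32, 10)

# Freezing: turn off gradients for the transferred layers and optimize only the head.
features.requires_grad_(False)
frozen_optimizer = torch.optim.SGD(head.parameters(), lr=1e-2)

# Fine-tuning: turn gradients back on and give the transferred layers a smaller
# learning rate than the new head by using per-parameter-group options.
features.requires_grad_(True)
finetune_optimizer = torch.optim.SGD(
    [
        {"params": features.parameters(), "lr": 1e-3},
        {"params": head.parameters(), "lr": 1e-2},
    ],
    lr=1e-2,
    momentum=0.9,
)
```

Using one learning rate for all layers during fine-tuning is also a valid choice; the per-group setup just makes the "smaller learning rate" idea explicit.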

In [None]:
# (Note: You must define your own model structure as required by the HW)

# Problem 4: Pre-training

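# 1. Prepare the "rotation" dataset
# (Assumption: cifar_images is a torch tensor of shape (N, C, H, W) with square images)
import torch

def create_rotation_dataset(cifar_images):
    # Randomly choose 0 or 90 degrees per image: 0 = upright, 1 = rotated 90 deg
    rotation_labels = torch.randint(0, 2, (cifar_images.shape[0],))
    rotated_images = cifar_images.clone()
    rotate_mask = rotation_labels == 1
    # Rotate the selected images by 90 degrees in the (H, W) plane
    rotated_images[rotate_mask] = torch.rot90(cifar_images[rotate_mask], k=1, dims=(2, 3))
    return rotated_images, rotation_labels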

# 2. Build the "rotation detection" model
# Note: The last layer should be for binary classification (0 or 1)
rotation_model = YourModelArchitecture(output_units=2)

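# 3. Train the model
# (Use your training loop. `.train(data, labels)` is a placeholder for it;
#  PyTorch's built-in `nn.Module.train()` only switches the model to training mode.)
# cifar_images: the CIFAR training images
rotated_images, rotation_labels = create_rotation_dataset(cifar_images)
rotation_model.train(rotated_images, rotation_labels)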

# 4. Save the pre-trained weights
# We only care about the "feature" layers (like CNNs), not the final output layer.
rotation_model.save_feature_layer_weights('pretrained_weights.pth')


In [None]:
# Problem 5: Transfer Learning

# 1. Prepare the "classification" dataset
# (This is the original CIFAR dataset)
(cifar_images, cifar_labels) = load_cifar_data()

# 2. Build the "classification" model
# This model's feature layers MUST have the same architecture as the rotation_model
classification_model = YourModelArchitecture(output_units=10) # 10 classes

# 3. Load the pre-trained weights
classification_model.load_feature_layer_weights('pretrained_weights.pth')

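# 4. Experiment A: Freeze layers
# Loop through all feature layers (but NOT the final output layer)
# (In PyTorch, `requires_grad` lives on the parameters, so use `requires_grad_()`
#  on the layer instead of assigning `layer.requires_grad`.)
for layer in classification_model.feature_layers:
    layer.requires_grad_(False) # This "freezes" every parameter in the layer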

# 5. Train Experiment A
print("Training with FROZEN layers...")
# (Use your training loop)
classification_model.train(cifar_images, cifar_labels)
# (Record your loss and accuracy)

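# 6. Experiment B: Fine-tuning
# Rebuild the model and reload the pre-trained weights so that Experiment B
# starts from the same point as Experiment A, then make sure all layers are trainable.
classification_model = YourModelArchitecture(output_units=10)
classification_model.load_feature_layer_weights('pretrained_weights.pth')
for layer in classification_model.feature_layers:
    layer.requires_grad_(True) # This "unfreezes" every parameter in the layer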

print("Training with ALL layers (Fine-tuning)...")
# (Use your training loop, maybe with a smaller learning rate)
classification_model.train(cifar_images, cifar_labels)
# (Record your loss and accuracy)

# 7. Compare
# Compare Problem 3 (from scratch) vs. Experiment A (freeze) vs. Experiment B (fine-tuning)
# Look at:
# - Final test loss / accuracy
# - Training time / steps to get that result

# Final Project

It is November, and you should complete the first step:

1. Code the Minesweeper game environment.

2. Code the "basic logic bot".
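As a starting point for the environment, here is a minimal sketch. The interface is an assumption chosen to match the bot pseudocode in this notebook: a `shape` attribute and an `open(r, c)` method that returns -1 for a mine and the clue number otherwise. The class name, the constructor arguments, and the uniform random mine placement are all hypothetical; follow the project PDF for the actual rules.

```python
import random

class MinesweeperGame:
    """Minimal environment sketch; the interface (`shape`, `open`) is an assumption."""

    def __init__(self, height, width, num_mines, seed=None):
        self.shape = (height, width)
        rng = random.Random(seed)
        all_cells = [(r, c) for r in range(height) for c in range(width)]
        # Place mines uniformly at random (no "safe first click" rule in this sketch).
        self.mines = set(rng.sample(all_cells, num_mines))

    def open(self, r, c):
        """Return -1 if (r, c) is a mine, else the number of adjacent mines."""
        if (r, c) in self.mines:
            return -1
        # Out-of-bounds neighbors are never in `self.mines`, so no bounds check is needed.
        return sum(
            (r + dr, c + dc) in self.mines
            for dr in (-1, 0, 1)
            for dc in (-1, 0, 1)
            if (dr, dc) != (0, 0)
        )
```

For example, `MinesweeperGame(9, 9, 10, seed=0)` creates a reproducible 9x9 board with 10 mines.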

Why is this bot so important?


For Task 1 (Supervised Learning): You need data to train your model. This bot can play many games. You can use it to generate (board_state, mine_locations) data pairs.



For Task 2 (Actor-Critic): This bot is your "initial actor". Your Critic network can start by learning to predict this bot's performance.
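For Task 1, one possible way to turn a game snapshot into a (board_state, mine_locations) pair is sketched below. The encoding is an assumption, not something fixed by the project: unrevealed cells are stored as -1, revealed cells store their clue number, and the label is a 0/1 mask of mine positions. The function name `encode_example` is hypothetical.

```python
import numpy as np

def encode_example(shape, clue_numbers, mines):
    """Build one training pair from the revealed clues and the true mine set.

    shape:        (H, W) of the board
    clue_numbers: dict {(r, c): clue} for every revealed cell
    mines:        iterable of (r, c) mine coordinates (known to the environment)
    """
    board_state = np.full(shape, -1, dtype=np.int8)  # -1 = unrevealed (assumed encoding)
    for (r, c), clue in clue_numbers.items():
        board_state[r, c] = clue
    mine_locations = np.zeros(shape, dtype=np.int8)   # 1 = mine, 0 = safe
    for (r, c) in mines:
        mine_locations[r, c] = 1
    return board_state, mine_locations
```

Calling this after every move of the bot gives many training examples per game, at different stages of the board.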

The pseudocode below follows the logic from the project PDF:

In [None]:
import random

class LogicBot:
    def __init__(self, game_environment):
        self.game = game_environment
        self.H, self.W = game_environment.shape

        # Initialize sets [cite: 15]
        self.cells_remaining = set((r, c) for r in range(self.H) for c in range(self.W))
        self.inferred_safe = set()
        self.inferred_mine = set()

        # Store revealed clues: { (r, c) : clue_number } [cite: 16]
        self.clue_numbers = {}
        self.game_over = False

    def play_game(self):
        # Loop until game ends [cite: 17]
        while not self.game_over:

            # 1. Choose a cell to open [cite: 18]
            if self.inferred_safe:
                cell_to_open = self.inferred_safe.pop()
            else:
                # If no safe cells, pick a random cell from "remaining" [cite: 18]
                # (Make sure not to pick one we already think is a mine)
                available_cells = self.cells_remaining - self.inferred_mine
                if not available_cells:
                    break # No more cells to open
                cell_to_open = random.choice(list(available_cells))

            # 2. Open the cell [cite: 19]
            (r, c) = cell_to_open
            clue = self.game.open(r, c) # Assume game.open() returns -1 for a mine

            self.cells_remaining.discard((r, c))

            if clue == -1: # Hit a mine [cite: 19]
                self.game_over = True
                # (record results)
                break
            else:
                # 3. Update clues [cite: 20]
                self.clue_numbers[(r, c)] = clue

                # 4. Run the inference loop [cite: 24]
                self.run_inference_loop()

        # (Return game results)
        pass

    def run_inference_loop(self):
        # Keep looping until no new inferences are made [cite: 24]
        while True:
            new_inferences_made = False

            # For each cell with a revealed clue [cite: 21]
            for (r, c), clue_value in self.clue_numbers.items():

                # (You need a helper function get_neighbors(r, c))
                all_neighbors = self.get_neighbors(r, c)

                # Count neighbors
                unrevealed_neighbors = []
                num_inferred_mines_around = 0
                num_inferred_safe_around = 0
                num_revealed_safe_around = 0

                for nr, nc in all_neighbors:
                    if (nr, nc) in self.inferred_mine:
                        num_inferred_mines_around += 1
                    elif (nr, nc) in self.inferred_safe:
                        num_inferred_safe_around += 1
                    elif (nr, nc) in self.clue_numbers:
                        # This cell is already open, so it's safe
                        num_revealed_safe_around += 1
                    elif (nr, nc) in self.cells_remaining:
                        # This is an unknown, un-inferred neighbor
                        unrevealed_neighbors.append((nr, nc))

                num_unrevealed = len(unrevealed_neighbors)
                if num_unrevealed == 0:
                    continue

                # Core Logic 1: Mark Mines
                # If (clue_value) - (# known mines) == (# unrevealed neighbors)
                # Then all unrevealed neighbors MUST be mines.
                if (clue_value - num_inferred_mines_around) == num_unrevealed:
                    for (nr, nc) in unrevealed_neighbors:
                        if (nr, nc) not in self.inferred_mine:
                            self.inferred_mine.add((nr, nc))
                            self.cells_remaining.discard((nr, nc)) # Remove from "remaining" [cite: 22]
                            new_inferences_made = True


                # Core Logic 2: Mark Safe
                # (total # neighbors) - (clue_value) == total # of safe neighbors
                # (known safe neighbors) = (inferred safe) + (revealed safe)

                total_neighbors_count = len(all_neighbors)
                total_safe_neighbors_count = total_neighbors_count - clue_value
                known_safe_neighbors_count = num_inferred_safe_around + num_revealed_safe_around

                # If (total safe neighbors needed) - (known safe neighbors) == (# unrevealed neighbors)
                # Then all unrevealed neighbors MUST be safe.
                if (total_safe_neighbors_count - known_safe_neighbors_count) == num_unrevealed:
                    for (nr, nc) in unrevealed_neighbors:
                        if (nr, nc) not in self.inferred_safe:
                            self.inferred_safe.add((nr, nc))
                            # (Do NOT remove from cells_remaining.
                            #  They will be picked by the main loop.)
                            new_inferences_made = True

            # If this whole loop made no new inferences, we are done.
            if not new_inferences_made:
                break

    def get_neighbors(self, r, c):
        # (Helper function: returns valid (r, c) coords for all 8 neighbors)
        neighbors = []
        for dr in [-1, 0, 1]:
            for dc in [-1, 0, 1]:
                if dr == 0 and dc == 0:
                    continue
                nr, nc = r + dr, c + dc
                if 0 <= nr < self.H and 0 <= nc < self.W:
                    neighbors.append((nr, nc))
        return neighbors

This is just one idea, and you are free to follow your own approach; there is no single correct solution. If it's reasonable, it's right.

## Task 2

A naive training scheme:

1.   Use the logic bot to play the game, obtaining dataset `D_1`.
2.   Train `Φ_1(s,a)` using `D_1`.
3.   Use `argmax(Φ_1(s,a))` to play the game, obtaining dataset `D_2`.
4.   Train `Φ_2(s,a)` using `D_2`.

...(repeat until the performance does not improve)
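The scheme above can be sketched as a generic loop. All four callables are hypothetical placeholders that you would implement yourself: `collect_data(policy)` plays games and returns a dataset, `train_model(dataset)` fits a new `Φ` and returns the policy that acts by `argmax` over it, and `evaluate(policy)` returns a score where higher is better. The stopping rule here (stop as soon as the score does not improve) is one simple choice, not the only one.

```python
def naive_iterative_training(initial_policy, collect_data, train_model, evaluate, max_rounds=10):
    """Alternate between data collection and training until the score stops improving."""
    policy = initial_policy                 # round 1 starts from the logic bot
    best_score = evaluate(policy)
    for _ in range(max_rounds):
        dataset = collect_data(policy)      # D_k: games played by the current policy
        candidate = train_model(dataset)    # policy from Φ_k, trained on D_k
        score = evaluate(candidate)
        if score <= best_score:
            break                           # performance did not improve: stop
        policy, best_score = candidate, score
    return policy, best_score
```

The function returns the best policy found and its score, so you can compare it against the logic bot you started from.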


## Note

Even very simple methods might give you reasonable results. However, another goal of this project is to **demonstrate what you've learned** in the course.