# Mini Project: Ultimate Tic-Tac-Toe Game

**Release Date:** 8 March 2024

**Due Date:** 23:59, 12 April 2024

## Overview

Ultimate Tic-Tac-Toe is a complex variant of the classic Tic-Tac-Toe game that adds strategic depth and computational complexity. The game consists of nine regular Tic-Tac-Toe boards arranged in a 3×3 grid, creating a "meta-board". Winning requires not just tactical play on individual boards, but strategic thinking about the overall game state.

In terms of our terminology for the agent environment, Ultimate Tic-Tac-Toe is a fully observable, strategic, deterministic game. Your task is to create an AI agent that can effectively play this game through various approaches including minimax search with alpha-beta pruning, and potentially incorporating machine learning for board evaluation.

How exactly do you code an AI to play this game? Like everything else in this course, we code an agent. An agent takes sensory input and reasons about it, then outputs an action at each time step. You need to create a program that can read in a representation of the game state (that’s the input) and output a legal move in Ultimate Tic-Tac-Toe. You’ll need to develop an evaluation function to assess how good a board state is for your agent. The better your evaluation function, the better your agent will be at picking good moves. Aside from the evaluation function, you also need to decide on a strategy for exploring the search space. You can use minimax search with alpha-beta pruning or its variants, and you may want to explore efficient approaches for state evaluation.


### Changelog

**10 March 2025**

- Fixed issues with the dataset and dataset loader.
  - The dataset has been renamed to `data.pkl` to distinguish it from the previous version and avoid any confusion.
- Simplified the `State` class:
  - It includes all necessary information for decision-making, such as the previous action on the local board. This eliminates the need to track this information separately.
  - Local board status is now automatically updated, removing the need to manually call the `update_local_board_status` function.
  - If you've previously worked on the mini-project using an earlier version, you can continue with minimal changes, as the new update is mostly backward compatible. Exceptions:
    - In-place state changes are no longer supported.
    - Providing the previous action for certain functions is no longer required (though it will still work if provided).
- A new functional API has been introduced for users who prefer working with immutable states.

**8 March 2025**

- Initial release.


### Required Files

* `utils.py`
* `data.pkl`
* `figures/`
* `mini-project.ipynb`

### Policy

Please refer to our [Course Policies](https://canvas.nus.edu.sg/courses/62323/pages/course-policies)

Any form of cheating—such as plagiarism, hacking Coursemology test cases, or any other dishonest conduct—will be treated as an academic offense. The mini project constitutes 10% of the final grade; therefore, any academic misconduct will be classified as a **Moderate Offense**, with the maximum penalty being an <strong style="color:red">automatic 'F' grade</strong> for the module. Submission history will be closely monitored.

### Post-Mini Project Survey

Your feedback is important to us! After completing this mini project, please take a moment to share your thoughts by filling out this [survey](https://Coursemology.org/courses/2951/surveys/2621).

### Forum

If you have any questions, please visit our [forum](https://coursemology.org/courses/2951/forums/mini-project) for clarification.

## Task and Submission

Your task is to develop an agent capable of playing Ultimate Tic-Tac-Toe well.

To complete the task, you need to implement the `StudentAgent` class (see [AI Agent](#AI-Agent)) with your agent’s logic.

### Submission Details:

#### Agent

Please submit your `StudentAgent` code to Coursemology.

- **Tasks:** There are **10 tasks** in Coursemology, each corresponding to a different agent.
- **Agent Families:** There are **4 families of agents**: 
  - **A, B1-3, C1-3, D1-3** (increasing levels of sophistication and difficulty)
  - The **letter** indicates the **sophistication** of the techniques used, with **A** being the simplest and **D** being the most advanced. The **number** represents the **difficulty level** within each family, with higher numbers (e.g., B3, C3, D3) indicating more challenging agents.
  - All agents use techniques that have been (or will be) taught in class.
- **Test Cases:** Each task contains **2 test cases**, one where your agent plays as the first player (mover) and another where your agent plays as the second player (mover).

You are allowed to reuse the same agent code across all 10 tasks or modify your agent for different tasks as you see fit.

#### Supplementary Files

You are required to upload all files and supplementary code used in the creation of your agent. If you are using machine learning, this includes, but is not limited to, code for generating datasets (if applicable), machine learning algorithms, and the datasets generated.

These materials will be used to support plagiarism detection and ensure the originality of the work submitted.

### Attempt Limit:

- You have **20 attempts** for each individual task.

### Task Constraints

1. **Minimax Only:** You are only allowed to use Minimax or variations of it. Algorithms like Monte Carlo Tree Search (MCTS) are **not permitted**.
2. **No State Representation Modifications:** You are not allowed to modify the state representation of the board for optimization (e.g., using bitboards).

**Reasoning Behind the Constraints:**

- One of the key focuses of the mini project is to have you design a **strong evaluation function**. MCTS does not require an evaluation function, which is why it is not allowed.
- The goal is to test your understanding of **AI concepts** and **game-playing techniques**, not optimization of data structures.

<p style="color: red">Failure to comply with these constraints will result in <b>zero marks</b> for this mini project</p>

### Coursemology

- The Coursemology server has a maximum memory limit of 1 GB.
- The server does not have a GPU, so your model should not rely on GPU resources. Additionally, assume your model will only have access to a single CPU, and avoid multi-threaded or multi-process code in your submission.
- You can test the runtime of any program to estimate Coursemology's processing speed and check the availability of Python libraries by using https://coursemology.org/courses/2951/assessments/77038

### Grading Scheme

Your score will be based on your agent's performance against our agents in Coursemology. For each task, your score is determined by the outcome of the game:

- **Win:** +1 point
- **Draw:** +0.5 points
- **Loss:** 0 points

The maximum possible score across all tasks is **20 points**. Your final score will then be **normalized to a maximum of 10 points**, representing 10% of your final grade.
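
As a worked example of the arithmetic above, the short sketch below converts a win/draw/loss record over the 20 games (10 tasks × 2 test cases) into the normalized score. The function name `normalized_score` is purely illustrative and is not part of the course code.

```python
def normalized_score(wins: int, draws: int, losses: int) -> float:
    """Raw points (1 per win, 0.5 per draw) scaled from a 20-point maximum to 10."""
    games = wins + draws + losses
    assert games == 20, "10 tasks x 2 test cases each"
    raw = wins * 1.0 + draws * 0.5
    return raw / 20 * 10

# 14 wins and 4 draws give 16 raw points, i.e. 8.0 after normalization.
print(normalized_score(wins=14, draws=4, losses=2))
```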

## Ultimate Tic Tac Toe Game Description

You can find a live demo explaining the gameplay in this YouTube video: [https://www.youtube.com/watch?v=1hcTw3YDWr0](https://www.youtube.com/watch?v=1hcTw3YDWr0). Ultimate Tic Tac Toe is an advanced version of classic Tic Tac Toe, played on a board composed of nine smaller Tic Tac Toe boards arranged in a 3×3 layout. The objective is to win three of these smaller boards (local-boards) in a row (horizontally, vertically, or diagonally), thereby winning the overall game.

### Game Structure

<p align="center" id="figure1"><img width="500" alt="" src="./figures/uttt-basics.jpg" /></p>

<p align="center">Figure 1: Ultimate Tic-Tac-Toe Game Board</p>

The game consists of two levels:

* Small Boards (Local-Boards): Each of these is a standard Tic Tac Toe board. For clarity, Figure 1 shows the internal layout of a meta-board with each cell labeled by its coordinate.
* Large Board (Meta-Board): The nine local-boards are arranged in a 3×3 grid. Winning a local-board places your mark (or claim) on the corresponding cell of the large meta-board.

### Move Mechanics and the "Send" Rule

The first player can place a mark anywhere on the board for the first move. However, the twist is that the position of your move dictates where your opponent must play next. This mechanism, known as the “send” rule, is illustrated in Figure 2: if player X makes a move to the bottom right corner of any local-board, then O must make the next move in the bottom right local-board. The “send” rule applies to every subsequent move in the game. The board that the current player is sent to is called the "active" board.

<p align="center" id="figure2"><img width="500" alt="" src="./figures/ultimate_move.png" /></p>

<p align="center">Figure 2: Illustration of the move mechanics where the chosen cell sends the opponent to the corresponding local-board</p>

### Forced Moves and Open Choice

There are cases when the local-board a player is sent to is no longer active—either because it has already been won or it is completely filled (resulting in a draw). In such scenarios, the rules grant the player the freedom to choose any active local-board for their move.

<p align="center" id="figure3"><img width="500" alt="Any move" src="./figures/any_move.png" /></p>

<p align="center">Figure 3: When the target local-board is already won or full, the player may choose any local-board that is still in play. In this example, player X can now play on any such local-board.</p>

### Winning Conditions

* Local Board Victory: A local-board is won by achieving three in a row (horizontally, vertically, or diagonally) within that board.
* Overall Game Victory: The overall game is won by claiming three local-boards in a row on the large meta-board.
* Draw: The game is a draw when the current player cannot make any move, and no player has won. This happens when all local-boards have been claimed or have ended in a draw.

<p align="center" id="figure4"><img width="500" alt="Winning example" src="./figures/main_victory.png" /></p>

<p align="center">Figure 4: Example of an overall winning configuration: three local-boards in a row on the large meta-board.</p>

## Ultimate Tic Tac Toe Technical Description

This section details the technical aspects needed to implement an AI agent for Ultimate Tic Tac Toe. It covers the game’s state representation and the implementation framework, including core functionalities and utility functions that manage state updates, move validation, and game termination.

In [None]:
# Run the following cell to import utilities

import numpy as np
import time

from utils import State, Action, load_data

### State Representation

The game state is captured using two complementary structures:
* **Global 4D Array (3×3×3×3)**:
  * The first two dimensions index the local-boards on the meta-board.
  * The last two dimensions index the cells within each local-board (see Figure 1).
  * Cell values: 0 (empty), 1 (Player 1 filling cells with 1; can think of this as X), 2 (Player 2 filling cells with 2; can think of this as O).
  * Note that as a student, you will always make your move as player 1 filling the board with 1’s; no need to worry about being player 2.
* **Local Board Status Matrix (3×3)**: Each element indicates the state of a local-board (total 3×3=9 local boards):
  * 0: Ongoing game in that local board
  * 1: Victory for player 1 in that local board
  * 2: Victory for player 2 in that local board
  * 3: Draw
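
To make the indexing concrete, here is a minimal sketch that uses plain NumPy arrays rather than the provided `State` class. The helper `local_status` is hypothetical: it only mirrors the status codes listed above, whereas in the actual framework the status matrix is maintained for you by `State`.

```python
import numpy as np

def local_status(local: np.ndarray) -> int:
    """Status code of one 3x3 local-board: 0 ongoing, 1 or 2 winner, 3 draw."""
    lines = [local[r, :] for r in range(3)]                # rows
    lines += [local[:, c] for c in range(3)]               # columns
    lines += [np.diag(local), np.diag(np.fliplr(local))]   # both diagonals
    for player in (1, 2):
        if any((line == player).all() for line in lines):
            return player
    return 3 if (local != 0).all() else 0

board = np.zeros((3, 3, 3, 3), dtype=int)
# Player 1 takes the main diagonal of the top-left local-board (i=0, j=0).
for k in range(3):
    board[0, 0, k, k] = 1
# Player 2 marks the top-left cell of the centre local-board (i=1, j=1).
board[1, 1, 0, 0] = 2

# board[i, j] is the 3x3 local-board at row i, column j of the meta-board.
status = np.array([[local_status(board[i, j]) for j in range(3)] for i in range(3)])
print(status)
```

Only the top-left entry of `status` becomes 1 here; every other local-board is still ongoing.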

### Action

An action is represented by a tuple of 4 elements $(i, j, k, l)$, where $i, j$ are the indices of the local board that the move is made on, and $k, l$ are the indices of the mark made on the local board, $0 \leq i, j, k, l \leq 2$.

For example, the move in figure 2

<p align="center" id="figure2"><img width="500" alt="" src="./figures/ultimate_move.png" /></p>

is represented by the tuple `(2, 0, 2, 2)`. The row and column indices of the local board are $0$ and $2$ respectively. Within the local board, the row and column indices of the mark are $2$ and $2$ respectively.
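
The way an action interacts with the "send" rule can be sketched with raw arrays as follows. This is an illustration only: `valid_actions` is a hypothetical helper written for this example, and in your agent you would normally call `get_all_valid_actions()` on the provided `State` instead.

```python
import numpy as np

def valid_actions(board, status, prev_action=None):
    """List legal (i, j, k, l) moves under the "send" rule.

    board: 3x3x3x3 array of cell values.
    status: 3x3 local-board status matrix (0 means ongoing).
    prev_action: the opponent's last move, or None at the start of the game.
    """
    targets = None
    if prev_action is not None:
        _, _, k, l = prev_action              # the last cell played picks the next board
        if status[k, l] == 0:
            targets = [(k, l)]
    if targets is None:                       # open choice: any ongoing local-board
        targets = [(i, j) for i in range(3) for j in range(3) if status[i, j] == 0]
    return [(i, j, k, l)
            for (i, j) in targets
            for k in range(3) for l in range(3)
            if board[i, j, k, l] == 0]

board = np.zeros((3, 3, 3, 3), dtype=int)
status = np.zeros((3, 3), dtype=int)
board[1, 1, 2, 2] = 1                         # X plays the bottom-right cell of the centre board
sent = valid_actions(board, status, prev_action=(1, 1, 2, 2))
print(len(sent), sent[0])                     # O is sent to local-board (2, 2)

status[2, 2] = 1                              # pretend that local-board is already decided
free = valid_actions(board, status, prev_action=(1, 1, 2, 2))
print(len(free))                              # O may now choose among all ongoing boards
```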

### Game Logic

The core game logic is encapsulated in a `State` class that manages state initialization, move validation, state updates, and game termination. The primary functionalities include:

#### Initialization

* `board`: A 4D array (3×3×3×3) initialized to all zeros. Leave this blank to instantiate the starting board.
* `fill_num`: The current (active) player (1 or 2). Defaults to 1.
* `prev_local_action`: A tuple of 2 elements representing the position of the last move within the last local board. This corresponds to the last 2 elements of the `prev_action` tuple.

#### Fields

* `board`: The 4D array (3×3×3×3) representing the board.
* `fill_num`: The current (active) player (1 or 2).
* `local_board_status`: 3×3 Matrix tracking the status of each local-board.
* `prev_local_action`: Tuple of 2 elements representing the position of the last action within the last local board. This corresponds to the last 2 elements of the last action tuple.

#### Methods

These object methods provide the essential operations required for gameplay:
* `is_valid_action(action)`: Checks if the move is valid (depends on the previous action in this game as well).
* `get_all_valid_actions()`: Returns a list of all legal moves given the previous action.
* `get_random_valid_action()`: Provides a random valid move.
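* `change_state(action)`: Applies a move and returns the resulting state, with the active player (`fill_num`) toggled and the local board statuses updated. The state is not modified in place, so use the returned value (for example, `state = state.change_state(action)`).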
* `is_terminal()`: Checks whether the game has ended by evaluating the global board status.
* `terminal_utility()`: Returns the outcome value of a terminal state:
  * 1.0 for a win by Player 1,
  * 0.0 for a win by Player 2,
  * 0.5 for a draw.
* `clone()`: Creates a deep copy of the current state.
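
The methods above are enough to drive a depth-limited minimax search with alpha-beta pruning. The sketch below is one possible shape for it, not a reference solution. It relies only on the method names listed above, so it is exercised here on `ToyNim`, a made-up stand-in game (not the real `State` class) so that the snippet runs without `utils.py`. The `evaluate` argument stands for your own evaluation function and is assumed to return values on the same 0 to 1 scale as `terminal_utility()`.

```python
import math

def alphabeta(state, depth, evaluate, alpha=-math.inf, beta=math.inf):
    """Depth-limited minimax with alpha-beta pruning.

    Values are from Player 1's point of view, matching `terminal_utility()`:
    Player 1 (fill_num == 1) maximises, Player 2 minimises.
    """
    if state.is_terminal():
        return state.terminal_utility()
    if depth == 0:
        return evaluate(state)
    maximising = state.fill_num == 1
    best = -math.inf if maximising else math.inf
    for action in state.get_all_valid_actions():
        value = alphabeta(state.change_state(action), depth - 1, evaluate, alpha, beta)
        if maximising:
            best = max(best, value)
            alpha = max(alpha, best)
        else:
            best = min(best, value)
            beta = min(beta, best)
        if alpha >= beta:          # the other player would never allow this line
            break
    return best

def best_action(state, depth, evaluate):
    """Pick the root move with the best alpha-beta value for the player to move."""
    pick = max if state.fill_num == 1 else min
    return pick(state.get_all_valid_actions(),
                key=lambda a: alphabeta(state.change_state(a), depth - 1, evaluate))

class ToyNim:
    """Stand-in with the same method names as `State` (NOT the real class).
    Players alternately remove 1 or 2 stones; whoever takes the last stone wins."""
    def __init__(self, stones, fill_num=1):
        self.stones, self.fill_num = stones, fill_num
    def is_terminal(self):
        return self.stones == 0
    def terminal_utility(self):
        # fill_num is the player to move next, so the other player took the last stone.
        return 1.0 if self.fill_num == 2 else 0.0
    def get_all_valid_actions(self):
        return [n for n in (1, 2) if n <= self.stones]
    def change_state(self, action):
        return ToyNim(self.stones - action, 3 - self.fill_num)

# With 4 stones, taking 1 leaves the opponent in a losing position.
print(best_action(ToyNim(4), depth=10, evaluate=lambda s: 0.5))
```

Because the student agent always plays as Player 1, the maximising branch is the one your agent uses at the root; the `min` branch in `best_action` is kept only so the sketch works for either player.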

### Functional API

The `State` class serves as a wrapper for the `ImmutableState` class and the functions that operate on it. If you prefer to work directly with the `ImmutableState`—which, as the name suggests, is immutable—you can do so. Given a `State` instance, `state_mutable`, you can access the underlying immutable state by referencing `state_mutable._state`. 

The functions that operate on `ImmutableState` have the same signature as the object methods in the `State` class. To use them, simply pass the `ImmutableState` instance as the first argument, followed by any other required arguments. For instance, to get the next (immutable) state given an immutable state instance `state` and action `action`, you can call `change_state(state, action)`.

### Required Files

You will need to keep the **`utils.py`** file in the same directory as this notebook to test your
agent code. **`utils.py`** contains core utility functions mentioned above which you can use
directly.

## Dataset

This dataset contains a series of board-action pairs along with their corresponding evaluation values. These pairs are generated through random gameplay, and the evaluation for each pair is provided by an expert agent. You can leverage this dataset to improve the evaluation function for your own agent.

Each entry in the `data.pkl` file is a tuple consisting of two elements:

1. The first element is an instance of the `State` class, representing the current state of the board.
2. The second element is a numeric evaluation representing how favorable the current board is for Player 1. Higher values indicate a greater likelihood of Player 1 winning, while lower values suggest a higher likelihood of Player 1 losing. The value ranges from -1 (loss) to +1 (win).

You can easily load the dataset by calling the `load_data` function.

Note: Please ensure that the `data.pkl` file is placed in the same directory as this notebook.

For guidance on building a machine learning model, please refer to the [Resources & Hints](#resources--hints) section below.

In [None]:
data = load_data()
assert len(data) == 80000
for state, value in data[:3]:
    print(state)
    print(f"Value = {value}\n\n")
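
One possible way to turn such entries into training data is sketched below. Several things here are assumptions rather than part of the provided code: the one-hot feature layout, the helper names `encode` and `to_unit_interval`, and the random stand-in boards that replace the real entries so the snippet runs without `data.pkl`. With the real dataset you would pass `state.board` for each loaded `State`. Rescaling the labels is optional; it is shown because `terminal_utility()` uses a 0 to 1 scale while the dataset values range from -1 to +1, and mixing the two scales inside one search would distort comparisons.

```python
import numpy as np

def encode(board: np.ndarray) -> np.ndarray:
    """Flatten a 3x3x3x3 board into 162 features: one-hot planes for players 1 and 2."""
    return np.concatenate([(board == 1).ravel(), (board == 2).ravel()]).astype(np.float32)

def to_unit_interval(value: float) -> float:
    """Map a dataset label in [-1, 1] onto the [0, 1] scale of `terminal_utility()`."""
    return (value + 1.0) / 2.0

# Stand-in for entries of `data`: random boards with random labels in [-1, 1].
rng = np.random.default_rng(0)
fake_entries = [(rng.integers(0, 3, size=(3, 3, 3, 3)), rng.uniform(-1, 1))
                for _ in range(5)]

features = np.stack([encode(board) for board, _ in fake_entries])
targets = np.array([to_unit_interval(value) for _, value in fake_entries])
print(features.shape, targets.shape)
```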

## AI Agent

Fill in the `choose_action(state)` method and, optionally, the `__init__()` method of the `StudentAgent` class (see code snippets below) with your game-playing agent code. You can write as many helper functions as needed. The game framework will handle alternating turns between players.

#### `choose_action(state)`

This function will be called by the programme for your agent to return a move.
* The `state` is an instance of `State` representing the current board that you need to move on.

#### `__init__()`

This function will be called to instantiate your agent without any argument. You may decide to leave this function blank.

Optionally, these are some actions you can do in `__init__`:

* Instantiate your search tree
* Load your hardcoded ML model

If you wish to experiment with various hyperparameters, you can do so via keyword arguments (`kwargs`) with default values. Our programme calls the constructor without arguments, so the default values are the ones that will be used.

Example:

```py
class StudentAgent:
    def __init__(self, depth=1):
        # Set the depth of your search
        pass

# In your programme, you can call it with a different value of `depth`
student_agent = StudentAgent(depth=2)

# In our programme, the default value of `depth` will be used
student_agent = StudentAgent()
```

#### Other functions

You are free to add any other helper functions within your class.

### Implementation Notes

* The student agent always plays as **Player 1** filling the board with 1’s. Being player 1 does not imply that the student always gets to be the first mover in the game.
* <strong style="color: red">You are not allowed to change the 4D array representation of the board into something more efficient</strong> (for example, bit board representation)
* A game-play framework against a naive agent making random valid moves has been provided below in this notebook. You can test out your developed agent here to ensure code correctness.
* All move validations are managed by the framework.
* <strong style="color: red">Make sure that your agent's `__init__` function runs within 1 second</strong>. If your constructor does not finish within the time limit, the test case fails.
* <strong style="color: red">Always ensure your move is returned within 3 seconds</strong>. You should check for elapsed time during recursive calls in your minimax algorithm. If your agent does not return a move within 3 seconds, the programme makes a random move on your behalf.
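
One common way to respect the per-move time limit is iterative deepening with a deadline check inside the search. The sketch below shows only the control flow, under stated assumptions: `OutOfTime`, `check_deadline`, and `iterative_deepening` are names invented for this example, the 2.5-second default budget is an assumed safety margin below the 3-second limit, and `toy_search` is a placeholder for your own minimax routine.

```python
import time

class OutOfTime(Exception):
    """Raised inside the search when the deadline has passed."""

def check_deadline(deadline: float) -> None:
    if time.monotonic() >= deadline:
        raise OutOfTime

def iterative_deepening(search, fallback, budget=2.5, max_depth=64):
    """Run search(depth, deadline) for depth = 1, 2, ... until the budget is spent.

    `search` must call check_deadline(deadline) regularly (for example at every
    node) and return its chosen move. Only the result of the deepest search that
    ran to completion is kept.
    """
    deadline = time.monotonic() + budget
    best = fallback
    for depth in range(1, max_depth + 1):
        try:
            best = search(depth, deadline)
        except OutOfTime:
            break                  # discard the partial result of the interrupted depth
    return best

# Placeholder search: deeper searches take longer; it "returns" the depth it reached.
def toy_search(depth, deadline):
    for _ in range(depth):
        check_deadline(deadline)
        time.sleep(0.01)
    return depth

start = time.monotonic()
result = iterative_deepening(toy_search, fallback=0, budget=0.2)
elapsed = time.monotonic() - start
print(result, round(elapsed, 2))
```

Inside `choose_action`, a natural `fallback` is `state.get_random_valid_action()`, so that a legal move is available even if the depth-1 search is interrupted.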

### Creating Good Agents

1. **Evaluation Function Design**: Develop an evaluation function that accurately assesses the board state. Consider incorporating features such as material advantage, control of key regions, potential to form winning lines, and possible opponent threats. Experiment with different weights and heuristics to balance offense and defense.
2. **Search Strategy Optimization**: Enhance your search strategy by employing techniques like iterative deepening, effective move ordering, and alpha-beta pruning. By exploring the most promising moves first and pruning branches that cannot possibly affect the final decision, you can search deeper within the limited time.
3. **Machine Learning Integration**: Explore the integration of machine learning techniques to improve your agent’s performance. You are highly encouraged to train a machine learning model to predict/evaluate the utility of a state or to predict the best move. This could help guide the search towards promising states and away from less relevant ones. By using machine learning to learn a more sophisticated evaluation function/action predictor, your agent can better prioritize moves that are likely to lead to a successful outcome, improving the efficiency and effectiveness of the search process. You can use our provided dataset as well for training; but you are encouraged to create a more rigorous dataset yourself if possible.
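
As a starting point for the first item, the sketch below scores a position using only the 3×3 local-board status matrix, rewarding meta-board lines that a player can still complete. It is a deliberately simple, untuned heuristic: the function name `meta_line_score`, the weights, and the squashing constant are all illustrative assumptions, and a competitive evaluation function would also need to look at the cells inside each local-board.

```python
import math
import numpy as np

# The 8 winning lines of a 3x3 grid, as lists of (row, col) cells.
LINES = ([[(r, c) for c in range(3)] for r in range(3)] +
         [[(r, c) for r in range(3)] for c in range(3)] +
         [[(d, d) for d in range(3)], [(d, 2 - d) for d in range(3)]])

def meta_line_score(status: np.ndarray) -> float:
    """Heuristic in (0, 1) from Player 1's point of view; 0.5 means balanced.

    A line counts for a player only if the opponent has claimed none of its
    boards and none of them is drawn (status 3).
    """
    weights = {0: 0.0, 1: 1.0, 2: 4.0, 3: 100.0}    # illustrative, untuned weights
    total = 0.0
    for line in LINES:
        cells = [int(status[r, c]) for r, c in line]
        if 3 in cells:
            continue                                # a drawn board blocks the line for both
        ones, twos = cells.count(1), cells.count(2)
        if twos == 0:
            total += weights[ones]
        elif ones == 0:
            total -= weights[twos]
    return 1.0 / (1.0 + math.exp(-total / 4.0))     # squash onto the 0..1 utility scale

empty = np.zeros((3, 3), dtype=int)
ahead = np.array([[1, 1, 0], [0, 2, 0], [0, 0, 0]])
print(meta_line_score(empty), meta_line_score(ahead))
```

The output is kept on the same 0 to 1 scale as `terminal_utility()` so that heuristic values and true terminal values can be compared directly inside a minimax search.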

In [None]:
class StudentAgent:
    def __init__(self):
        """Instantiates your agent.
        """

    def choose_action(self, state: State) -> Action:
        """Returns a valid action to be played on the board.
        Assuming that you are filling in the board with number 1.

        Parameters
        ---------------
        state: The board to make a move on.
        """
        return state.get_random_valid_action()

## Testing Your Game Playing Agent

Each test case on Coursemology runs as follows:

1. The programme calls the constructor of the test agent and your `StudentAgent`.
2. The programme initiates a new `State` of the starting board of the game.
3. At each move, based on the state's `fill_num`, the programme calls the test agent to choose moves, or your agent's `choose_action`. It passes a copy of the current board to your function call `choose_action(state)`.
4. The programme then checks the validity of the move and whether the agent timed out, replaces the move with a random move if the agent timed out or returned an invalid move, and applies the move to the current state.
5. Repeat steps 3 - 4 until the state is terminal.
6. The programme then decides the winner of the game.

#### Test One Move

You may run the test cases below to check whether your agent has made a valid move.

However, passing these test cases is not required, and we do not run them on Coursemology. If you decide to reuse search trees, these test cases may not be applicable. Your agent will only be tested in a full game starting from an empty board.

In [None]:
state = State(
    board=np.array([
        [
            [[1, 0, 2], [0, 1, 0], [0, 0, 1]],
            [[2, 0, 0], [0, 0, 0], [0, 0, 0]],
            [[0, 1, 0], [0, 0, 0], [0, 0, 0]],
        ],
        [
            [[0, 0, 0], [0, 0, 0], [0, 0, 0]],
            [[2, 0, 0], [0, 0, 0], [0, 0, 0]],
            [[0, 0, 0], [0, 0, 0], [0, 0, 0]],
        ],
        [
            [[0, 0, 0], [0, 0, 0], [0, 0, 0]],
            [[0, 0, 0], [0, 0, 0], [0, 0, 0]],
            [[0, 2, 0], [0, 0, 0], [0, 0, 0]],
        ],
    ]),
    fill_num=1,
    prev_action=(2, 2, 0, 1),
)
start_time = time.time()
student_agent = StudentAgent()
constructor_time = time.time()
action = student_agent.choose_action(state)
end_time = time.time()
assert state.is_valid_action(action)
print(f"Constructor time: {constructor_time - start_time}")
print(f"Action time: {end_time - constructor_time}")
assert constructor_time - start_time < 1
assert end_time - constructor_time < 3

state = State(
    board=np.array([
        [
            [[1, 0, 2], [0, 1, 0], [0, 0, 1]],
            [[2, 0, 0], [0, 0, 0], [0, 0, 0]],
            [[0, 1, 0], [0, 0, 0], [0, 0, 0]],
        ],
        [
            [[0, 0, 0], [0, 0, 0], [0, 0, 0]],
            [[2, 0, 0], [0, 0, 0], [0, 0, 0]],
            [[0, 0, 0], [0, 0, 0], [0, 0, 0]],
        ],
        [
            [[0, 0, 0], [0, 0, 0], [0, 0, 0]],
            [[0, 0, 0], [0, 0, 0], [0, 0, 0]],
            [[2, 0, 0], [0, 0, 0], [0, 0, 0]],
        ],
    ]),
    fill_num=1,
    prev_action=(2, 2, 0, 0)
)
start_time = time.time()
student_agent = StudentAgent()
constructor_time = time.time()
action = student_agent.choose_action(state)
end_time = time.time()
assert state.is_valid_action(action)
print(f"Constructor time: {constructor_time - start_time}")
print(f"Action time: {end_time - constructor_time}")
assert constructor_time - start_time < 1
assert end_time - constructor_time < 3

#### Test a Full Game Loop

You may run the test cases below to test your agent in a full game loop. You may replace the random agent below with your past agents, to test your new agents. Coursemology test cases will test in a similar manner, so it is important that your agent returns valid moves before timeouts in this test case.

In [None]:
# Use this cell to test your agent in two full games against a random agent.
# The random agent will choose actions randomly among the valid actions.

class RandomStudentAgent(StudentAgent):
    def choose_action(self, state: State) -> Action:
        return state.get_random_valid_action()

def run(your_agent: StudentAgent, random_agent: StudentAgent, start_num: int):
    timeout_count = 0
    invalid_count = 0
    state = State(fill_num=start_num)
    while not state.is_terminal():
        random_action = state.get_random_valid_action()
        if state.fill_num == 2:
            action = random_agent.choose_action(state.clone())
        else:
            start_time = time.time()
            action = your_agent.choose_action(state.clone())
            end_time = time.time()
            if end_time - start_time > 3:
                print("Your agent time out!")
                timeout_count += 1
                action = random_action
        if not state.is_valid_action(action):
            assert state.fill_num == 1
            print("Your agent made an invalid action!")
            invalid_count += 1
            action = random_action
        state = state.change_state(action)
    if state.terminal_utility() == 1:
        print("You win!")
    elif state.terminal_utility() == 0:
        print("You lose!")
    else:
        print("Draw")
    print(f"Timeout count: {timeout_count}")
    print(f"Invalid count: {invalid_count}")

your_agent = StudentAgent()
random_agent = RandomStudentAgent()
run(your_agent, random_agent, 1)
run(your_agent, random_agent, 2)

## Resources & Hints

Run the cell below to import necessary libraries to run the demo code.

In [None]:
import numpy as np
import torch
import torch.nn as nn
from sklearn.linear_model import LinearRegression
from collections import OrderedDict
from torch import tensor
import time

### Machine Learning

In this project, we strongly encourage you to integrate machine learning techniques into your agent. The following resources may be helpful as you explore this approach.

* **PyTorch**: This will be introduced in a future problem set. Here are some videos and a Colab notebook to get you started:

  * https://colab.research.google.com/drive/12nQiv6aZHXNuCfAAuTjJenDWKQbIt2Mz
  * https://youtu.be/ubSsUJbLkwM?si=SPXEkZYV7CS2ZffZ
  * https://youtu.be/dsNtkT7LF8M?si=UY00ChyxNU1UNJxj


* **Scikit-learn**:
  - https://youtu.be/0B5eIE_1vpU?si=RIu4tBcpyHiySC2D
  - https://scikit-learn.org/1.4/tutorial/basic/tutorial.html

Here are some **machine learning models** you might consider (this list is not exhaustive):

- Linear Regression
- Logistic Regression
- Decision Tree
- Support Vector Machine (SVM)
- Neural Networks

You can also experiment with various **loss functions** (again, this list is not exhaustive):

- Mean Squared Error (MSE)
- Mean Absolute Error (MAE)
- Cross-Entropy Loss
- Ridge/Lasso Regularization

In addition to evaluating performance against your training dataset, be sure to measure how well your agent performs against previous agents you've developed. Even if your machine learning model doesn’t perfectly fit the data, it might still outperform earlier agents, making it a valuable step in improving your agent’s performance.

### External Computational Resources

For this mini-project, it is useful to have access to some computing resources to run your code and train machine learning models. Two options are Google Colaboratory and the School of Computing Compute Cluster.

#### [Google Colaboratory](https://colab.research.google.com/)

Google Colaboratory, or "Colab" for short, is a free cloud-based platform provided by Google that allows you to write and run Python code using a Jupyter notebook interface. Colab provides access to a virtual machine with a GPU and sometimes even a TPU, which can speed up computation for tasks like training machine learning models. You can use Colab from your own computer without installing any software, and it provides access to a number of libraries and datasets. However, there may be limits on how much time, memory, and storage space you can use, and you may need to reauthorize your session frequently.

You may find the following video useful: https://www.youtube.com/watch?v=hZgykFahXrs .

#### [The School of Computing Compute Cluster](https://dochub.comp.nus.edu.sg/cf/guides/compute-cluster/)

The School of Computing Compute Cluster is a set of high-performance computing resources that are available to students, faculty, and researchers affiliated with the National University of Singapore's School of Computing. The cluster consists of multiple nodes, each with its own set of CPUs, memory, and storage. You can submit jobs to the cluster using the Slurm workload manager, which allocates resources to jobs based on availability and user-specified requirements. The Compute Cluster provides significantly more computing power than Colab, with the ability to scale up to hundreds or even thousands of cores. However, you need to apply for access to the cluster, and there may be limits on the amount of resources that can be used at any given time. Additionally, using the cluster requires some technical expertise and familiarity with the Linux command line interface. Log in with your NUS account and follow the guides here: https://dochub.comp.nus.edu.sg/cf/guides/compute-cluster/start

Quick links:
- Accessing the Cluster: https://dochub.comp.nus.edu.sg/cf/guides/compute-cluster/access
- Slurm Cluster Information: https://dochub.comp.nus.edu.sg/cf/guides/compute-cluster/start#slurm-cluster-information
- Compute Cluster Hardware Configuration: https://dochub.comp.nus.edu.sg/cf/guides/compute-cluster/start#compute-cluster-hardware-configuration
- Slurm Quick Start Guide: https://dochub.comp.nus.edu.sg/cf/guides/compute-cluster/start#slurm-quick-start-guide

If you prefer not to use Google Colaboratory or the School of Computing Compute Cluster, you can also run your code on your own computer. However, keep in mind that your computer may not have as much processing power or memory as the other options, so your code may run more slowly and you will take more time to complete the task.

### Hardcoding Your Trained ML Model

Since **Coursemology** does not support file uploads, if you use machine learning in your agent, you will need to **hardcode** your trained machine learning model.

Suppose that you want to create a machine learning model based on the following data.

In [None]:
X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])
y = np.array([1, 8, 2, 4])

**PyTorch**

Assume you have the following neural network:

In [None]:
net = nn.Linear(2, 1)

We can then train the neural network as follows.

In [None]:
def train(net: nn.Module, X: np.ndarray, y: np.ndarray):
    X = torch.tensor(X, dtype=torch.float32)
    y = torch.tensor(y[:, None], dtype=torch.float32)
    criterion = nn.MSELoss()
    optimizer = torch.optim.Adam(net.parameters(), lr=0.01)
    for epoch in range(100):
        optimizer.zero_grad()
        output = net(X)
        loss = criterion(output, y)
        loss.backward()
        optimizer.step()

train(net, X, y)

You can print the weights of the model:

In [None]:
# Set precision of printing numbers to 10 d.p.
torch.set_printoptions(precision=10)
print(net.state_dict())

The above cell will output something like this:

```
OrderedDict([('weight', tensor([[0.1491376609, 1.3386186361]])), ('bias', tensor([0.6359716654]))])
```

Notes: 

- You may obtain different numerical values.
- `OrderedDict` is a class that can be imported from the `collections` module, and `tensor` is simply a `torch` tensor. You need to import both to hardcode the model.

To hardcode the weights of the model, you can do the following.

In [None]:
# declare the same model
model = nn.Linear(2, 1)

# Hardcode the weights from the previous step
coeffs = OrderedDict([('weight', tensor([[0.1491376609, 1.3386186361]])), ('bias', tensor([0.6359716654]))])

# Load the model with the hardcoded weights
model.load_state_dict(coeffs)

# The model now has the same weights as the previously trained network
X_tensor = torch.tensor(X, dtype=torch.float32)
assert torch.allclose(model(X_tensor), net(X_tensor))

**Scikit-Learn**

Assume that you trained a linear regression model in sklearn with some data:

In [None]:
reg = LinearRegression().fit(X, y)
reg.score(X, y)

We can get the parameters of the model as follows.

In [None]:
# Set the precision of printing numbers to 10 d.p.
np.set_printoptions(precision=10)

# Print the coefficients of the model
print(reg.coef_)
print(reg.intercept_)

The above cell should have an output similar to the following:

```
[-6.   4.5]
3.75
```

We can then hardcode the coefficients in our code:

In [None]:
model = LinearRegression()
model.coef_ = np.array([-6, 4.5])
model.intercept_ = 3.75

assert np.allclose(model.predict(X), reg.predict(X))
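
For a linear model, the hardcoded coefficients can also be applied with plain NumPy, which avoids constructing an estimator object at all. The sketch below reuses the coefficient values from the example output above; the names `COEF`, `INTERCEPT`, and `predict` are illustrative, and your own trained values will differ.

```python
import numpy as np

# Coefficients copied from the example output above.
COEF = np.array([-6.0, 4.5])
INTERCEPT = 3.75

def predict(features: np.ndarray) -> np.ndarray:
    """Linear model y = features @ COEF + INTERCEPT, without needing scikit-learn."""
    return features @ COEF + INTERCEPT

X_demo = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])
print(predict(X_demo))
```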