In [2]:
from CoderSchoolAI.Environment.CoderSchoolEnvironments.SnakeEnvironment import *
from CoderSchoolAI.Neural.Blocks import *
from CoderSchoolAI.Neural.Net import *
import torch as th

ImportError: cannot import name 'QLearning' from 'CoderSchoolAI.Training.Algorithms' (c:\Users\John\.conda\envs\pytorch\lib\site-packages\CoderSchoolAI\Training\Algorithms.py)

# Basics of Neural Networks for the Snake Game

This tutorial aims to cover the essentials of building a neural network to play the snake game. The main points are: understanding neural networks, understanding the data, and using the CoderSchoolAI.Neural library.

## Neural Networks

Neural Networks are a set of algorithms modeled after the human brain. They are designed to recognize patterns, which makes them useful in machine learning and artificial intelligence applications.

In the context of our Snake game, we'll be building a neural network to make decisions based on the game's current state. Given the current state, our network will be responsible for deciding which move (up, down, left, right) will maximize the chances of winning the game.
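To make "deciding a move from the state" concrete, here is a minimal sketch in plain Python (no CoderSchoolAI or PyTorch needed). The four-number state, the hand-picked weights, and the `forward` and `choose_move` helpers are all made up for illustration; a real network learns its weights during training.

```python
# A toy one-layer "network": one score per move, then pick the best move.
MOVES = ["up", "down", "left", "right"]


def forward(state, weights, biases):
    """Return one score per move: a weighted sum of the state values plus a bias."""
    return [
        sum(w * s for w, s in zip(row, state)) + b
        for row, b in zip(weights, biases)
    ]


def choose_move(state, weights, biases):
    """Pick the move whose score is highest."""
    scores = forward(state, weights, biases)
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    return MOVES[best_index]


# Hypothetical state: (apple_dx, apple_dy, wall_ahead, body_ahead).
state = [2.0, 0.0, 0.0, 0.0]  # the apple is two cells to the right

# Hand-picked weights so that "apple to the right" favors moving right.
weights = [
    [0.0, -1.0, 0.0, 0.0],  # up:    prefers a negative apple_dy
    [0.0, 1.0, 0.0, 0.0],   # down:  prefers a positive apple_dy
    [-1.0, 0.0, 0.0, 0.0],  # left:  prefers a negative apple_dx
    [1.0, 0.0, 0.0, 0.0],   # right: prefers a positive apple_dx
]
biases = [0.0, 0.0, 0.0, 0.0]

print(choose_move(state, weights, biases))  # right
```

Training is the process of finding weights like these automatically instead of writing them by hand.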

## Understanding the Data

For the Snake game, our data comes in the form of the game state, actions, and reward:

1. **State**: The state of the game is like a snapshot of what the game looks like right now. In Snake, the state might be the position of the snake and the position of the apple.

2. **Actions**: Actions are the things you can do in the game. In Snake, the actions would be moving up, down, left, or right.

3. **Reward**: The reward is what the agent gets for performing an action. In Snake, the reward might be the points received for eating an apple or a penalty for crashing into a wall or the snake itself.
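As a rough illustration, one step of the game can be written down as a small record holding the state, the action, and the reward. The `Transition` class, the `toy_reward` function, the `(head_x, head_y, apple_x, apple_y)` state layout, and the reward values below are assumptions for this sketch, not what `SnakeEnv` actually uses.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical state layout: (head_x, head_y, apple_x, apple_y).
State = Tuple[int, int, int, int]


@dataclass(frozen=True)
class Transition:
    """One step of experience: what we saw, what we did, and what happened."""
    state: State
    action: str
    reward: float
    next_state: State
    done: bool


def toy_reward(ate_apple: bool, crashed: bool) -> float:
    """Made-up reward values: +1 for an apple, -1 for a crash, 0 otherwise."""
    if crashed:
        return -1.0
    if ate_apple:
        return 1.0
    return 0.0


# The snake head is at (3, 4), the apple is at (4, 4), and the snake moves right.
step = Transition(
    state=(3, 4, 4, 4),
    action="right",
    reward=toy_reward(ate_apple=True, crashed=False),
    next_state=(4, 4, 1, 6),  # head moved onto the apple; a new apple appears
    done=False,
)
print(step.reward)  # 1.0
```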

## CoderSchoolAI.Neural Library

This library provides tools to easily build and manipulate deep neural networks. We will be using the Net class to build our neural network.

Here is a brief overview of the classes and methods provided by the library:

### Net Class

This is the main class that will be used to build the network structure. It has the following methods:

- `__init__(self, network_name:str= "basic_net", is_dict_network=False, device: th.device = th.device('cpu'))`: This initializes the Net. It can be named for organization, declared as a dictionary network, and assigned a specific device for computation.

- `add_block(self, block: Block)`: Adds a Block connection to the network. The type of Block is based on the input.

- `compile(self)`: Once all blocks have been added, this function will compile the network and prepare it for training.

- `forward(self, x: th.Tensor)`: This function is called during training to pass the input data through the network.

- `copy(self)`: This function copies the current network, useful for creating multiple instances of the same network.

### InputBlock Class

The InputBlock is the first layer of the network. It takes in the input data.

- `__init__(self, in_attribute:Union[Attribute, Dict[str, Attribute]], is_module_dict:bool=False, device: th.device = th.device('cpu'))`: Initializes the InputBlock with attributes, declares if it is a module dictionary, and assigns the computational device.

- `forward(self, x) -> th.Tensor`: This function is called during training to pass the input data through the block.

### LinearBlock Class

The LinearBlock is a fully connected layer of the network (MLP).

- `__init__(self, input_size: int, output_size: int, num_hidden_layers: int = 3, hidden_size: Union[int, List[int]] = 128, activation: Optional[Callable] = None, device: th.device = th.device('cpu'))`: Initializes the LinearBlock with the input size, output size, number of hidden layers, size of hidden layers, an optional activation function, and the device for computation.

- `forward(self, x: th.Tensor) -> th.Tensor`: This function is called during training to pass the input data through the block.

### ConvBlock Class

The ConvBlock class is for building convolutional neural network (CNN) architectures.

Below is an explanation of the main methods within this class:

- `__init__(self, input_shape: int, num_channels: int, depth: int = 3, disable_max_pool: bool = False, desired_output_size: int = None, activation: Optional[Callable] = None, device: th.device = th.device('cpu'))`: Initializes the ConvBlock with various parameters including input shape, number of channels, depth (i.e. the number of convolutional layers), and optional parameters such as max pooling disablement, desired output size, and activation function. The computational device can also be set.

- `forward(self, x) -> th.Tensor`: This function is called during training to pass the input data through the block. It first checks the input shape to ensure it matches the expected input shape for the ConvBlock, and raises an error if the shape doesn't match. It then passes the input through the block's module (which is a sequence of convolutional, activation and pooling layers) to generate the output. If the block has forward connections, it calls the forward method on the next block in the sequence.
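To see how large the flattened output of a ConvBlock ends up, the arithmetic can be sketched in plain Python. The layer stacks below are copied from the compile logs printed in this notebook for `depth=3` and `depth=5`; the `flattened_size` helper and the single-channel 8x8 input are assumptions for illustration. A 3x3 convolution with `padding=same` keeps the height and width, and a `MaxPool2d` with kernel size 2 and stride 2 halves them (rounding down).

```python
def flattened_size(height, width, layers):
    """Follow (kind, value) layers and return channels * height * width at the end.

    ("conv", out_channels): padding="same" keeps height and width unchanged.
    ("pool", kernel): kernel == stride, so height and width are floor-divided.
    """
    channels = None
    for kind, value in layers:
        if kind == "conv":
            channels = value
        elif kind == "pool":
            height //= value
            width //= value
        else:
            raise ValueError(f"unknown layer kind: {kind}")
    if channels is None:
        raise ValueError("expected at least one conv layer")
    return channels * height * width


# Layer stacks as they appear in this notebook's compile logs.
DEPTH_3 = [("conv", 8), ("conv", 16), ("conv", 32), ("pool", 2)]
DEPTH_5 = DEPTH_3 + [("conv", 64), ("pool", 2), ("conv", 128), ("pool", 2)]

print(flattened_size(8, 8, DEPTH_3))  # 512  (32 channels * 4 * 4)
print(flattened_size(8, 8, DEPTH_5))  # 128  (128 channels * 1 * 1)
```

This is the number that the next block needs as its input size, which is why the examples in this tutorial read it from `conv_block.output_size` rather than typing it by hand.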

### OutputBlock Class

The OutputBlock is the final layer of the network which outputs the results.

- `__init__(self, input_size, num_classes, device: th.device = th.device('cpu'))`: Initializes the OutputBlock with a specified input size and the number of classes to output.

- `forward(self, x: th.Tensor) -> th.Tensor`: This function is called during training to pass the input data through the block.

## Putting it all Together

To put all of this together for our Snake game, we will need to create a neural network that can take the game's current state and output a decision for the next action. 

The basic steps are as follows:

1. **Create InputBlock**: We'll need an InputBlock to take in our game state. The state can be an array containing information about the snake's position, the apple's position, and any other relevant game information.

```python
    snake_env = SnakeEnv(width=16, height=16)
    input_block = InputBlock(in_attribute=snake_env.snake_agent.get_q_table_state(), is_module_dict=False)
```

2. **Create Hidden Layers (LinearBlocks)**: These layers will do the computation based on our input. The exact number and size of these layers can vary depending on the complexity of the game and the sophistication of the strategy you want to develop.

```python
    snake_env = SnakeEnv(width=16, height=16)
    input_block = InputBlock(in_attribute=snake_env.snake_agent.get_q_table_state(), is_module_dict=False)
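    # output_size is a required argument; 16 matches the template used later in this notebook
    lin_block = LinearBlock(input_size=len(snake_env.snake_agent.get_q_table_state()), output_size=16, num_hidden_layers=3, hidden_size=128, dropout=0.05)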
```

3. **Create Image Filter Blocks (ConvBlocks)**: This is a basic architecture commonly used for image processing tasks. Our SnakeEnv has a GameState which just so happens to be an Image!

```python
    snake_env = SnakeEnv(width=16, height=16)
    input_block = InputBlock(in_attribute=snake_env.get_attribute("game_state"), is_module_dict=False)
    conv_block = ConvBlock(input_shape=input_block.in_attribute.space.shape,num_channels=1,depth=3)
```

4. **Create OutputBlock**: Finally, we need an OutputBlock that will output the decision for the next move. It produces one value per possible action (up, down, left, right), and the action with the highest value is the move the network prefers.

```python
    # Output Example with ConvBlocks!
    snake_env = SnakeEnv(width=16, height=16)
    input_block = InputBlock(in_attribute=snake_env.get_attribute("game_state"), is_module_dict=False)
    conv_block = ConvBlock(input_shape=input_block.in_attribute.space.shape,num_channels=1,depth=3)
    out_block = OutputBlock(input_size=conv_block.output_size, num_classes=len(snake_env.snake_agent.get_actions()))
```

```python
    # Output Example with LinearBlocks!
    snake_env = SnakeEnv(width=16, height=16)
    input_block = InputBlock(in_attribute=snake_env.snake_agent.get_q_table_state(), is_module_dict=False)
    # output_size is a required argument; 16 matches the template used later in this notebook
    lin_block = LinearBlock(input_size=len(snake_env.snake_agent.get_q_table_state()), output_size=16, num_hidden_layers=3, hidden_size=128, dropout=0.05)
    out_block = OutputBlock(input_size=lin_block.output_size, num_classes=len(snake_env.snake_agent.get_actions()))
```

5. **Compile the Network**: Once all blocks have been added, compile the network with the `compile()` method of the Net class.

```python
    # Build the Network from Blocks (here: the ConvBlock example above)
    net = Net()
    net.add_block(input_block)
    net.add_block(conv_block)
    net.add_block(out_block)

    # Compile the Network!
    net.compile()
```


6. **Train the Network**: After compiling the network, we'll need to train it. This involves playing the game many times and adjusting the weights of the network based on the outcomes. This process is usually facilitated by reinforcement learning algorithms.


    `Think about training a computer to play a video game. It's a lot like teaching a dog a new trick: we want the computer to learn how to do something really well.`

- **The Game**: `In the game, our computer (which we'll call the 'agent') is going to play many rounds (or 'episodes') of the game. In each round, it will try different things to see what works.`

- **Deep Q-Learning**: `We're using a special training method called 'Deep Q-Learning.' This is a set of rules that tells the agent how to play and learn from the game. It's kind of like the rule book for the agent.`

- **Epsilon-Greedy Strategy**: `To start, the agent mostly guesses what to do (we call these 'actions'). But, over time, it starts to learn which actions work best. It still guesses sometimes, though, because that could lead to even better actions. We call this the 'Epsilon-Greedy Strategy.'`

- **Memory**: `Just like how you remember your best games, the agent also stores its experiences (the game state, its actions, and rewards) in its memory. This helps it learn over time.`

- **Learning**: `After each episode, the agent takes a look at its memory and learns from it. It then adjusts its playing strategy based on what it learned.`

- **Playing the Game**: `The agent keeps playing the game, adjusting its strategy, and filling its memory. It does this for a lot of episodes, and with each episode, it should get a little better!`
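
Two of the ideas above, the Epsilon-Greedy Strategy and Memory, are small enough to sketch in plain Python. This is only an illustration under simple assumptions: `epsilon_greedy` and `ReplayMemory` are hypothetical names written for this example, and they are not the CoderSchoolAI implementations.

```python
import random
from collections import deque


def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon pick a random action index, otherwise the best one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore: take a guess
    return max(range(len(q_values)), key=lambda i: q_values[i])  # exploit


class ReplayMemory:
    """Keeps only the most recent `capacity` experiences."""

    def __init__(self, capacity):
        self.items = deque(maxlen=capacity)  # old items fall off automatically

    def store(self, state, action, reward, next_state, done):
        self.items.append((state, action, reward, next_state, done))

    def sample(self, batch_size, rng):
        """Return a random batch of stored experiences to learn from."""
        count = min(batch_size, len(self.items))
        return rng.sample(list(self.items), count)

    def __len__(self):
        return len(self.items)


rng = random.Random(0)  # seeded so the example is repeatable
q_values = [0.1, 0.7, 0.2, 0.0]  # made-up scores for up, down, left, right

# With epsilon=0.0 the agent never guesses, so it picks index 1 (score 0.7).
greedy_choice = epsilon_greedy(q_values, epsilon=0.0, rng=rng)

memory = ReplayMemory(capacity=3)
for step in range(5):
    memory.store(state=step, action=0, reward=0.0, next_state=step + 1, done=False)

batch = memory.sample(batch_size=2, rng=rng)
print(greedy_choice, len(memory), len(batch))  # 1 3 2
```

Sampling random batches from memory, instead of learning only from the latest step, is what lets the agent revisit older experiences more than once.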


In [None]:
"""
A complete example of using the `CoderSchoolAI.Neural` library, with extra comments for clarity.
"""

# Import necessary modules
from CoderSchoolAI.Environment.CoderSchoolEnvironments.SnakeEnvironment import *
from CoderSchoolAI.Neural.Blocks import *
from CoderSchoolAI.Neural.Net import *
import torch as th

# Initialize the game environment with an 8x8 grid
snake_env = SnakeEnv(width=8, height=8)

# Define the InputBlock. 
# We use the game state attribute from the environment as the input to our network.
input_block = InputBlock(in_attribute=snake_env.get_attribute("game_state"), is_module_dict=False)

# Define the ConvBlock which acts as a convolutional layer for processing the game state.
# The depth of 3 represents the number of convolutional layers in this block.
conv_block = ConvBlock(input_shape=input_block.in_attribute.space.shape, num_channels=1, depth=3)

# Define the OutputBlock that will decide the next action to take based on the current game state.
# The num_classes corresponds to the number of possible actions the snake can take (up, down, left, right).
# It is left commented out here, so this test network ends at the ConvBlock.
# out_block = OutputBlock(input_size=conv_block.output_size, num_classes=len(snake_env.snake_agent.get_actions()))

# Initialize the network and add the blocks
net = Net()
net.add_block(input_block)
net.add_block(conv_block)
# net.add_block(out_block)

# Compile the network
net.compile()

# Test the network
# We get a sample game state and feed it through the network.
input_sample = snake_env.get_attribute("game_state").sample()
net.eval()
output_test = net(input_sample)

# Test network copying
# We create a copy of the network and test it with the same input sample.
# This is to verify that the copying operation works correctly.
copy_of_net = net.copy()
copy_of_net.eval()
output_copy_test = copy_of_net(input_sample)
# net.compare_networks(copy_of_net)

th.testing.assert_close(output_test, output_copy_test, rtol=1e-05, atol=1e-08)

"""
Note:
The performance of the neural network would depend on the exact architecture (number and type of blocks), as well as the training process, which has not been covered in this code snippet. 
"""

Compiling Network...
Comiled Block:  CoderSchoolAI.Neural.Blocks.InputBlock.InputBlock>,

Comiled Block:  CoderSchoolAI.Neural.Blocks.ConvBlock.ConvBlock>,
  (0): Conv2d(1, 8, kernel_size=(3, 3), stride=(1, 1), padding=same)
  (1): ReLU()
  (2): Conv2d(8, 16, kernel_size=(3, 3), stride=(1, 1), padding=same)
  (3): ReLU()
  (4): Conv2d(16, 32, kernel_size=(3, 3), stride=(1, 1), padding=same)
  (5): ReLU()
  (6): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (7): Flatten(start_dim=1, end_dim=-1)

Compiling Network...
Comiled Block:  CoderSchoolAI.Neural.Blocks.InputBlock.InputBlock>,

Comiled Block:  CoderSchoolAI.Neural.Blocks.ConvBlock.ConvBlock>,
  (0): Conv2d(1, 8, kernel_size=(3, 3), stride=(1, 1), padding=same)
  (1): ReLU()
  (2): Conv2d(8, 16, kernel_size=(3, 3), stride=(1, 1), padding=same)
  (3): ReLU()
  (4): Conv2d(16, 32, kernel_size=(3, 3), stride=(1, 1), padding=same)
  (5): ReLU()
  (6): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation


In [1]:
### How do We use this in Our Case?
"""
Here is the Basic template for Building a Neural Network for our Snake!
"""
from CoderSchoolAI.Environment.CoderSchoolEnvironments.SnakeEnvironment import *
from CoderSchoolAI.Environment.CoderSchoolEnvironments.FrozenLakeEnvironment import *

from CoderSchoolAI.Neural.Blocks import *
from CoderSchoolAI.Neural.Net import *
from CoderSchoolAI.Neural.ActorCritic.ActorCriticNetwork import *
import numpy as np  # used by np.prod below
import torch as th

# To use attribute we must make a copy to avoid issues saving the data
# TODO: fix this copying issue

snake_env = SnakeEnv(width=8, height=8)
# frozen_lake_env = FrozenLakeEnv()
input_block = InputBlock(in_attribute=snake_env.get_attribute("game_state").copy(), is_module_dict=False,)
conv_block = ConvBlock(input_shape=input_block.in_attribute.space.shape,num_channels=1,depth=5,)
flatten_size = np.prod(snake_env.get_attribute("game_state").shape)
flat_block = FlattenBlock(conv_block.output_size)
lin_block = LinearBlock(flat_block.output_size, 16, num_hidden_layers=3, dropout=0.05, hidden_size=[128, 128, 32])

out_block = OutputBlock(input_size=lin_block.output_size, num_classes=len(snake_env.snake_agent.get_actions()),)

q_net = Net(name='test_q_net_with_game_state')
q_net.add_block(input_block)
q_net.add_block(conv_block)
q_net.add_block(flat_block)
q_net.add_block(lin_block)
q_net.add_block(out_block)
q_net.compile()

copy_net = q_net.copy()


CoderSchoolAI v0.0.9: 
CoderSchoolAI: A Python Module designed for teaching Modern Day AI to Kids.

This package includes a range of educational tools and templates designed to simplify complex concepts and offer improved learning opportunities. 
It enables the exploration of foundational principles of problem-solving through Artificial Intelligence, providing structured guidance for the exploration and 
implementation of theoretical concepts.

Key Features:
 - Learning Curriculum: Our module makes learning programming engaging and fun, turning complex ideas into digestible chunks.
 
 - Educational Tools and Templates: We provide tools and templates to simplify complex concepts and enhance learning opportunities.
 
 - Exploration of Foundational Concepts: Our module enables the exploration of foundational principles of problem-solving through Artificial Intelligence.
 - Structural Guidance: We offer structured guidance for exploring and implementing theoretical concepts in a hands-on w

  from .autonotebook import tqdm as notebook_tqdm


Compiling Network...
Comiled Block:  CoderSchoolAI.Neural.Blocks.InputBlock.InputBlock>,

Comiled Block:  CoderSchoolAI.Neural.Blocks.ConvBlock.ConvBlock>,
  (0): Conv2d(1, 8, kernel_size=(3, 3), stride=(1, 1), padding=same)
  (1): ReLU()
  (2): Conv2d(8, 16, kernel_size=(3, 3), stride=(1, 1), padding=same)
  (3): ReLU()
  (4): Conv2d(16, 32, kernel_size=(3, 3), stride=(1, 1), padding=same)
  (5): ReLU()
  (6): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (7): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=same)
  (8): ReLU()
  (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (10): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=same)
  (11): ReLU()
  (12): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (13): Flatten(start_dim=1, end_dim=-1)

Comiled Block:  CoderSchoolAI.Neural.Blocks.FlattenBlock.FlattenBlock>,

Comiled Block:  CoderSchoolAI.Neural.Blocks.LinearBlock.L


`deep_q_learning` function:

- **agent**: This is our player or the character in the game. 

- **environment**: This is the game world or the level where our character is playing.

- **q_network**: Think of this as the brain of our character. It's what helps the character make decisions based on what it has learned.

- **target_q_network**: This is a copy of the character's brain, but we only update it from time to time. We use it to help the character make more stable decisions.

- **buffer**: This is the character's memory. It stores what the character has seen, what it did, and what happened.

- **num_episodes**: This is how many rounds of the game the character will play. More rounds mean more practice!

- **max_steps_per_episode**: This is the maximum number of moves the character can make in one round. If a round takes too long, we stop it.

- **gamma**: This is a measure of how much our character cares about future rewards compared to immediate ones. A high value makes the character care more about long-term results.

- **update_target_every**: This tells us how often we update the copy of the character's brain. For example, if it's set to 10, we update it after every 10 rounds.

- **batch_size**: This is how many memories we use when teaching the character from its past experiences.

- **epsilon**: This is how often the character makes random moves. At first, we want the character to explore a lot and try different things, so we set this high.

- **epsilon_decay**: This is how quickly we reduce the amount of randomness in the character's actions. As the character learns more, we want it to rely less on random moves.

- **stop_epsilon**: This is the lowest amount of randomness we allow in the character's actions. We always want it to try something new from time to time!

- **alpha**: This is how quickly the character learns. A high value means the character learns quickly but might miss some details. A low value means slower learning, but more attention to details.

- **attributes**: These are extra details or features that the character can use to make decisions. For example, it could be the character's health or number of lives in a game. In the Snake training cell below, we pass `"game_state"`.

- **optimizer**: This is a tool that helps the character's brain learn from its mistakes and get better.

- **fps**: This is how fast the game runs. A higher number means the game runs faster.

Playing around with these parameters changes how the character learns and plays the game. So go ahead and tweak them to see what happens!
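
To get a feel for what `gamma`, `alpha`, and the three epsilon parameters do, here is a small sketch in plain Python. The helper names `td_target`, `nudge`, and `decay_epsilon` are made up for this example, and the formulas are the textbook Q-learning ones, not a copy of what `deep_q_learning` does internally. In particular, in Deep Q-Learning `alpha` is normally handed to the optimizer as a learning rate; the `nudge` function shows the simpler table-based version of the same idea.

```python
def td_target(reward, next_q_values, gamma, done):
    """What the Q-value should move toward: reward now plus discounted future value."""
    if done:
        return reward  # the episode ended, so there is no future reward
    return reward + gamma * max(next_q_values)


def nudge(current_q, target, alpha):
    """Move the current estimate a fraction alpha of the way toward the target."""
    return current_q + alpha * (target - current_q)


def decay_epsilon(epsilon, epsilon_decay, stop_epsilon):
    """Shrink epsilon a little, but never below stop_epsilon."""
    return max(stop_epsilon, epsilon * epsilon_decay)


# gamma: a high value (0.9) makes the best future score count almost fully.
target = td_target(reward=1.0, next_q_values=[0.5, 2.0, -1.0, 0.0], gamma=0.9, done=False)
print(target)  # about 2.8

# alpha: a small value (0.005) means each lesson changes the estimate only slightly.
print(nudge(current_q=0.0, target=target, alpha=0.005))  # about 0.014

# epsilon: start at 0.5, multiply by 0.9999 each time, and stop at 0.1.
epsilon = 0.5
for _ in range(20000):
    epsilon = decay_epsilon(epsilon, epsilon_decay=0.9999, stop_epsilon=0.1)
print(epsilon)  # 0.1
```

With a decay of 0.9999 it takes roughly 16,000 decay steps to go from 0.5 down to 0.1, so a larger decay value means the character keeps exploring for longer.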

In [3]:
### Running the Training ###
"""
Remember:
- We can edit parameters for the Learning Algorithm!
"""
from CoderSchoolAI.Training.Algorithms import deep_q_learning
from CoderSchoolAI.Environment.Agent import BasicReplayBuffer, DictReplayBuffer
batch_size = 512


deep_q_learning(
    agent=snake_env.snake_agent,
    environment=snake_env,
    q_network=copy_net,
    target_q_network=q_net,
    update_target_every=5,
    buffer= BasicReplayBuffer(batch_size),
    num_episodes=10000,
    max_steps_per_episode=100,
    epsilon=0.5,
    epsilon_decay=0.9999,
    stop_epsilon=0.1,
    batch_size=batch_size,
    reward_norm_coef=0.5,
    reward_normalization=False,
    alpha=0.005,
    attributes="game_state",
    log_frequency=1,
)

Episode: 2, Avg Reward: -1.2412474921943775, Epsilon: 0.49990000500000004
Episode: 3, Avg Reward: -4.583650943809992, Epsilon: 0.49975004999500033
Episode: 4, Avg Reward: -1.0645094027447297, Epsilon: 0.49970007499000085
Episode: 5, Avg Reward: -4.271226562211863, Epsilon: 0.49950022494001056
Episode: 6, Avg Reward: -5.212557579351433, Epsilon: 0.49940032989002486
Episode: 7, Avg Reward: -2.8714635306407854, Epsilon: 0.4993004548180502
Episode: 8, Avg Reward: -2.5, Epsilon: 0.49925052477256837
Episode: 9, Avg Reward: -2.5, Epsilon: 0.4992005997200911
Episode: 10, Avg Reward: -4.355826886154683, Epsilon: 0.49890115423036574
Episode: 11, Avg Reward: -3.8670048349966466, Epsilon: 0.49875149885063236
Episode: 12, Avg Reward: -1.2167401537143667, Epsilon: 0.4986517535383772
Episode: 13, Avg Reward: -3.895495128834866, Epsilon: 0.4985520281741871
Episode: 14, Avg Reward: -3.3064036983527396, Epsilon: 0.4985021729713697
Episode: 15, Avg Reward: -2.8322172467139155, Epsilon: 0.4984024775217971

In [3]:
### Saving the Network ####
file_path = "."
q_net.save_checkpoint()

In [4]:
### How Good Is your Model?
from CoderSchoolAI.Environment.Agent import *
from CoderSchoolAI.Environment.Shell import *
from CoderSchoolAI.Environment.CoderSchoolEnvironments.SnakeEnvironment import *
from CoderSchoolAI.Neural.Blocks import *
from CoderSchoolAI.Neural.Net import *
import numpy as np  # used by np.prod below
import torch as th

snake_env = SnakeEnv(width=8, height=8, cell_size=60, verbose=True, target_fps=8)
input_block = InputBlock(in_attribute=snake_env.get_attribute("game_state").copy(), is_module_dict=False,)
conv_block = ConvBlock(input_shape=input_block.in_attribute.space.shape,num_channels=1,depth=5,)
flatten_size = np.prod(snake_env.get_attribute("game_state").shape)
flat_block = FlattenBlock(conv_block.output_size)
lin_block = LinearBlock(flat_block.output_size, 16, num_hidden_layers=3, dropout=0.05, hidden_size=[128, 128, 32])

out_block = OutputBlock(input_size=lin_block.output_size, num_classes=len(snake_env.snake_agent.get_actions()),)

q_net = Net(name='test_q_net_with_game_state')
q_net.add_block(input_block)
q_net.add_block(conv_block)
q_net.add_block(flat_block)
q_net.add_block(lin_block)
q_net.add_block(out_block)
q_net.compile()
q_net.load_checkpoint("./test_q_net_with_game_state.pt")

def demo_model(
    agent: Agent,  # Actor in the Environment
    environment: Shell,  # Environment which the Deep Q Network is being trained on 
    q_network: Net,  # The pretrained Deep Q Network
    num_episodes: int = 10,  # Number of episodes to demonstrate
    max_steps_per_episode: int = 100,  # Stops the episodic sampling when the number of steps per episode reaches this value
    attributes: Union[str, Tuple[str]] = None,  # attributes to be used for the Network
    fps: int = 10,  # Frames per second to run the Agent
) -> None:
    """
    A demonstration of Deep Q Learning with a pretrained Deep Q Network.
    """

    # Ensure the Q-network is in evaluation mode
    q_network.eval()

    # Check if the action space is dictionary type
    if isinstance(agent.get_actions(), dict):
        raise ValueError("The action space for Deep Q Learning cannot be of type Dict.")

    for episode in range(1, num_episodes+1):
        environment.clock.tick(fps)
        state = environment.reset(attributes)
        done = False
        step = 0

        while not done and step < max_steps_per_episode:
            step += 1
            # Get list of possible actions from agent
            possible_actions = agent.get_actions()
            # Convert state to tensor for feeding into the network
            state_tensor = th.tensor(state, dtype=th.float32).to(q_network.device)
            state_tensor = th.unsqueeze(state_tensor, 0)
            # Feed the state into the q_network to get Q-values for each action
            with th.no_grad():  # Disable gradient computation
                q_values = q_network(state_tensor)
            # Choose action with highest Q-value
            _, action_index = th.max(q_values, dim=1)
            action = possible_actions[action_index.item()]
            # Take action in the environment
            next_state, reward, done = environment.step(action, 0, attributes)
            # Update the state
            state = next_state
            environment.render_env()

## Demo Function Call:
demo_model(
    agent=snake_env.snake_agent,
    environment=snake_env,
    q_network=q_net,
    num_episodes=100,
    max_steps_per_episode=100,
    attributes="game_state",
    fps=6,
)

Registered: game_state Attribute.
Registered: moving_direction Attribute.
Registered: apple_pos Attribute.
Registered: snake_pos Attribute.
Compiling Network...
Comiled Block:  CoderSchoolAI.Neural.Blocks.InputBlock.InputBlock>,

Comiled Block:  CoderSchoolAI.Neural.Blocks.ConvBlock.ConvBlock>,
  (0): Conv2d(1, 8, kernel_size=(3, 3), stride=(1, 1), padding=same)
  (1): ReLU()
  (2): Conv2d(8, 16, kernel_size=(3, 3), stride=(1, 1), padding=same)
  (3): ReLU()
  (4): Conv2d(16, 32, kernel_size=(3, 3), stride=(1, 1), padding=same)
  (5): ReLU()
  (6): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (7): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=same)
  (8): ReLU()
  (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (10): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=same)
  (11): ReLU()
  (12): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (13): Flatten(start_dim=1, end

In [None]:
import CoderSchoolAI

TODO: PPO Algorithm

In [1]:
### How do We use this in Our Case?
"""
On Policy: PPO Agents
"""
from CoderSchoolAI.Environment.CoderSchoolEnvironments.SnakeEnvironment import *
from CoderSchoolAI.Neural.Blocks import *
from CoderSchoolAI.Neural.Net import *
from CoderSchoolAI.Neural.ActorCritic.ActorCriticNetwork import *
import torch as th

snake_env = SnakeEnv(width=16, height=16)
input_block = InputBlock(in_attribute=snake_env.get_attribute("game_state"), is_module_dict=False,)
conv_block = ConvBlock(input_shape=input_block.in_attribute.space.shape,num_channels=1,depth=5,)
# out_block = OutputBlock(input_size=conv_block.output_size, num_classes=len(snake_env.snake_agent.get_actions()),)

ppo_net = Net(name='test_ppo_net_with_game_state')
ppo_net.add_block(input_block)
ppo_net.add_block(conv_block)
ppo_net.compile()

ActorCritic(snake_env.get_observation_space(), snake_env.get_action_space(),ppo_net, net_arch=[32, dict(vf=[32, 32], pi=[32, 32])],)


Compiling Network...
Comiled Block:  CoderSchoolAI.Neural.Blocks.InputBlock.InputBlock>,

Comiled Block:  CoderSchoolAI.Neural.Blocks.ConvBlock.ConvBlock>,
  (0): Conv2d(1, 8, kernel_size=(3, 3), stride=(1, 1), padding=same)
  (1): ReLU()
  (2): Conv2d(8, 16, kernel_size=(3, 3), stride=(1, 1), padding=same)
  (3): ReLU()
  (4): Conv2d(16, 32, kernel_size=(3, 3), stride=(1, 1), padding=same)
  (5): ReLU()
  (6): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (7): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=same)
  (8): ReLU()
  (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (10): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=same)
  (11): ReLU()
  (12): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (13): Flatten(start_dim=1, end_dim=-1)



AttributeError: 'Net' object has no attribute '_features_dim'