# Snake game with AI

### This project is about making a snake game and using Reinforcement Learning to train a model to play it

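**For anyone who stumbles upon this on GitHub: the best way to run it is to create a virtual environment (venv) and install all packages from `requirements.txt`.**
To do that, run the following (this is the Windows activation command; on macOS/Linux activate with `source gameenv/bin/activate` instead):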

```
python -m venv gameenv
gameenv\Scripts\activate
pip install -r requirements.txt
```

So... in order to make an AI model play the game, we first need to code up the actual game.

For the game, the libraries we will need are:
```python
import pygame
import random
from enum import Enum
from collections import namedtuple
import numpy as np
```

The font I chose for the UI text is `arial.ttf`, so it needs to be in the game folder (pygame opens it by file name, relative to the working directory).

To set up the game we define the font, an enum class for directions, a named tuple that represents each point in the game window, some color constants, the block size used for the game objects (snake body and food), and the speed, which is the frame rate.

**Helpers we have for the game:**
1. `arial.ttf` – font for game text
2. `Direction` (Enum) – movement directions
3. `Point` – named tuple for clean point management
4. Color constants – `WHITE`, `RED`, etc. for `pygame`
5. `BLOCK_SIZE` – the size of a single block (snake body / food)
6. `SPEED` – frames per second (FPS) of the game
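`Point` being a named tuple is what keeps the later checks short: two points with the same coordinates compare equal, so "did the head reach the food?" is a plain `==` and "did the head hit the body?" is a plain `in`. A tiny standalone check (no pygame needed):

```python
from collections import namedtuple

Point = namedtuple('Point', 'x, y')

head = Point(320, 240)
food = Point(320, 240)
body = [Point(300, 240), Point(280, 240)]

print(head == food)                 # True: equality compares the values
print(Point(300, 240) in body)      # True: `in` uses the same value comparison
print(Point(320.0, 240.0) == head)  # True: w/2 in the game gives floats, and 320.0 == 320
print(head.x, head.y)               # 320 240
```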

**Game methods:**
1. `__init__` - the constructor, in which we set the display width and height and initialize the game state
2. `_place_food()` - places the food on a random grid cell (and tries again if it lands on the snake)
3. `play_step()` - runs one frame: reads user input, moves the snake, checks for game over, handles food and redraws the screen
4. `_is_collision()` - checks whether the snake hit a wall or itself
5. `_move()` - moves the head one block in the current direction
6. `_update_ui()` - redraws the snake, the food and the score

### How the game works

First, we place the snake horizontally and put the food in a random cell. The snake's head starts at the center of the screen with its body extending to the left along the _x_ axis, and the default starting direction is right _relative to the screen_.

Then we start moving. The way we simulate movement is simple. We don't move every part of the snake each frame; instead, every step we place a new head in the direction the player chose and remove the tail, unless the new head landed on the food. If it did, we keep the tail, so the snake grows by one block, and continue the loop. Repeated fast enough, this looks like the whole snake is moving, but really... we are just adding a head and cutting off the tail very quickly.
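The add-a-head, cut-the-tail trick can be tried without pygame. Below is a minimal sketch on a plain list of `Point`s; the `step()` helper is hypothetical and not part of the game class, which does the same thing with `self.snake.insert(0, ...)` and `self.snake.pop()`:

```python
from collections import namedtuple

Point = namedtuple('Point', 'x, y')
BLOCK_SIZE = 20

def step(snake, dx, dy, food):
    """Hypothetical helper: move the snake one block by (dx, dy).

    Returns (new_snake, ate_food). The head is always added; the tail is
    only removed when the new head is not on the food, so eating makes
    the snake one block longer.
    """
    head = snake[0]
    new_head = Point(head.x + dx, head.y + dy)
    new_snake = [new_head] + snake  # add a new head
    ate = new_head == food
    if not ate:
        new_snake.pop()             # cut the tail
    return new_snake, ate

snake = [Point(320, 240), Point(300, 240), Point(280, 240)]
snake, ate = step(snake, BLOCK_SIZE, 0, food=Point(400, 240))
print(len(snake), ate)  # 3 False
snake, ate = step(snake, BLOCK_SIZE, 0, food=Point(360, 240))
print(len(snake), ate)  # 4 True
```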

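**Note - movement:** Screen coordinates don't work like the coordinate system in math. In pygame (and most 2D graphics libraries) the origin `(0, 0)` is the top-left corner of the window, mirroring how monitor pixels are addressed, so _x_ grows to the right, but _y_ grows downwards, not upwards.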
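A quick way to see the flipped _y_ axis is to write each direction as a `(dx, dy)` offset in pixels. This mirrors what `_move()` does with its `if`/`elif` chain; the `DELTAS` dictionary itself is only an illustration:

```python
from enum import Enum

BLOCK_SIZE = 20

class Direction(Enum):
    RIGHT = 1
    LEFT = 2
    UP = 3
    DOWN = 4

# Screen coordinates: origin at the top-left, x grows right, y grows DOWN.
# So "up" means decreasing y, the opposite of a graph in math class.
DELTAS = {
    Direction.RIGHT: (BLOCK_SIZE, 0),
    Direction.LEFT: (-BLOCK_SIZE, 0),
    Direction.UP: (0, -BLOCK_SIZE),
    Direction.DOWN: (0, BLOCK_SIZE),
}

x, y = 320, 240
dx, dy = DELTAS[Direction.UP]
print(x + dx, y + dy)  # 320 220: one block closer to the top edge
```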

Each step/frame we check if there is a collision and stop the game if there is.

And of course we update the UI each frame too.
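The per-frame collision check boils down to two questions: is the head outside the window, or is it sitting on one of its own body blocks? A pure-function sketch; the `hits_something` name and its parameters are hypothetical (the class version reads the same values from `self`):

```python
from collections import namedtuple

Point = namedtuple('Point', 'x, y')
BLOCK_SIZE = 20

def hits_something(head, snake, w=640, h=480):
    """Return True if `head` is off-screen or overlaps the body.

    `snake[0]` is assumed to be the head itself, so it is skipped.
    The right/bottom limits are w - BLOCK_SIZE and h - BLOCK_SIZE because
    a block's (x, y) is its top-left corner.
    """
    off_screen = (head.x < 0 or head.x > w - BLOCK_SIZE or
                  head.y < 0 or head.y > h - BLOCK_SIZE)
    return off_screen or head in snake[1:]

snake = [Point(620, 240), Point(600, 240), Point(580, 240)]
print(hits_something(snake[0], snake))         # False: x=620 is the last column
print(hits_something(Point(640, 240), snake))  # True: past the right edge
```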

import pygame
import random
from enum import Enum
from collections import namedtuple

pygame.init()
font = pygame.font.Font('arial.ttf', 25)

class Direction(Enum):
    RIGHT = 1
    LEFT = 2
    UP = 3
    DOWN = 4
    
Point = namedtuple('Point', 'x, y')

# rgb colors
WHITE = (255, 255, 255)
RED = (200,0,0)
BLUE1 = (0, 0, 255)
BLUE2 = (0, 100, 255)
BLACK = (0,0,0)

BLOCK_SIZE = 20
SPEED = 10

class SnakeGame:
    
    def __init__(self, w=640, h=480):
        self.w = w
        self.h = h
        # init display
        self.display = pygame.display.set_mode((self.w, self.h))
        pygame.display.set_caption('Snake')
        self.clock = pygame.time.Clock()
        
        # init game state
        self.direction = Direction.RIGHT
        
        self.head = Point(self.w/2, self.h/2)
        self.snake = [self.head, 
                      Point(self.head.x-BLOCK_SIZE, self.head.y),
                      Point(self.head.x-(2*BLOCK_SIZE), self.head.y)]
        
        self.score = 0
        self.food = None
        self._place_food()
        
    def _place_food(self):
        x = random.randint(0, (self.w-BLOCK_SIZE )//BLOCK_SIZE )*BLOCK_SIZE 
        y = random.randint(0, (self.h-BLOCK_SIZE )//BLOCK_SIZE )*BLOCK_SIZE
        self.food = Point(x, y)
        if self.food in self.snake:
            self._place_food()
        
    def play_step(self):
        # 1. collect user input
        for event in pygame.event.get():
            if event.type == pygame.QUIT:
                pygame.quit()
                quit()
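            if event.type == pygame.KEYDOWN:
                # ignore a key that would reverse the snake straight into its own neck
                if event.key == pygame.K_LEFT and self.direction != Direction.RIGHT:
                    self.direction = Direction.LEFT
                elif event.key == pygame.K_RIGHT and self.direction != Direction.LEFT:
                    self.direction = Direction.RIGHT
                elif event.key == pygame.K_UP and self.direction != Direction.DOWN:
                    self.direction = Direction.UP
                elif event.key == pygame.K_DOWN and self.direction != Direction.UP:
                    self.direction = Direction.DOWN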
        
        # 2. move
        self._move(self.direction) # update the head
        self.snake.insert(0, self.head)
        
        # 3. check if game over
        game_over = False
        if self._is_collision():
            game_over = True
            return game_over, self.score
            
        # 4. place new food or just move
        if self.head == self.food:
            self.score += 1
            self._place_food()
        else:
            self.snake.pop()
        
        # 5. update ui and clock
        self._update_ui()
        self.clock.tick(SPEED)
        # 6. return game over and score
        return game_over, self.score
    
    def _is_collision(self):
        # hits boundary
        if self.head.x > self.w - BLOCK_SIZE or self.head.x < 0 or self.head.y > self.h - BLOCK_SIZE or self.head.y < 0:
            return True
        # hits itself
        if self.head in self.snake[1:]:
            return True
        
        return False
        
    def _update_ui(self):
        self.display.fill(BLACK)
        
        for pt in self.snake:
            pygame.draw.rect(self.display, BLUE1, pygame.Rect(pt.x, pt.y, BLOCK_SIZE, BLOCK_SIZE))
            pygame.draw.rect(self.display, BLUE2, pygame.Rect(pt.x+4, pt.y+4, 12, 12))
            
        pygame.draw.rect(self.display, RED, pygame.Rect(self.food.x, self.food.y, BLOCK_SIZE, BLOCK_SIZE))
        
        text = font.render("Score: " + str(self.score), True, WHITE)
        self.display.blit(text, [0, 0])
        pygame.display.flip()
        
    def _move(self, direction):
        x = self.head.x
        y = self.head.y
        if direction == Direction.RIGHT:
            x += BLOCK_SIZE
        elif direction == Direction.LEFT:
            x -= BLOCK_SIZE
        elif direction == Direction.DOWN:
            y += BLOCK_SIZE
        elif direction == Direction.UP:
            y -= BLOCK_SIZE
            
        self.head = Point(x, y)

if __name__ == '__main__':
    game = SnakeGame()

    # game loop
    while True:
        game_over, score = game.play_step()

        if game_over:
            break

    print('Final Score', score)

    pygame.quit()


### Next Steps: Hooking up an AI

Now that the game mechanics are coded, the next step is to build an agent that plays the game using **Reinforcement Learning (RL)**.

For that to happen, we first need to extend the `play_step()` and `_move()` methods (and add a `reset()` method so a new game can start after every death), and then remove the `if __name__ == '__main__':` check at the end, since we won't be starting the game directly anymore.

#### Modifying the `play_step()`:

This is where the **Reinforcement Learning** magic happens.

Reinforcement learning is simple at its core. During training, we define rewards and penalties. The model is rewarded for good actions and penalized for bad ones. It learns through feedback:

``I ran into a wall and got a -10 reward --- I shouldn't do that.``

``I ate the food and got +10 --- eating food is good!``

Since the agent now decides the snake's direction, `play_step()` gets a new parameter: an `action`, represented as a three-element vector such as `[1, 0, 0]` (more on that later). This replaces keyboard input: we no longer listen to `KEYDOWN` events from the user.

Now... back to the sauce. We calculate the reward based on a couple of things. The main ones are obviously food and death: we reward the model +10 for eating food and -10 for dying.
But as you can guess, that reward is very sparse. The display is 640 px wide and 480 px high and a block is 20x20 px, so the grid has $$\frac{640}{20} \times \frac{480}{20} = 32 \times 24 = 768 \text{ cells}$$ and a snake moving at random lands on the single food cell with a probability of only roughly $$ \frac{1}{768} \approx 0.0013 \text{, or } 0.13\% $$ per move. Most moves therefore give the model no feedback at all.

#### So what do we do?

**Well... we simply need to calculate the reward better and more precisely. We introduce _immediate rewards_:**
1. We compare the Manhattan distance from the head to the food before and after each move.
    1. If the new distance is shorter, we give **+1 reward**.
    2. If it’s **greater**, we apply a **−0.5** penalty.
2. We check for body parts in 8 surrounding directions (excluding the neck).
    1. For each nearby segment, we apply **−0.1**.
    2. This encourages the snake to avoid trapping itself.
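3. If the snake loops around for too long, the game ends with the usual **-10** death penalty. The limit grows with the snake, so a longer snake is allowed more moves:
    $$ \text{frames played} > 100 \times \text{len(snake)} $$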
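Put together, the immediate rewards can be sketched as one pure function. It mirrors the logic in `play_step()` in the code cell further down (Manhattan distance, the 8 surrounding cells, -10 when the game ends); the `shaped_reward` name and its argument list are hypothetical:

```python
from collections import namedtuple

Point = namedtuple('Point', 'x, y')
BLOCK_SIZE = 20

# The 8 cells around the head, in block units: 4 adjacent + 4 diagonal
NEIGHBOURS = [(-1, 0), (1, 0), (0, -1), (0, 1),
              (-1, -1), (1, -1), (-1, 1), (1, 1)]

def manhattan(a, b):
    return abs(a.x - b.x) + abs(a.y - b.y)

def shaped_reward(prev_head, snake, food, ate_food, game_over):
    """Hypothetical helper mirroring the reward rules described above.

    `snake[0]` is the new head and `snake[1]` the neck; `food` is the food
    position the distance is measured against.
    """
    if game_over:
        return -10
    head = snake[0]
    reward = 10 if ate_food else 0
    # +1 for getting closer to the food, -0.5 otherwise
    reward += 1.0 if manhattan(head, food) < manhattan(prev_head, food) else -0.5
    # -0.1 for every body block (head and neck excluded) in the 8 surrounding cells
    body = snake[2:]
    nearby = sum(
        Point(head.x + dx * BLOCK_SIZE, head.y + dy * BLOCK_SIZE) in body
        for dx, dy in NEIGHBOURS
    )
    return reward - 0.1 * nearby

snake = [Point(340, 240), Point(320, 240), Point(300, 240)]
print(shaped_reward(Point(320, 240), snake, Point(400, 240), False, False))  # 1.0
```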

#### Modifying the `_move()`:

Movement logic also needs to change. Instead of using fixed directions like UP or LEFT, we now handle **relative directions** based on the snake’s current heading.

Imagine the directions as a clock:
`[RIGHT, DOWN, LEFT, UP]`

If a snake is heading right:
  1. A **right** turn = DOWN
  2. A **left** turn = UP
  3. **Straight** = continue RIGHT

Since the model doesn’t choose absolute directions, it outputs a 3-element array:
 1. `[1, 0, 0]` = go straight
 2. `[0, 1, 0]` = go right
 3. `[0, 0, 1]` = go left

We translate it into a new direction like so:

```python
# action: [1, 0, 0] = straight, [0, 1, 0] = right, [0, 0, 1] = left
clock_wise = [Direction.RIGHT, Direction.DOWN, Direction.LEFT, Direction.UP]
index = clock_wise.index(self.direction)

if np.array_equal(action, [1, 0, 0]):
    new_direction = clock_wise[index]  # straight - no change
elif np.array_equal(action, [0, 1, 0]):
    next_index = (index + 1) % 4
    new_direction = clock_wise[next_index]  # turn right (one step clockwise)
else:  # [0, 0, 1] = left
    next_index = (index - 1) % 4
    new_direction = clock_wise[next_index]  # turn left (one step counter-clockwise)

self.direction = new_direction
```

This logic ensures that the snake moves relative to its current direction — just like how a real animal (or robot) might navigate.
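The snippet above relies on `self`, so here is the same clockwise-index trick as a standalone function, plus one common way (an assumption here, not something this notebook shows) to turn a model's three raw output scores into the one-hot action: pick the index with `np.argmax`. The names `to_action` and `turn` are hypothetical:

```python
from enum import Enum
import numpy as np

class Direction(Enum):
    RIGHT = 1
    LEFT = 2
    UP = 3
    DOWN = 4

CLOCK_WISE = [Direction.RIGHT, Direction.DOWN, Direction.LEFT, Direction.UP]

def to_action(scores):
    """Hypothetical: one-hot encode the highest-scoring move."""
    action = [0, 0, 0]
    action[int(np.argmax(scores))] = 1
    return action

def turn(direction, action):
    """Return the new absolute direction for a [straight, right, left] action."""
    index = CLOCK_WISE.index(direction)
    if np.array_equal(action, [1, 0, 0]):
        return direction                    # straight
    if np.array_equal(action, [0, 1, 0]):
        return CLOCK_WISE[(index + 1) % 4]  # right = one step clockwise
    return CLOCK_WISE[(index - 1) % 4]      # left = one step counter-clockwise

action = to_action([0.2, 1.7, -0.3])  # [0, 1, 0]
print(turn(Direction.RIGHT, action))  # Direction.DOWN
print(turn(Direction.UP, [0, 0, 1]))  # Direction.LEFT
```

The `% 4` is what wraps the list around like a clock face: turning right from `UP` (index 3) lands back on `RIGHT` (index 0).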


import pygame
import random
from enum import Enum
from collections import namedtuple
import numpy as np

pygame.init()
font = pygame.font.Font('arial.ttf', 25)

class Direction(Enum):
    RIGHT = 1
    LEFT = 2
    UP = 3
    DOWN = 4
    
Point = namedtuple('Point', 'x, y')

# rgb colors
WHITE = (255, 255, 255)
RED = (200,0,0)
BLUE1 = (0, 0, 255)
BLUE2 = (0, 100, 255)
BLACK = (0,0,0)

BLOCK_SIZE = 20
SPEED = 40

class SnakeGameAI:
    
    def __init__(self, w=640, h=480):
        self.w = w
        self.h = h
        # init display
        self.display = pygame.display.set_mode((self.w, self.h))
        pygame.display.set_caption('Snake')
        self.clock = pygame.time.Clock()
        self.reset()

    def reset(self):
        # init game state
        self.direction = Direction.RIGHT
        
        self.head = Point(self.w/2, self.h/2)
        self.snake = [self.head, 
                      Point(self.head.x-BLOCK_SIZE, self.head.y),
                      Point(self.head.x-(2*BLOCK_SIZE), self.head.y)]
        
        self.score = 0
        self.food = None
        self._place_food()
        self.frame_iteration = 0
        
    def _place_food(self):
        x = random.randint(0, (self.w-BLOCK_SIZE )//BLOCK_SIZE )*BLOCK_SIZE 
        y = random.randint(0, (self.h-BLOCK_SIZE )//BLOCK_SIZE )*BLOCK_SIZE
        self.food = Point(x, y)
        if self.food in self.snake:
            self._place_food()
        
    def play_step(self, action):
        self.frame_iteration += 1
        # 1. handle window events (only QUIT; the agent now chooses the direction)
        for event in pygame.event.get():
            if event.type == pygame.QUIT:
                pygame.quit()
                quit()
        
        # Store previous distance to food (for directional reward)
        prev_dist = abs(self.head.x - self.food.x) + abs(self.head.y - self.food.y)
        
        # 2. move
        self._move(action) # update the head
        self.snake.insert(0, self.head)
        
        # 3. check if game over
        reward = 0
        game_over = False
        # End the episode if it drags on too long (a longer snake is allowed more frames)
        if self.is_collision() or self.frame_iteration > 100*len(self.snake):
            game_over = True
            reward = -10
            return reward, game_over, self.score
        
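        # Count body blocks (head and neck excluded) in the 8 cells around the head
        nearby_body = 0
        directions = [
            (-BLOCK_SIZE, 0), (BLOCK_SIZE, 0), (0, -BLOCK_SIZE), (0, BLOCK_SIZE),  # Adjacent
            (-BLOCK_SIZE, -BLOCK_SIZE), (BLOCK_SIZE, -BLOCK_SIZE),
            (-BLOCK_SIZE, BLOCK_SIZE), (BLOCK_SIZE, BLOCK_SIZE)  # Diagonals
        ]

        for dx, dy in directions:
            if Point(self.head.x + dx, self.head.y + dy) in self.snake[2:]:
                nearby_body += 1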
            
        # 4. place new food or just move
        if self.head == self.food:
            self.score += 1
            reward = 10
            self._place_food()
        else:
            self.snake.pop()

        # Directional rewards
        new_dist = abs(self.head.x - self.food.x) + abs(self.head.y - self.food.y)
        reward += 1.0 if new_dist < prev_dist else -0.5
        reward += -0.1 * nearby_body
        
        print(f"Move: {'Straight' if action[0] else 'Right' if action[1] else 'Left'} | " 
            f"Dist: {prev_dist:.1f}→{new_dist:.1f} | "
            f"Body blocks: {nearby_body} | Reward: {reward:.1f}")
        
        # 5. update ui and clock
        self._update_ui()
        self.clock.tick(SPEED)
        # 6. return game over and score
        return reward, game_over, self.score
    
    def is_collision(self, point=None):
        # `point` lets callers test any cell, not just the current head
        if point is None:
            point = self.head
        # hits boundary
        if point.x > self.w - BLOCK_SIZE or point.x < 0 or point.y > self.h - BLOCK_SIZE or point.y < 0:
            return True
        # hits itself
        if point in self.snake[1:]:
            return True
        
        return False
        
    def _update_ui(self):
        self.display.fill(BLACK)
        
        for pt in self.snake:
            pygame.draw.rect(self.display, BLUE1, pygame.Rect(pt.x, pt.y, BLOCK_SIZE, BLOCK_SIZE))
            pygame.draw.rect(self.display, BLUE2, pygame.Rect(pt.x+4, pt.y+4, 12, 12))
            
        pygame.draw.rect(self.display, RED, pygame.Rect(self.food.x, self.food.y, BLOCK_SIZE, BLOCK_SIZE))
        
        text = font.render("Score: " + str(self.score), True, WHITE)
        self.display.blit(text, [0, 0])
        pygame.display.flip()
        
    def _move(self, action):
        # [[1, 0, 0], [0, 1, 0], [0, 0, 1]] = [straight, right, left]

        clock_wise = [Direction.RIGHT, Direction.DOWN, Direction.LEFT, Direction.UP]
        index = clock_wise.index(self.direction)

        if np.array_equal(action, [1, 0, 0]):
            new_direction = clock_wise[index] # straight - no change
        elif np.array_equal(action, [0, 1, 0]):
            next_index = (index + 1) % 4
            new_direction = clock_wise[next_index] # turn right
        else: # [0, 0, 1] = left
            next_index = (index - 1) % 4
            new_direction = clock_wise[next_index] # turn left

        self.direction = new_direction

        x = self.head.x
        y = self.head.y
        if self.direction == Direction.RIGHT:
            x += BLOCK_SIZE
        elif self.direction == Direction.LEFT:
            x -= BLOCK_SIZE
        elif self.direction == Direction.DOWN:
            y += BLOCK_SIZE
        elif self.direction == Direction.UP:
            y -= BLOCK_SIZE
            
        self.head = Point(x, y)

### Resources:

https://www.youtube.com/watch?v=PJl4iabBEz0&list=PLfR10wejCzo_OL-6OsBV-4jAPnSncvZZH

https://en.wikipedia.org/wiki/Reinforcement_learning

https://www.geeksforgeeks.org/what-is-reinforcement-learning/

https://www.geeksforgeeks.org/snake-game-in-python-using-pygame-module/

https://pytorch.org/

https://www.pygame.org/docs/

https://docs.pytorch.org/docs/stable/tensors.html



And of course... **ChatGPT**

#### TODO:

Finish the documentation.

Explain how to import a trained model.

When I hand it in, ask Dancho what we do about the self-assessment. Should it be filled in earlier, or...?