## What You Need to Implement in This Hackathon:

---



1. Your own **Bot class** that includes:
   - A `.act(player_info)` function that returns an action dictionary.
   - Example: `{"forward": True, "shoot": False, "rotate": 0}`

2. Your own **reward function**:
   - You can modify `calculate_reward(self, info_dictionary, bot_username)` in the environment.
   - Use the `bot_info` dictionary to understand what your agent did:
     - `damage_dealt`, `kills`, `location`, `health`, etc.

3. Optional: Tweak `frame_skip` to make training faster.
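A minimal sketch of a bot that satisfies this interface, useful as a baseline before you train anything. The class name `RandomBot` and its 10% shooting probability are illustrative assumptions, not part of the provided code; the action keys (`forward`, `right`, `down`, `left`, `rotate`, `shoot`) are the ones the environment's `step` method reads.

```python
import random


class RandomBot:
    """Hypothetical baseline bot: picks every action at random."""

    def __init__(self, shoot_probability=0.1, seed=None):
        self.shoot_probability = shoot_probability  # illustrative value, not from the game code
        self.rng = random.Random(seed)              # seeded generator for reproducible runs

    def act(self, player_info):
        # player_info is ignored here; a learning bot would build its inputs from it
        return {
            "forward": self.rng.random() < 0.5,
            "right": self.rng.random() < 0.5,
            "down": self.rng.random() < 0.5,
            "left": self.rng.random() < 0.5,
            "rotate": self.rng.uniform(-1.0, 1.0),  # same [-1, 1] range as the bot example
            "shoot": self.rng.random() < self.shoot_probability,
        }


bot = RandomBot(seed=0)
print(bot.act({"location": [100, 100], "health": 100}))
```

Any object with an `act(player_info)` method returning such a dictionary can be linked to a player in the same way as a neural-network bot.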


# How to use the provided code

```python
from Environment import Env
```

The import prints the pygame banner, which shows the versions used here:

```
pygame 2.6.1 (SDL 2.28.4, Python 3.9.13)
Hello from the pygame community. https://www.pygame.org/contribute.html
```


Set up and use the environment:

```python
# define these before creating the environment (the values shown are the Env defaults)
world_width, world_height = 1280, 1280
display_width, display_height = 640, 640
n_of_obstacles = 10

# `config` is assumed to be a settings dictionary loaded earlier in the main script (not shown)
env = Env(training=True,                    # if False, the game plays with a simple UI
          use_game_ui=False,                # if True and training is False, the game is shown with an advanced UI
          world_width=world_width,          # do not change this
          world_height=world_height,        # do not change this
          display_width=display_width,      # do not change this
          display_height=display_height,    # do not change this
          n_of_obstacles=n_of_obstacles,    # number of obstacles in the world
          frame_skip=config["frame_skip"])  # number of frames each action is repeated for
```

Set up multiple competing characters:

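```python
# `Character` comes from the provided game code (its import path is not shown here)
world_bounds = env.get_world_bounds()  # (0, 0, world_width, world_height)

# in this case 2 characters are created starting in opposite corners
players = [
    Character(starting_pos=(world_bounds[2] - 100, world_bounds[3] - 100),  # bottom-right corner
              screen=env.world_surface,
              boundaries=world_bounds,
              username="Ninja"),
    Character(starting_pos=(world_bounds[0] + 10, world_bounds[1] + 10),    # top-left corner
              screen=env.world_surface,
              boundaries=world_bounds,
              username="Faze Jarvis"),
]
```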

Each character can be controlled by a different bot:

```python
import torch

# `MyBot` is assumed to be a DQN-style bot whose constructor accepts `action_size`
# and which exposes a `model`; `config` is assumed to be loaded by the main script
bots = []
for _ in players:
    bot = MyBot(action_size=config["action_size"])
    bot.use_double_dqn = config["hyperparameters"]["double_dqn"]
    bot.learning_rate = config["hyperparameters"]["learning_rate"]
    bot.batch_size = config["hyperparameters"]["batch_size"]
    bot.gamma = config["hyperparameters"]["gamma"]
    bot.epsilon_decay = config["hyperparameters"]["epsilon_decay"]
    # create the optimizer with the configured learning rate
    bot.optimizer = torch.optim.Adam(bot.model.parameters(), lr=bot.learning_rate)
    bots.append(bot)

# --- link players and bots to environment ---
env.set_players_bots_objects(players, bots)
```
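The cell above copies DQN hyperparameters such as `gamma` and `epsilon_decay` onto each bot; how `MyBot` uses them is up to you. A common pattern, sketched here as an assumption rather than the provided implementation, is epsilon-greedy selection over `action_size` discrete actions, with epsilon multiplied by `epsilon_decay` after each step and clamped at a floor. The helper names and the `epsilon_min` floor are hypothetical.

```python
import random


def select_action(q_values, epsilon, rng=random):
    """Epsilon-greedy: explore with probability epsilon, otherwise take the highest Q-value."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # random action index
    return max(range(len(q_values)), key=lambda i: q_values[i])  # greedy action index


def decay_epsilon(epsilon, epsilon_decay, epsilon_min=0.05):
    """Multiplicative decay that never drops below epsilon_min (hypothetical floor)."""
    return max(epsilon_min, epsilon * epsilon_decay)


epsilon = 1.0
for step in range(3):
    epsilon = decay_epsilon(epsilon, epsilon_decay=0.5)
print(epsilon)  # 0.125
```

With a decay close to 1 (for example 0.995), exploration fades over thousands of steps instead of three.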

Then create a dictionary that stores the rewards collected by each character:



```python
# one list of rewards per character, keyed by username
all_rewards = {player.username: [] for player in players}
```
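During training, each step's reward for a player can be appended to its list in `all_rewards`. A sketch of a smoothed summary for monitoring progress, assuming one reward per list entry; the helper `moving_average`, the window size, and the sample values are illustrative only:

```python
def moving_average(values, window=100):
    """Mean of the last `window` values, or 0.0 for an empty list."""
    recent = values[-window:]
    return sum(recent) / len(recent) if recent else 0.0


# made-up sample values standing in for rewards collected during play
all_rewards = {"Ninja": [1.0, -0.5, 2.0], "Faze Jarvis": []}
all_rewards["Faze Jarvis"].append(0.3)  # e.g. the reward computed for this step
for username, rewards in all_rewards.items():
    print(username, moving_average(rewards, window=2))  # Ninja 0.75, Faze Jarvis 0.3
```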

# Which methods you CAN modify
### In the Environment class
- `calculate_reward`

### In the My_bot class
- All of the code

### In the main script
- All of the code

# Main concept to understand
You are generally free to modify the code as you see fit during the training phase. In the final tournament, however, the code initially provided by us will be used, and the only parts of your implementation that will run are those contained in your bot modules. This ensures that all bots use the same game environment and receive the same information.

# Environment code

This environment is built using Pygame and provides a 2D simulation where bots can navigate, shoot, and interact. You'll be building a neural network that plays this game, and you'll design a reward function to guide its learning.

The environment class, abridged (`...` marks code omitted from this excerpt):

```python
import math
import os
import pygame
from components.advanced_UI import game_UI
from components.world_gen import spawn_objects

class Env:
    def __init__(self, training=False, use_game_ui=True, world_width=1280, world_height=1280, display_width=640,
                 display_height=640, n_of_obstacles=10, frame_skip=4):
        pygame.init()

        self.training_mode = training

        ...

        # INIT SOME VARIABLES
        self.OG_bots = None
        self.OG_players = None
        self.OG_obstacles = None

        self.bots = None
        self.players = None
        self.obstacles = None

        # REWARD VARIABLES
        self.last_positions = {}
        self.last_damage = {}
        self.last_kills = {}
        self.last_health = {}
        self.visited_areas = {}

        self.visited_areas.clear()
        self.last_positions.clear()
        self.last_health.clear()
        self.last_kills.clear()
        self.last_damage.clear()

        self.steps = 0

    def set_players_bots_objects(self, players, bots, obstacles=None):
        ...
        # sets players and bot in the class and then resets

    def get_world_bounds(self):
        ...
        # returns (0, 0, self.world_width, self.world_height)

    def reset(self, randomize_objects=False, randomize_players=False):
        ...
        # resets the variables and the environment
```
The `step` method advances the game by one environment step (up to `frame_skip` frames in training mode) and returns a `(game_over, info)` tuple:

```python
    def step(self, debugging=False):

        # frame skipping for training acceleration
        skip_count = self.frame_skip if self.training_mode else 1

        # placeholder for the variables
        game_over = False
        final_info = None

        # get actions once and reuse them for all skipped frames
        player_actions = {}
        if self.training_mode:
            for player in self.players:
                if player.alive:
                    player_info = player.get_info()
                    player_info['closest_opponent'] = self.find_closest_opponent(player)
                    player_actions[player.username] = player.related_bot.act(player_info)

        # process multiple frames if frame skipping is enabled
        for _ in range(skip_count):
            if game_over:
                break

            self.steps += 1

            players_info = {}
            alive_players = []

            for player in self.players:
                player.update_tick()

                # use stored actions if in training mode with frame skipping
                if self.training_mode and skip_count > 1:
                    actions = player_actions.get(player.username, {})
                else:
                    # update info with closest opponent before getting action
                    player_info = player.get_info()
                    player_info['closest_opponent'] = self.find_closest_opponent(player)
                    actions = player.related_bot.act(player_info)

                if player.alive:
                    alive_players.append(player)
                    player.reload()

                    # skip drawing in training mode for better performance
                    if not self.training_mode:
                        player.draw(self.world_surface)

                    if debugging:
                        print("Bot would like to do:", actions)
                    if actions.get("forward", False):
                        player.move_in_direction("forward")
                    if actions.get("right", False):
                        player.move_in_direction("right")
                    if actions.get("down", False):
                        player.move_in_direction("down")
                    if actions.get("left", False):
                        player.move_in_direction("left")
                    if actions.get("rotate", 0):
                        player.add_rotate(actions["rotate"])
                    if actions.get("shoot", False):
                        player.shoot()

                    if not self.training_mode:
                        # store position for trail
                        if not hasattr(player, 'previous_positions'):
                            player.previous_positions = []
                        player.previous_positions.append(player.rect.center)
                        if len(player.previous_positions) > 10:
                            player.previous_positions.pop(0)

                player_info = player.get_info()
                player_info["shot_fired"] = actions.get("shoot", False)
                player_info["closest_opponent"] = self.find_closest_opponent(player)
                players_info[player.username] = player_info

            new_dic = {
                "general_info": {
                    "total_players": len(self.players),
                    "alive_players": len(alive_players)
                },
                "players_info": players_info
            }

            # store the final state
            final_info = new_dic

            # check if game is over
            if len(alive_players) == 1:
                print("Game Over, winner is:", alive_players[0].username)
                if not self.training_mode:
                    if self.use_advanced_UI:
                        self.advanced_UI.display_winner_screen(alive_players)
                    else:
                        self.screen.fill("green")

                game_over = True
                break

        # skip all rendering operations in training mode for better performance
        if not self.training_mode:
            if self.use_advanced_UI:
                self.advanced_UI.draw_everything(final_info, self.players, self.obstacles)
            else:
                # draw obstacles manually if not using advanced UI
                for obstacle in self.obstacles:
                    obstacle.draw(self.world_surface)

            # scale and display the world surface
            scaled_surface = pygame.transform.scale(self.world_surface, (self.display_width, self.display_height))
            self.screen.blit(scaled_surface, (0, 0))
            pygame.display.flip()

        # limit the frame rate only when the game is displayed; training runs unthrottled
        if not self.training_mode:
            self.clock.tick(120)  # normal gameplay speed

        # return the final state
        if game_over:
            print("Total steps:", self.steps)
            return True, final_info  # game is over
        else:
            # return the final state from the last frame
            return False, final_info

    def find_closest_opponent(self, player):
        ...
        # returns the closest enemy location for strategic decisions
```

The reward function template, to be adapted into your own `calculate_reward`:

```python
    # TO MODIFY
    def calculate_reward_empty(self, info_dictionary, bot_username):
        """Calculate the reward for a bot; write your own version to fine-tune yours."""

        # retrieve the players' information from the dictionary
        players_info = info_dictionary.get("players_info", {})
        bot_info = players_info.get(bot_username)

        # if the bot is not found, return a default reward of 0
        if bot_info is None:
            print("Bot not found in the dictionary")
            return 0

        # extract variables from the bot's info
        location = bot_info.get("location", [0, 0])
        rotation = bot_info.get("rotation", 0)
        rays = bot_info.get("rays", [])
        current_ammo = bot_info.get("current_ammo", 0)
        alive = bot_info.get("alive", False)
        kills = bot_info.get("kills", 0)
        damage_dealt = bot_info.get("damage_dealt", 0)
        meters_moved = bot_info.get("meters_moved", 0)
        total_rotation = bot_info.get("total_rotation", 0)
        health = bot_info.get("health", 0)

        # calculate reward:
        reward = 0
        # add your reward calculation here

        # EXAMPLE
        # damage taken penalty: encourages defensive play
        # (on the first call there is no previous value, so no penalty is applied)
        delta_health = self.last_health.get(bot_username, health) - health
        if delta_health > 0:
            reward -= delta_health * 0.2

        # remember the current health for the next call
        self.last_health[bot_username] = health

        return reward
```

## Key Concepts

### Players and Bots:
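- `players`: the active agents (characters) in the environment.
- `bots`: the AI controlling those players. The environment calls each bot's `act()` to choose its player's actions: every frame when not training, and, in training mode with `frame_skip` > 1, once per step, with the chosen actions repeated for `frame_skip` frames.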

### Reward Function Tracking:
To help you calculate custom rewards, the environment stores several variables, such as:
- `last_positions`: for measuring movement.
- `last_damage`, `last_kills`, `last_health`: for computing changes between steps.
- `visited_areas`: for encouraging exploration.
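For example, `visited_areas` could map each bot to the set of grid cells it has entered, granting a bonus only for new cells; inside `calculate_reward` you would pass `self.visited_areas`. The sketch below assumes a 100-pixel cell size and a bonus of 0.5, both hypothetical tuning choices; the environment's own use of `visited_areas` is not shown in this notebook.

```python
def exploration_bonus(visited_areas, bot_username, location, cell_size=100, bonus=0.5):
    """Return `bonus` the first time a bot enters a grid cell, otherwise 0.0."""
    x, y = location
    cell = (int(x // cell_size), int(y // cell_size))  # discretize the position into a grid cell
    seen = visited_areas.setdefault(bot_username, set())
    if cell in seen:
        return 0.0
    seen.add(cell)
    return bonus


visited = {}
print(exploration_bonus(visited, "Ninja", [150, 40]))  # 0.5 (new cell (1, 0))
print(exploration_bonus(visited, "Ninja", [180, 90]))  # 0.0 (same cell)
```

A larger cell size rewards covering the map broadly; a smaller one rewards fine-grained wandering.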

The full list of variables in the `bot_info` dictionary is given below.

### What is frame skipping?
- **Frame skipping** allows your model to **repeat the same action** for several frames.
- It speeds up training by reducing the number of decisions made.
- Set it via the `frame_skip` parameter of `Env`.
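A toy sketch of the idea, independent of the game code: the policy is queried once per step and its action is replayed for `frame_skip` frames, so 1,000 frames with `frame_skip=4` need only 250 decisions. The function and its arguments are hypothetical; in `Env.step` the same pattern appears as `player_actions`, computed once before the frame loop.

```python
def run_with_frame_skip(total_frames, frame_skip, policy, apply_action):
    """Query `policy` once per step and repeat its action for up to `frame_skip` frames."""
    decisions = 0
    frame = 0
    while frame < total_frames:
        action = policy(frame)  # one decision per step
        decisions += 1
        for _ in range(min(frame_skip, total_frames - frame)):
            apply_action(action)  # the same action is reused on every skipped frame
            frame += 1
    return decisions


applied = []
n = run_with_frame_skip(1000, 4, policy=lambda f: {"forward": True}, apply_action=applied.append)
print(n, len(applied))  # 250 1000
```

The trade-off is reaction time: with a large `frame_skip`, the bot cannot change its mind between decisions.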

## IMPORTANT NOTE
You may only use the variables given in the `bot_info` dictionary. Modifying the contents of the dictionary will result in your bot being disqualified; instead, use that data to compute your own measurements.
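As an example of such a derived measurement, the distance and bearing to the closest opponent can be computed from `location` and `closest_opponent` without changing `bot_info`. This sketch assumes both are `[x, y]` pairs in world pixels and returns the bearing in degrees from `math.atan2`; whether that lines up with `rotation` depends on the game's angle convention, which is not documented here. The function name is hypothetical.

```python
import math


def opponent_geometry(bot_info):
    """Return (distance, bearing_deg) to the closest opponent, or None if unavailable."""
    location = bot_info.get("location")
    opponent = bot_info.get("closest_opponent")
    if location is None or opponent is None:
        return None
    dx = opponent[0] - location[0]
    dy = opponent[1] - location[1]
    distance = math.hypot(dx, dy)
    bearing_deg = math.degrees(math.atan2(dy, dx))  # 0 degrees = +x axis (assumed convention)
    return distance, bearing_deg


info = {"location": [100, 100], "closest_opponent": [400, 500]}  # sample values
distance, bearing = opponent_geometry(info)
print(distance, round(bearing, 2))  # 500.0 53.13
```

Only reads are performed, so the dictionary passed in is left exactly as the environment produced it.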

# Full list of variables in the bot_info dictionary


Each bot receives a dictionary called `bot_info` at every step. This contains all the relevant data the environment collected about the bot, which can be used for decision-making and reward calculation.

| Key | Type | Description |
|-----|------|-------------|
| `location` | List `[x, y]` | The current position of the bot in the game world. |
| `rotation` | Float | The direction the bot is facing (in degrees or radians, depending on implementation). |
| `rays` | List | Distance readings from raycast sensors (used for detecting nearby obstacles or enemies). |
| `current_ammo` | Int | The current number of bullets or shots the bot has. |
| `alive` | Bool | Whether the bot is still alive (`True`) or has been eliminated (`False`). |
| `kills` | Int | How many opponents this bot has eliminated so far. |
| `damage_dealt` | Float | Total damage this bot has dealt to other players. |
| `meters_moved` | Float | The distance the bot has moved since the last step or over the whole game (depends on implementation). |
| `total_rotation` | Float | How much the bot has rotated over time – can be used to detect erratic spinning. |
| `health` | Int or Float | Current health level (usually 0 to 100). |
| `shot_fired` | Bool | Whether the bot tried to shoot in this step. |
| `closest_opponent` | List `[x, y]` | The position of the nearest living opponent. Useful for aiming or decision-making. |

You can use any combination of these values in your custom `calculate_reward()` function.

---
**Examples of what you might do with this:**
- Give positive reward for `damage_dealt` increase.
- Give a small penalty for `shot_fired` without any damage.
- Reward exploring new `location`s.
- Encourage killing (`kills`) or discourage unnecessary rotation (`total_rotation`).
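A sketch combining several of these ideas, written as a standalone function so it can be tested outside the game. It assumes `damage_dealt` and `kills` are cumulative totals, so per-step changes are computed against previous values (the role the `last_damage`, `last_kills`, and `last_health` variables suggest); inside `calculate_reward`, the hypothetical `last` dictionary would correspond to those variables. The weights are arbitrary, and the shot penalty is crude because a bullet may hit on a later step.

```python
def shaped_reward(bot_info, last):
    """Per-step reward from changes in cumulative stats; `last` holds the previous values."""
    damage = bot_info.get("damage_dealt", 0)
    kills = bot_info.get("kills", 0)
    health = bot_info.get("health", 0)

    # changes since the previous call (zero on the first call)
    delta_damage = damage - last.get("damage", damage)
    delta_kills = kills - last.get("kills", kills)
    delta_health = last.get("health", health) - health  # positive when damage was taken

    reward = 0.0
    reward += 1.0 * delta_damage           # reward dealing damage
    reward += 10.0 * delta_kills           # strongly reward eliminations
    reward -= 0.2 * max(delta_health, 0)   # same penalty weight as the template's example
    if bot_info.get("shot_fired", False) and delta_damage <= 0:
        reward -= 0.05                     # small penalty for a shot with no damage this step

    # remember the current values for the next step
    last.update(damage=damage, kills=kills, health=health)
    return reward


last = {}
print(shaped_reward({"damage_dealt": 0, "kills": 0, "health": 100}, last))  # 0.0
print(shaped_reward({"damage_dealt": 20, "kills": 0, "health": 90, "shot_fired": True}, last))  # 18.0
```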


# Bot example

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleNN(nn.Module):
    def __init__(self, input_size, output_size):
        super().__init__()
        self.fc1 = nn.Linear(input_size, 64)
        self.fc2 = nn.Linear(64, 32)
        self.output = nn.Linear(32, output_size)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return torch.sigmoid(self.output(x))  # output between 0 and 1


class MyBot:
    def __init__(self):
        self.model = SimpleNN(input_size=6, output_size=5)  # 6 inputs, 5 possible actions
        self.model.eval()  # inference only; no training in this example

    def act(self, player_info):
        # create input vector
        x, y = player_info["location"]
        health = player_info["health"]
        ammo = player_info["current_ammo"]
        enemy_x, enemy_y = player_info["closest_opponent"]

        input_vector = torch.tensor([x, y, health, ammo, enemy_x, enemy_y], dtype=torch.float32)

        # forward pass through the model
        with torch.no_grad():
            output = self.model(input_vector)

        # convert output to actions
        # output nodes: [forward, left, right, rotate, shoot]
        actions = {
            "forward": output[0].item() > 0.5,
            "left": output[1].item() > 0.5,
            "right": output[2].item() > 0.5,
            "rotate": (output[3].item() - 0.5) * 2,  # range: [-1, 1]
            "shoot": output[4].item() > 0.5
        }

        return actions
```
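The example feeds raw values straight into the network, so pixel coordinates (up to the world width) dwarf the other inputs. A common fix, sketched here as an assumption rather than part of the provided code, is to scale each feature to roughly [0, 1] before building the tensor. The 1280x1280 world size mirrors the `Env` defaults; the maximum health of 100 and the ammo cap of 30 are hypothetical, so check the actual values in the game.

```python
def normalized_features(player_info, world_width=1280, world_height=1280,
                        max_health=100.0, max_ammo=30.0):
    """Scale the six inputs used by MyBot to roughly [0, 1] (limits are assumptions)."""
    x, y = player_info["location"]
    enemy_x, enemy_y = player_info["closest_opponent"]
    return [
        x / world_width,
        y / world_height,
        player_info["health"] / max_health,
        player_info["current_ammo"] / max_ammo,
        enemy_x / world_width,
        enemy_y / world_height,
    ]


sample = {"location": [640, 320], "health": 50, "current_ammo": 15, "closest_opponent": [1280, 0]}
print(normalized_features(sample))  # [0.5, 0.25, 0.5, 0.5, 1.0, 0.0]
```

In `act`, the result would replace the raw list: `torch.tensor(normalized_features(player_info), dtype=torch.float32)`.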

## How This Works

1. The `SimpleNN` is a basic neural network with:
   - 6 input features (x, y, health, ammo, enemy_x, enemy_y)
   - 5 output actions: forward, left, right, rotate, shoot

2. The `MyBot` class:
   - Uses the NN to take `player_info` and compute actions.
   - Converts model outputs to actions: booleans for movement and shooting (e.g. shoot if the value is > 0.5) and a value in [-1, 1] for rotation.
   - Outputs a dictionary like `{"forward": True, "shoot": False}`.

3. You can **train** this model using reinforcement learning later by:
   - Storing `(state, action, reward, next_state)` tuples.
   - Applying an RL algorithm like DQN.
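A minimal sketch of those two steps, assuming a discrete action set (as `action_size` in the configuration suggests) rather than the sigmoid outputs of `SimpleNN`, which are not Q-value estimates. The buffer stores transitions and samples random minibatches; `td_target` computes the one-step DQN target r + gamma * max over a' of Q(s', a'), without bootstrapping on terminal transitions. The class and function names and the `done` flag are illustrative additions; a full DQN also needs a target network and a gradient step on the squared TD error.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state, done) transitions."""

    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are dropped when full

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # uniform random minibatch without replacement
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def td_target(reward, next_q_values, done, gamma=0.99):
    """One-step DQN target: r + gamma * max_a' Q(s', a'), or just r at episode end."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)


buffer = ReplayBuffer(capacity=3)
for t in range(5):
    buffer.push(state=[t], action=t % 2, reward=1.0, next_state=[t + 1], done=False)
print(len(buffer))                                         # 3 (capacity reached)
print(td_target(1.0, [0.5, 2.0], done=False, gamma=0.5))   # 2.0
```

Sampling randomly from the buffer breaks the correlation between consecutive frames, which is the main reason DQN uses replay.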