
Minecraft Reinforcement Learning

This project implements Double Deep Q-Learning (DDQN) with a replay memory and a PyTorch neural network. The agent learns to navigate through a maze to a specified goal using visual input within an RL environment. Minecraft Malmo is used for the implementation, providing a platform that turns Minecraft into a training environment for Reinforcement Learning. TensorBoard is employed to visualize the learning progress, tracking and displaying the agent's performance throughout training.

[Image: top-down view of the maze]

The agent's goal is to find an optimal strategy for navigating the maze without falling into lava, passing through intermediate waypoints (sandstone blocks) and completing the task by stepping on the diamond block in as few steps as possible.


Table of Contents

  1. Features
  2. Prerequisites
  3. Installation
  4. Usage
  5. Model
  6. Extras
  7. Next steps

Features

  • Train an agent in Minecraft: Train an AI agent to navigate through a maze in Minecraft.
  • Real-time training monitoring with TensorBoard: Visualize training progress and key metrics (rewards, losses and steps).
  • Evaluate a trained agent: Test how well a trained agent performs on the task.
  • Checkpoints: Save and load training checkpoints, so you can pause and resume training at any point.
  • Save the agent's frames: Optionally save each frame of the agent's actions during training, e.g. to create a video showcasing the agent's learning progress.
  • Create custom missions: Customize mazes through the mission XML configuration.

Prerequisites

  • Python 3.11 or higher
  • Java 8 JDK
  • Git 2.42.0 or higher

It is essential to have the correct version of Java installed: the Java 8 JDK (AdoptOpenJDK).
On Windows, make sure the Path variable is set properly; on Linux, ensure that JAVA_HOME is configured correctly. Verify that Java is usable in the terminal by running java -version (note: Java 8 does not support the --version flag).
If necessary, uninstall any previous Java versions to ensure everything works as expected.


Installation

  1. Clone the repository:

    git clone https://github.com/tis22/Minecraft_RL.git
    cd Minecraft_RL
  2. Create a virtual environment and install the required dependencies from the requirements.txt file:

    python -m venv minecraft_rl
    # On Windows: .\minecraft_rl\Scripts\activate
    # On Linux: source minecraft_rl/bin/activate
    pip install -r requirements.txt

Installing the Minecraft Mod

Once the virtual environment is set up and the dependencies are installed, make sure you are in your user directory.

  • Windows: C:\Users\username
  • Linux: /home/username or ~/ (for the home directory)

You can complete the installation of the Minecraft Mod by running the following command:

python -c "import malmoenv.bootstrap; malmoenv.bootstrap.download()"

If you encounter any issues while installing Minecraft MalmoEnv, please refer to the official installation guide in the corresponding repository for more detailed instructions.

Copying Repository Files to MalmoEnv

After the Minecraft Mod has been successfully installed, a folder named MalmoPlatform should appear in the user directory:

  • Windows: C:\Users\username\MalmoPlatform
  • Linux: /home/username/MalmoPlatform or ~/MalmoPlatform (for the home directory)

Navigate to the MalmoEnv subfolder within MalmoPlatform.
In this MalmoEnv folder, you should now copy the contents of the previously cloned repository (Minecraft_RL).

If you don't see the MalmoPlatform folder or any other files, make sure to enable the display of hidden files:

  • Windows: In File Explorer, go to the "View" tab and check the "Hidden items" box.
  • Linux: Press Ctrl + H in your file manager to show hidden files.

Resolving Minecraft Resource Download Issues

It is common to encounter errors while downloading resources during the Minecraft compilation, but these typically affect only sound assets. Many of these sound assets can be downloaded manually as described below, which may speed up the process; however, the compilation should also work without manual intervention.

  1. Ensure that the .gradle folder exists in your user directory.

  2. Download the required gradle_caches_minecraft.zip file.

  3. Rename or remove the existing Minecraft cache (if it exists):

    mv ~/.gradle/caches/minecraft ~/.gradle/caches/minecraft-org
  4. Extract the contents of the ZIP file into the .gradle/caches directory:

    unzip gradle_caches_minecraft.zip -d ~/.gradle/caches

Usage

Note: An active internet connection is required at the start of the compilation process for Minecraft Malmo (launching the environment).

Training the agent

If you want to train the agent, the program checks whether an existing training session is present (i.e. whether the runs, checkpoints and images directories exist).

  • If the last checkpoint is found, training resumes from the next episode.
  • Otherwise, a new training session starts.

During training, data for TensorBoard is saved in the runs directory.
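
Conceptually, resuming means restoring the network weights, optimizer state and episode counter from the newest checkpoint. The sketch below shows minimal save/resume logic with PyTorch; the directory layout, file names and dictionary keys are assumptions for illustration, not necessarily those used in this repository:

    import glob
    import os

    import torch

    CHECKPOINT_DIR = "checkpoints"  # assumed name, matching the directory mentioned above

    def save_checkpoint(episode, policy_net, target_net, optimizer):
        # Persist everything needed to resume training later.
        torch.save({
            "episode": episode,
            "policy_state": policy_net.state_dict(),
            "target_state": target_net.state_dict(),
            "optimizer_state": optimizer.state_dict(),
        }, os.path.join(CHECKPOINT_DIR, f"checkpoint_{episode}.pt"))

    def load_latest_checkpoint(policy_net, target_net, optimizer):
        # Returns the episode to resume from (0 if no checkpoint exists).
        files = glob.glob(os.path.join(CHECKPOINT_DIR, "checkpoint_*.pt"))
        if not files:
            return 0
        state = torch.load(max(files, key=os.path.getmtime))
        policy_net.load_state_dict(state["policy_state"])
        target_net.load_state_dict(state["target_state"])
        optimizer.load_state_dict(state["optimizer_state"])
        return state["episode"] + 1  # resume with the next episode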

Additional Settings in the Code

  • To save individual frames (e.g. for creating a video afterwards), set the saveimagesteps variable to 1.
  • Parameters such as the number of episodes, maximum steps per episode, replay memory size and batch size can also be configured (see the sketch after this list).
  • By default, a checkpoint is created every 1000 episodes.
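
For orientation, a settings block might look like the following; apart from saveimagesteps, the variable names and values are illustrative, not the exact ones used in the code:

    # Hypothetical training settings; values are illustrative only.
    NUM_EPISODES = 120_000        # total episodes to train
    MAX_STEPS_PER_EPISODE = 200   # abort an episode after this many steps
    REPLAY_MEMORY_SIZE = 100_000  # transitions kept for experience replay
    BATCH_SIZE = 32               # minibatch size sampled per update
    CHECKPOINT_EVERY = 1000       # save a checkpoint every N episodes (the default above)
    saveimagesteps = 1            # 1 = save every frame, 0 = disable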

Steps to Start Training

  1. Launch the environment in the first terminal:

    source minecraft_rl/bin/activate
    python -c "import malmoenv.bootstrap; malmoenv.bootstrap.launch_minecraft(9000)"

    This opens a new window displaying the environment, as defined in the mission XML (default: 84x84).
    Wait until the Minecraft Launcher window appears before proceeding.

  2. Start training in a second terminal:

    source minecraft_rl/bin/activate
    cd MalmoPlatform/MalmoEnv/
    python main.py --train

To stop training, press Ctrl + C.

Evaluating the trained agent

To evaluate a trained agent, use the --eval flag.

  • The program uses the trained model stored on the disk.
  • If no local model is found, it downloads the model from Google Drive.

Steps to Start Evaluation

  1. Launch Minecraft in the first terminal:

    source minecraft_rl/bin/activate
    python -c "import malmoenv.bootstrap; malmoenv.bootstrap.launch_minecraft(9000)"

    Wait until the Minecraft Launcher window appears before proceeding.

  2. Launch Minecraft in the second terminal:

    source minecraft_rl/bin/activate
    python -c "import malmoenv.bootstrap; malmoenv.bootstrap.launch_minecraft(9001)"

    Wait until the Minecraft Launcher window appears before proceeding.

  3. Start evaluation in the third terminal:

    source minecraft_rl/bin/activate
    cd MalmoPlatform/MalmoEnv/
    python main.py --eval

The evaluation mode keeps running until you press Enter to stop it.
The agent repeatedly starts from the beginning and tries to reach the goal.
While evaluating, you will see real-time information in the terminal, such as the agent's current step, the action it took, the reward it received and the total accumulated reward up to that point.
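
Conceptually, evaluation is a greedy rollout: the trained network picks the highest-valued action at every step. The sketch below illustrates such a loop with MalmoEnv; the mission file name, port and the preprocess helper are assumptions for illustration, not necessarily what main.py does:

    import malmoenv
    import torch

    xml = open("mission.xml").read()  # assumed mission file name
    env = malmoenv.make()
    env.init(xml, 9000)  # port of the first Minecraft instance

    policy_net.eval()  # trained network, loaded beforehand (e.g. via torch.load)
    obs = env.reset()
    done, step, total_reward = False, 0, 0.0
    while not done:
        state = preprocess(obs)  # hypothetical helper: stack and normalize frames
        with torch.no_grad():
            action = policy_net(state).argmax(dim=1).item()  # greedy action
        obs, reward, done, info = env.step(action)
        step += 1
        total_reward += reward
        print(f"Step {step}: action={action}, reward={reward}, total={total_reward}")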

Using TensorBoard

To visualize training progress or analyze a specific model's logs, use TensorBoard.

Note: If TensorBoard doesn't display the logs completely right after starting, it may take a moment to load all the training steps, especially if the logs are large. During this time, you might need to refresh the browser page a few times within the first minute until the logs appear correctly.

Options for TensorBoard

  1. View the latest/current TensorBoard logs:

    python main.py --tensorboard
  2. View specific TensorBoard logs:

    python main.py --tensorboard --logdir "path/to/logdir"
  3. Download TensorBoard logs for the model from Google Drive and view:

    python main.py --tensorboard --download

Model

DDQN: a convolutional neural network whose input is the last 4 RGB frames (12 channels).

[Image: CNN architecture]

The model approximates Q-values for each possible action based on the current state (the current frame plus the last three frames) the agent observes and selects actions accordingly. The agent can move forward, move backward, turn left and turn right, allowing it to navigate in all directions. It employs epsilon-greedy exploration and learns from past experiences stored in a replay memory.
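
The distinguishing feature of Double DQN is the bootstrap target: the online network selects the next action, while the target network evaluates it, reducing the overestimation bias of vanilla DQN. Below is a minimal sketch; the layer sizes follow the classic DQN architecture and are assumptions, not necessarily the exact network used here:

    import torch
    import torch.nn as nn

    class QNetwork(nn.Module):
        """Maps a stack of 4 RGB frames (12 channels, 84x84) to one Q-value per action."""

        def __init__(self, n_actions=4):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(12, 32, kernel_size=8, stride=4), nn.ReLU(),
                nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
                nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
                nn.Flatten(),
                nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
                nn.Linear(512, n_actions),
            )

        def forward(self, x):
            return self.net(x)

    def ddqn_targets(policy_net, target_net, rewards, next_states, dones, gamma=0.99):
        # Double DQN: the online net chooses the next action, the target net scores it.
        with torch.no_grad():
            next_actions = policy_net(next_states).argmax(dim=1, keepdim=True)
            next_q = target_net(next_states).gather(1, next_actions).squeeze(1)
            return rewards + gamma * next_q * (1.0 - dones)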

Pre-trained model

  • Model after 120k episodes
  • Logs after 120k episodes


Extras

This project includes two Python scripts designed to efficiently handle large sets of images (potentially hundreds of thousands) and create a video from them.
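
As a rough idea of what such a pipeline does, the sketch below assembles saved frames into a video with OpenCV; the file pattern, frame rate and output name are assumptions, not the exact behavior of the bundled scripts:

    import glob

    import cv2

    frames = sorted(glob.glob("images/*.png"))  # assumed frame location
    height, width = cv2.imread(frames[0]).shape[:2]

    fourcc = cv2.VideoWriter_fourcc(*"mp4v")
    writer = cv2.VideoWriter("agent.mp4", fourcc, 30.0, (width, height))  # 30 fps
    for path in frames:
        writer.write(cv2.imread(path))  # all frames must share the same size
    writer.release()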

[Video: episode 99149]


Next steps

  • Mixing Experiences: The agent will learn on both parts of the maze simultaneously. Memories from both halves will be combined to prevent overfitting to a subtask and to improve generalization.
  • Adaptive Exploration: The epsilon value will be dynamically adjusted based on the agent's previous success. If the agent stagnates or develops suboptimal strategies during a certain phase, the epsilon value could be increased again to encourage further exploration and move the agent out of local optima (see the sketch after this list).
  • Prioritized Experience Replay: Important experiences will have a higher chance of being replayed during training, strengthening the agent's ability to handle rare but critical situations.
  • Long Short-Term Memory (LSTM): This architecture will give the agent a better memory for long-term dependencies, enabling it to tackle more complex tasks effectively.
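
As an illustration of the adaptive-exploration idea, a simple heuristic could raise epsilon again once the recent success rate stagnates; the function, thresholds and decay rate below are purely hypothetical:

    def adapt_epsilon(epsilon, recent_success_rate, eps_min=0.05, eps_boost=0.3):
        # Hypothetical heuristic: if the agent seems stuck, re-encourage exploration.
        if recent_success_rate < 0.1:       # stagnating, likely a local optimum
            return max(epsilon, eps_boost)  # bump exploration back up
        return max(eps_min, epsilon * 0.999)  # otherwise keep decaying slowly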
