
Pac-man Bot

👾 Reinforcement Learning (Fall 2021)

This repository contains the implementation of a simple reinforcement learning task, so it is not exactly the same as the original Pac-Man game: here the episode ends when the agent collects all the stars (unlike the original Pac-Man game, where a level ends when all the coins are removed).


Algorithms

The algorithms in this repository focus on table-based classical reinforcement learning algorithms that do not use deep neural networks. Currently, the value-approximation version does not converge well, so it needs to be modified. A minimal sketch of the kind of table update these methods share appears after the list.

  1. Monte-Carlo Method
  2. SARSA
  3. Q-learning
  4. Double Q-learning
  5. Value Approximation
  6. REINFORCE
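
To make the table-based idea concrete, here is a minimal sketch of the Q-learning update (algorithm 3 above). The variable names and hyperparameter values are illustrative and are not taken from this repository's code.

import numpy as np

n_states, n_actions = 20, 4          # e.g. the SmallGridEnv sizes below
Q = np.zeros((n_states, n_actions))  # the action-value table
alpha, gamma, eps = 0.1, 0.99, 0.1   # step size, discount, exploration rate

def choose_action(s):
    # epsilon-greedy over the current table
    if np.random.rand() < eps:
        return np.random.randint(n_actions)
    return int(np.argmax(Q[s]))

def q_update(s, a, r, s_next, done):
    # Q-learning bootstraps from the greedy action in the next state;
    # SARSA would instead use the action actually taken in s_next
    target = r if done else r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])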

How To Run

You can execute the code via the scripts in the scripts directory. Each script file is named algorithm-environment.sh.

cd scripts/
sh ${runfile_name} 	# e.g. sh mc-small.sh

The results of a run can be found in the results directory or in the terminal window.

[Figure: example run output]

Requirements

You can check the virtual environment and required modules in environment.yaml and requirements.txt.

Environments

SmallGridEnv

In this environment, the ghost does not move, and since there is only one star, the episode ends when the agent reaches it. The figure below is a visualization of the environment through visualize_matrix(); visualization is also possible through the env.render() function. A minimal interaction sketch appears after the list below.

  • Observation space: 5 x 5 grid world - wall positions
    • observation_space.n = (25 - 5) = 20
  • Action space: { up: 0, down: 1, left: 2, right: 3 }
  • Reward: { ghost: -100, wall: -5, others: -1 }
    • Encountering a wall does not end the episode; the agent simply stays in place.
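
Assuming the environments expose the usual gym-style reset()/step() interface (the import path and the exact step() return signature below are assumptions, not confirmed by this README), a training episode would look roughly like this, reusing choose_action() and q_update() from the sketch above:

# Hypothetical usage sketch; import path and step() signature are assumptions.
from env import SmallGridEnv

env = SmallGridEnv()
state = env.reset()
done = False
while not done:
    action = choose_action(state)
    next_state, reward, done = env.step(action)   # return signature assumed
    q_update(state, action, reward, next_state, done)
    state = next_state
env.render()   # mentioned above as an alternative to visualize_matrix()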

[Figure: SmallGridEnv visualized with visualize_matrix()]

BigGridEnv

In this environment, the ghost moves randomly left and right, and since there are multiple stars, the episode ends when all of them have been collected. The visualization method is the same as for SmallGridEnv. A sketch of one possible state encoding appears after the list below.

  • Observation space: (11 x 11 grid world - wall positions) x (star state) x (ghost position state)
    • observation_space.n = (121 - 40) * (2^4) * (3 * 7) = 27216
  • Action space: { up: 0, down: 1, left: 2, right: 3 }
  • Reward: { star: 100, ghost: -100, wall: -5, others: -1 }
    • Encountering a wall does not end the episode; the agent simply stays in place.
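
One way to read the 27216 above: each observation is a triple of agent cell, star bitmask, and ghost configuration, which can be flattened into a single index in the usual mixed-radix way. The sketch below only illustrates that arithmetic; the encoding actually used in this repository may differ.

# Mixed-radix flattening of (agent cell, star bitmask, ghost config).
# Sizes from this README: 121 - 40 = 81 reachable cells,
# 2**4 = 16 star states, 3 * 7 = 21 ghost position combinations.
N_CELLS, N_STAR_STATES, N_GHOST_STATES = 81, 2**4, 21

def encode_state(cell, stars, ghosts):
    # 0 <= cell < 81, 0 <= stars < 16, 0 <= ghosts < 21
    return (cell * N_STAR_STATES + stars) * N_GHOST_STATES + ghosts

# The largest index plus one equals the advertised observation_space.n:
assert encode_state(80, 15, 20) + 1 == 81 * 16 * 21 == 27216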

[Figure: BigGridEnv visualization]

UnistEnv

There is no big difference from BigGridEnv; only the wall and ghost positions have changed, and the walls are laid out to spell UNIST 😙

[Figure: UnistEnv visualization]
