Author: Stephen Tse <***@cmu.edu>
This project implements a maze solver with reinforcement learning. It supports both (model-based) value iteration and (model-free) Q-learning.
- A maze is rectangular with exactly one start state and exactly one goal state. We use “S” to indicate the initial (start) state, “G” to indicate the terminal (goal) state, “*” to indicate an obstacle, and “.” to indicate a state the agent can pass through. One can assume that there are no obstacles at “S” or “G”, and that there are walls around the border of the maze.
Example:
.*..*..G
S......*
- When the agent reaches the goal/terminal state, it stays there.
- Whenever the agent performs an action (except while it stays in the goal/terminal state), it receives an immediate reward of -1.
- In each state, the agent has 4 possible actions: moving west, north, east, or south. When an action would hit the border or an obstacle, the agent returns to its original state and still pays the -1 immediate reward (see the environment sketch after this list).
- As per the assignment requirements, the learning rate \alpha in Q-learning is set to a constant defined by the user. In practice, it is better to set \alpha = 1 / N(s, a), where N(s, a) counts how many times Q(s, a) has been visited, so that \alpha approaches zero as the visit count grows.
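To make the rules above concrete, here is a minimal environment sketch. It is only an illustration under the assumptions noted in the comments; the class name, method names, and row/column orientation are not taken from this repository's actual code.

```python
# Minimal sketch of an environment obeying the rules above; all names here are
# illustrative assumptions, not the repository's actual classes or methods.
WEST, NORTH, EAST, SOUTH = 0, 1, 2, 3   # action encoding used by the policy file
MOVES = {WEST: (0, -1), NORTH: (-1, 0), EAST: (0, 1), SOUTH: (1, 0)}  # assumes row 0 is the top row

class MazeEnvironment:
    def __init__(self, maze_path):
        with open(maze_path) as f:
            self.grid = [list(line.rstrip("\n")) for line in f if line.strip()]
        self.rows, self.cols = len(self.grid), len(self.grid[0])
        self.start = self._find("S")
        self.goal = self._find("G")
        self.state = self.start

    def _find(self, symbol):
        for r, row in enumerate(self.grid):
            for c, cell in enumerate(row):
                if cell == symbol:
                    return (r, c)
        raise ValueError(f"symbol {symbol!r} not found in maze")

    def reset(self):
        """Put the agent back at the start state and return it."""
        self.state = self.start
        return self.state

    def step(self, action):
        """Apply an action; return (next_state, reward, reached_goal)."""
        if self.state == self.goal:            # the goal state is absorbing
            return self.state, 0, True
        dr, dc = MOVES[action]
        r, c = self.state[0] + dr, self.state[1] + dc
        # Hitting the border or an obstacle bounces the agent back to its cell.
        if not (0 <= r < self.rows and 0 <= c < self.cols) or self.grid[r][c] == "*":
            r, c = self.state
        self.state = (r, c)
        return self.state, -1, self.state == self.goal
```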
You may use value_iteration.py for model-based learning, or q_learning.py for model-free learning. In the model-free case, the environment model is hidden from the learning algorithm: it learns by interacting with the environment class, which only reports the next state, the reward the agent receives after taking an action, and whether it has arrived at the goal state.
python value_iteration.py <maze input> <value file> <q value file> <policy file> <num epoch> <discount factor>
<maze input>: path to the environment input .txt
<value file>: path to output the values V(s)
<q value file>: path to output the q-values Q(s, a)
<policy file>: path to output the optimal actions \pi(s)
<num epoch>: the number of epochs your program should train the agent for; setting it to a number <= 0 lets the program keep looping until convergence
<discount factor>: the discount factor \lambda
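For reference, the computation behind value_iteration.py can be sketched roughly as the Bellman backup over the deterministic maze MDP described above. This is a hedged outline, not the script's actual code; in particular env.states() and env.next_state_and_reward(s, a) are assumed helpers exposing the known model.

```python
# Rough sketch of value iteration (not the actual contents of value_iteration.py).
# Assumes an environment object exposing its model: env.states() yields all
# non-obstacle states and env.next_state_and_reward(s, a) returns (s', r).
def value_iteration(env, num_epoch, discount, tol=1e-8):
    V = {s: 0.0 for s in env.states()}
    epoch = 0
    while True:
        delta = 0.0
        for s in env.states():
            if s == env.goal:                  # terminal state keeps value 0
                continue
            # Bellman backup: V(s) <- max_a [ r(s, a) + lambda * V(s') ]
            q_values = []
            for a in range(4):                 # 0=west, 1=north, 2=east, 3=south
                s_next, r = env.next_state_and_reward(s, a)
                q_values.append(r + discount * V[s_next])
            best = max(q_values)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        epoch += 1
        # num_epoch <= 0 means "loop until convergence", as described above.
        if (num_epoch > 0 and epoch >= num_epoch) or (num_epoch <= 0 and delta < tol):
            return V
```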
python q_learning.py <maze input> <value file> <q value file> <policy file> <num episodes> <max episode length> <learning rate> <discount factor> <epsilon>
<maze input>: path to the environment input .txt
<value file>: path to output the values V(s)
<q value file>: path to output the q-values Q(s, a)
<policy file>: path to output the optimal actions \pi(s)
<num episodes>: the number of episodes your program should train the agent for
<max episode length>: the maximum length of an episode; setting it to a number <= 0 lets the agent keep exploring until it reaches the goal state in the current episode
<learning rate>: the learning rate \alpha of the Q-learning algorithm
<discount factor>: the discount factor \lambda
<epsilon>: the value \epsilon for the epsilon-greedy strategy
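Similarly, the core loop of the model-free learner can be sketched as tabular Q-learning with epsilon-greedy exploration. This is an illustrative outline under the assumption that the environment exposes reset() and step(a) -> (next_state, reward, reached_goal); it is not the actual contents of q_learning.py.

```python
import random

# Rough sketch of tabular Q-learning with an epsilon-greedy policy
# (illustrative only; not the actual contents of q_learning.py).
def q_learning(env, num_episodes, max_episode_length, learning_rate, discount, epsilon):
    Q = {}  # (state, action) -> estimated return; missing entries default to 0

    def greedy_action(s):
        return max(range(4), key=lambda a: Q.get((s, a), 0.0))

    for _ in range(num_episodes):
        s = env.reset()
        steps = 0
        while True:
            # Epsilon-greedy: explore with probability epsilon, otherwise exploit.
            a = random.randrange(4) if random.random() < epsilon else greedy_action(s)
            s_next, r, done = env.step(a)
            # Q-learning update:
            # Q(s,a) <- Q(s,a) + alpha * (r + lambda * max_a' Q(s',a') - Q(s,a))
            old = Q.get((s, a), 0.0)
            target = r + discount * max(Q.get((s_next, b), 0.0) for b in range(4))
            Q[(s, a)] = old + learning_rate * (target - old)
            s = s_next
            steps += 1
            # max_episode_length <= 0 means "run until the goal is reached".
            if done or (max_episode_length > 0 and steps >= max_episode_length):
                break
    return Q
```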
The optimal policy for each state can be found in the <policy file> output. It is formatted as “x y optimal action”, one line per state s = (x, y) with its corresponding optimal action \pi(s). The 4 possible actions west, north, east, and south are denoted by the integers 0, 1, 2, and 3 respectively.
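A hedged sketch of producing that file from learned Q-values follows; the helper below and the exact meaning of x versus y are assumptions and should match whatever indexing the rest of the code uses.

```python
# Sketch of writing the policy file in the "x y optimal action" format described above.
# Assumes Q maps (state, action) -> value and `states` lists every non-obstacle state.
def write_policy(Q, states, policy_path):
    with open(policy_path, "w") as f:
        for (x, y) in states:
            best_action = max(range(4), key=lambda a: Q.get(((x, y), a), 0.0))
            f.write(f"{x} {y} {best_action}\n")
```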
python value_iteration.py data/maze2.txt value_iteration_output/value_output.txt value_iteration_output/q_value_output.txt value_iteration_output/policy_output.txt -1 0.9
python q_learning.py data/maze2.txt q_learning_output/value_output.txt q_learning_output/q_value_output.txt q_learning_output/policy_output.txt 2000 -1 0.1 0.9 0.8
Language: Python 3.6
Dependency Requirements: numpy (tested with 1.14.1)