
Reinforcement-Learning-Algorithms

These implementations show the convergence and performance of policy iteration and value iteration, and how their convergence to the optimal value function depends on the number of iterations used. In addition, I have implemented on-policy SARSA and off-policy Q-learning and shown how the performance of these algorithms depends on the exploration-exploitation tradeoff and on the learning rate. The experiments were evaluated on benchmark reinforcement learning tasks, namely smallworld, gridworld, and cliffworld MDPs, to analyze the performance of the algorithms.
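Below is a minimal Python sketch of the two families of methods described above: a value iteration sweep (whose accuracy depends on the number of iterations) and tabular epsilon-greedy Q-learning (whose behaviour depends on epsilon and the learning rate alpha). The 4x4 grid, reward values, and hyperparameters are illustrative assumptions, not the environments or settings used in this repository.

```python
# Hypothetical 4x4 gridworld: +1 for reaching the goal at (3, 3), -0.04 per step.
# Grid size, rewards, gamma, alpha, and epsilon are illustrative placeholders.
import numpy as np

N = 4                                           # grid is N x N
GOAL = (N - 1, N - 1)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]    # up, down, left, right
GAMMA = 0.9

def step(state, action):
    """Deterministic transition: move if possible, otherwise stay put."""
    r, c = state
    nxt = (max(0, min(N - 1, r + action[0])), max(0, min(N - 1, c + action[1])))
    if nxt == GOAL:
        return nxt, 1.0, True
    return nxt, -0.04, False

def value_iteration(num_iters):
    """Apply the Bellman optimality backup for a fixed number of sweeps."""
    V = np.zeros((N, N))
    for _ in range(num_iters):
        new_V = np.zeros_like(V)
        for r in range(N):
            for c in range(N):
                if (r, c) == GOAL:
                    continue
                backups = []
                for a in ACTIONS:
                    nxt, rew, done = step((r, c), a)
                    backups.append(rew + (0.0 if done else GAMMA * V[nxt]))
                new_V[r, c] = max(backups)
        V = new_V
    return V

def q_learning(episodes, alpha=0.1, epsilon=0.1):
    """Tabular off-policy Q-learning with an epsilon-greedy behaviour policy."""
    Q = np.zeros((N, N, len(ACTIONS)))
    rng = np.random.default_rng(0)
    for _ in range(episodes):
        state, done = (0, 0), False
        while not done:
            if rng.random() < epsilon:              # explore
                a = int(rng.integers(len(ACTIONS)))
            else:                                   # exploit
                a = int(np.argmax(Q[state]))
            nxt, rew, done = step(state, ACTIONS[a])
            target = rew + (0.0 if done else GAMMA * np.max(Q[nxt]))
            Q[state][a] += alpha * (target - Q[state][a])
            state = nxt
    return Q

if __name__ == "__main__":
    print(value_iteration(num_iters=50))            # approximate V*
    print(q_learning(episodes=500).max(axis=-1))    # greedy value estimates
```

Increasing `num_iters` tightens the value iteration estimate toward the optimal value function, while raising `epsilon` or changing `alpha` trades off exploration against exploitation and the speed versus stability of the Q-learning updates, which is the behaviour the experiments in this repository examine.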

Disclaimer: This is not a unique or original work.
