- I decided to upload all my RL task demos here, including
  - Project code written while learning
  - Code written for competitions
  - Anything else related to RL tasks... I don't know
- Anyway, I hope every project here is complete and can be run or trained directly to solve an individual RL task
- The project name follows this format: `[env_name] task_name (method_name/method_type)`
- Project type: A reproduction of an RL competition on the JiDi AI platform
- Raw champion code: Luanshaotong/Competition_Olympics-Running
- Detailed description: RL Practice (0): JiDi Platform 2021 Winter Season [Rule-based policy]
- Project type: Compare the performance of four simple ways to balance exploration and exploitation in a K-armed bandit environment (a sketch of the action-selection rules follows this entry), including
  - $\epsilon$-greedy
  - Decaying $\epsilon$-greedy
  - Upper confidence bound (UCB)
  - Thompson sampling
- Note: the K-armed bandit environment is a simplified version of the RL paradigm without state transitions
- Detailed description: RL Practice (1): Multi-Armed Bandit
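A minimal sketch of the four action-selection rules on a Bernoulli K-armed bandit (the variable names, hyperparameters, and loop below are illustrative assumptions, not the repository's actual code):

```python
import numpy as np

rng = np.random.default_rng(0)
K = 10
true_probs = rng.uniform(size=K)           # hypothetical Bernoulli arm success probabilities
counts = np.zeros(K)                       # pull count per arm
values = np.zeros(K)                       # running mean reward per arm
alpha, beta = np.ones(K), np.ones(K)       # Beta posterior parameters for Thompson sampling

def pull(arm):
    """Sample a Bernoulli reward from the chosen arm."""
    return float(rng.random() < true_probs[arm])

def select(method, t, eps=0.1, c=1.0):
    if method == "eps_greedy":                              # explore with fixed probability eps
        return rng.integers(K) if rng.random() < eps else int(np.argmax(values))
    if method == "decaying_eps_greedy":                     # exploration probability decays as 1/t
        return rng.integers(K) if rng.random() < 1.0 / t else int(np.argmax(values))
    if method == "ucb":                                     # mean reward plus exploration bonus
        bonus = c * np.sqrt(np.log(t + 1) / (counts + 1e-8))
        return int(np.argmax(values + bonus))
    if method == "thompson":                                # sample success rates from the posterior
        return int(np.argmax(rng.beta(alpha, beta)))

for t in range(1, 1001):                                    # run one strategy; swap the name to compare
    arm = select("ucb", t)
    r = pull(arm)
    counts[arm] += 1
    values[arm] += (r - values[arm]) / counts[arm]          # incremental mean update
    alpha[arm] += r                                         # posterior update used by Thompson sampling
    beta[arm] += 1 - r
```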
- Project type: Implement an example from *Reinforcement Learning: An Introduction* (Jack's Car Rental) with a GUI
- Detailed description: RL Practice (2): Jack's Car Rental Problem [Policy Iteration & Value Iteration]
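A minimal value-iteration sketch on a small generic MDP (the random transition and reward tensors are placeholders, not the actual Jack's Car Rental dynamics); policy iteration instead alternates full policy evaluation with greedy policy improvement:

```python
import numpy as np

# Hypothetical finite MDP with known transition and reward models
rng = np.random.default_rng(0)
S, A, gamma, theta = 20, 4, 0.9, 1e-6
P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a, s'] = transition probability
R = rng.uniform(size=(S, A))                 # expected immediate reward for (s, a)

V = np.zeros(S)
while True:                                   # value iteration: V <- max_a [R + gamma * P V]
    Q = R + gamma * P @ V                     # action values, shape (S, A)
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < theta:     # stop once the value function has converged
        break
    V = V_new

policy = Q.argmax(axis=1)                     # greedy policy w.r.t. the converged values
```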
- Project type: Compare the performance of a series of tabular RL algorithms (a minimal sketch of the tabular update and the `.npy` logging follows this entry), including
  - Sarsa
  - Expected Sarsa
  - N-step Sarsa
  - N-step Tree Backup
  - Q-Learning
  - Double Q-Learning
- Brief introduction: The experiment was conducted in a custom Cliff Walking environment based on gym. Two test files were written for each algorithm:
  - Files whose names start with `RL_` show the convergence process: an environment UI is displayed and the agent is trained with a single random seed
  - Files whose names start with `Performance_` record the performance of the algorithm: the agent is trained with three different random seeds, and the average return curve is saved in the "data" folder as a `.npy` file. Once the curve data is saved, you can run `Performance_compare.py` to load the curves and generate a comparison figure
- Detailed description: RL Practice (3): Cliff Walking [Q-Learning & Sarsa & Various Variants]
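As referenced above, a minimal self-contained sketch of the kind of tabular Q-Learning update and `.npy` logging these scripts perform (the inline grid world, hyperparameters, and file name are illustrative assumptions, not the repository's actual code):

```python
import os
import numpy as np

# Minimal stand-in for the custom Cliff Walking env (4 x 12 grid, classic layout):
# start at bottom-left, goal at bottom-right, stepping on the cliff resets to start with -100.
ROWS, COLS, ACTIONS = 4, 12, 4                      # actions: 0=up, 1=right, 2=down, 3=left
MOVES = [(-1, 0), (0, 1), (1, 0), (0, -1)]

def step(state, action):
    r, c = divmod(state, COLS)
    dr, dc = MOVES[action]
    r, c = min(max(r + dr, 0), ROWS - 1), min(max(c + dc, 0), COLS - 1)
    if r == ROWS - 1 and 0 < c < COLS - 1:          # fell off the cliff: back to start
        return (ROWS - 1) * COLS, -100.0, False
    done = (r == ROWS - 1 and c == COLS - 1)        # reached the goal corner
    return r * COLS + c, -1.0, done

gamma, alpha, eps, episodes = 1.0, 0.5, 0.1, 500
Q = np.zeros((ROWS * COLS, ACTIONS))
rng = np.random.default_rng(0)
returns = []

for _ in range(episodes):
    s, done, ep_return = (ROWS - 1) * COLS, False, 0.0
    while not done:
        a = rng.integers(ACTIONS) if rng.random() < eps else int(np.argmax(Q[s]))
        s2, r, done = step(s, a)
        # Q-Learning target: max over next actions; Sarsa would instead use the sampled next action
        target = r + gamma * (0.0 if done else np.max(Q[s2]))
        Q[s, a] += alpha * (target - Q[s, a])
        s, ep_return = s2, ep_return + r
    returns.append(ep_return)

os.makedirs("data", exist_ok=True)
np.save("data/QLearning_return.npy", np.array(returns))  # hypothetical filename; the compare script
                                                         # would np.load such curves and plot them
```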
- Project type: Compare the performance of DQN, Double DQN and Dueling DQN
- Brief introduction: The experiment was conducted in a custom Rolling-Ball environment based on gym, which is similar to a Maze2d environment. Imagine a rolling ball on a two-dimensional plane: horizontal and vertical forces are applied to it, and the ball moves under the resulting acceleration. When the ball hits the edge of the plane, it undergoes a perfectly elastic collision, and we want the ball to reach the target position as quickly as possible under the applied forces (a toy dynamics sketch follows this entry)
- Detailed description: RL Practice (4): 2D Rolling-Ball Environment [DQN & Double DQN & Dueling DQN]
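A toy sketch of the dynamics described above (the class, constants, and reward shape are illustrative assumptions, not the repository's environment code):

```python
import numpy as np

class RollingBall:
    """Toy 2-D rolling-ball dynamics: force -> acceleration -> velocity -> position,
    with a perfectly elastic bounce at the plane's edges (illustrative only)."""

    def __init__(self, size=10.0, dt=0.1, mass=1.0):
        self.size, self.dt, self.mass = size, dt, mass
        self.pos = np.zeros(2)
        self.vel = np.zeros(2)
        self.goal = np.array([8.0, 8.0])            # hypothetical target position

    def step(self, force):
        acc = np.asarray(force, dtype=float) / self.mass
        self.vel += acc * self.dt                   # integrate acceleration into velocity
        self.pos += self.vel * self.dt              # integrate velocity into position
        for i in range(2):                          # elastic collision with the edges
            if self.pos[i] < 0.0 or self.pos[i] > self.size:
                self.pos[i] = np.clip(self.pos[i], 0.0, self.size)
                self.vel[i] = -self.vel[i]
        dist = np.linalg.norm(self.pos - self.goal)
        reward = -dist                              # encourage reaching the goal quickly
        done = dist < 0.5
        return np.concatenate([self.pos, self.vel]), reward, done
```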
- Project type: Compare the performance of REINFORCE and Actor-Critic, two of the simplest policy-gradient RL methods
- Brief introduction: The experiment was conducted in the same custom Rolling-Ball environment described above
- Detailed description: RL Practice (5): 2D Rolling-Ball Environment [REINFORCE & Actor-Critic]
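A minimal PyTorch sketch contrasting the two update rules (the tensor arguments are assumed to come from a hypothetical policy/value network; not the repository's code): REINFORCE weights each log-probability by the full Monte-Carlo return, while one-step Actor-Critic weights it by a TD error computed from a learned critic.

```python
import torch

def reinforce_loss(log_probs, rewards, gamma=0.99):
    """REINFORCE: weight each log pi(a_t|s_t) by the Monte-Carlo return G_t."""
    G, returns = 0.0, []
    for r in reversed(rewards):                      # accumulate discounted returns backwards
        G = r + gamma * G
        returns.append(G)
    returns = torch.tensor(list(reversed(returns)))
    return -(torch.stack(log_probs) * returns).sum()

def actor_critic_loss(log_prob, value, next_value, reward, gamma=0.99):
    """One-step Actor-Critic: critic bootstraps the target, actor is weighted by the TD error."""
    td_target = reward + gamma * next_value.detach()
    td_error = td_target - value
    actor_loss = -log_prob * td_error.detach()       # do not backprop the actor through the critic
    critic_loss = td_error.pow(2)                    # fit the critic to the bootstrapped target
    return actor_loss + critic_loss
```

In practice the Actor-Critic update can be applied every step, while REINFORCE has to wait for the full episode before computing its returns.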
- Project type: Validate the advantages of policy gradient with a baseline over the original REINFORCE & Actor-Critic approaches without a baseline on the CartPole-v0 environment
- Detailed description: RL Practice (6): CartPole [REINFORCE with baseline & A2C]
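A minimal sketch of how the baseline changes the policy-gradient weight (hypothetical tensors; not the repository's code): subtracting a learned state-value baseline from the return reduces the variance of the gradient estimate without changing its expectation.

```python
import torch

def reinforce_with_baseline_loss(log_probs, returns, values):
    """The policy-gradient weight becomes the advantage G_t - V(s_t) instead of the raw return."""
    advantages = returns - values.detach()            # baseline is not differentiated through
    policy_loss = -(log_probs * advantages).sum()
    value_loss = (returns - values).pow(2).sum()      # fit the baseline to the observed returns
    return policy_loss + value_loss
```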
- Project type: Test PPO, one of the most popular online RL methods, on the CartPole-v0 environment
- Detailed description: RL Practice (7): CartPole [TRPO & PPO]
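A minimal sketch of the PPO clipped surrogate objective being tested (hypothetical tensors; not the repository's code); TRPO enforces a similar limit on the policy update through a KL-divergence trust region instead of clipping:

```python
import torch

def ppo_clip_loss(new_log_probs, old_log_probs, advantages, eps=0.2):
    """PPO-Clip: limit how far the new policy's probability ratio can move from the old policy."""
    ratio = torch.exp(new_log_probs - old_log_probs.detach())
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps) * advantages
    return -torch.min(unclipped, clipped).mean()      # pessimistic bound over the two surrogates
```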
...to be continued