wxc971231/RL_task_practice

About this repository

  • I decided to upload all my RL task demos here, including
    1. Project code written while learning
    2. Code written for competitions
    3. Anything else related to RL tasks
  • In any case, I hope every project here is complete and can be run or trained directly to solve an individual RL task
  • Project names follow this format: [env_name] task_name (method_name/method_type)

Demo List

1. [JiDi_platform] competition-olympics-running (Rule-based)

2. [Handcraft Env] K-arms bandit (MC)

  • Project type: Compare the performance of four simple ways to balance exploration and exploitation in the K-armed bandit environment, including

    1. $\epsilon$-greedy
    2. Decaying $\epsilon$-greedy
    3. Upper confidence bound (UCB)
    4. Thompson sampling

    Note that the K-armed bandit environment is a simplified version of the RL paradigm without state transitions (a minimal sketch of two of these strategies follows this demo's description)

  • Detailed description (Chinese blog post): RL Practice (1): Multi-Armed Bandit
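
As a reference, here is a minimal sketch of two of these strategies ($\epsilon$-greedy and UCB) on a toy Bernoulli bandit. The class and function names are illustrative and are not taken from the project code.

```python
import numpy as np

class BernoulliBandit:
    """Toy K-armed bandit: each arm pays 1 with a fixed hidden probability."""
    def __init__(self, k=10, seed=0):
        self.rng = np.random.default_rng(seed)
        self.probs = self.rng.uniform(size=k)
        self.k = k

    def pull(self, arm):
        return float(self.rng.random() < self.probs[arm])

def epsilon_greedy(bandit, steps=1000, eps=0.1):
    counts = np.zeros(bandit.k)
    values = np.zeros(bandit.k)          # running mean reward per arm
    total = 0.0
    for _ in range(steps):
        if np.random.rand() < eps:       # explore with probability eps
            arm = np.random.randint(bandit.k)
        else:                            # otherwise exploit the current best estimate
            arm = int(values.argmax())
        r = bandit.pull(arm)
        counts[arm] += 1
        values[arm] += (r - values[arm]) / counts[arm]   # incremental mean update
        total += r
    return total

def ucb(bandit, steps=1000, c=1.0):
    counts = np.zeros(bandit.k)
    values = np.zeros(bandit.k)
    total = 0.0
    for t in range(1, steps + 1):
        # exploit the mean estimate plus an exploration bonus that shrinks with visits
        bonus = c * np.sqrt(np.log(t) / (counts + 1e-8))
        arm = int((values + bonus).argmax())
        r = bandit.pull(arm)
        counts[arm] += 1
        values[arm] += (r - values[arm]) / counts[arm]
        total += r
    return total

if __name__ == "__main__":
    bandit = BernoulliBandit()
    print("eps-greedy total reward:", epsilon_greedy(bandit))
    print("UCB total reward:", ucb(bandit))
```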

3. [Handcraft Env] Jack's Car Rental (Policy Iteration & Value Iteration)

4. [Gym Custom] Cliff Walking (Q-Learning series and Sarsa series)

  • Project type: Compare the performance of a series of tabular RL algorithms (their core update rules are sketched after this demo's description), including

    1. Sarsa
    2. Expected Sarsa
    3. N-step Sarsa
    4. N-step Tree Backup
    5. Q-Learning
    6. Double Q-Learning
  • Brief introduction: The experiment was conducted in a custom Cliff Walking environment based on gym. Two test files were written for each algorithm:

    1. Files whose names start with RL_ show the convergence process: an environment UI is displayed and the agent is trained with a single random seed

    2. Files whose names start with Performance_ record the performance of each algorithm: the agent is trained with three different random seeds, and the average return curve is saved in the "data" folder as a .npy file. Once the curve data is saved, you can run Performance_compare.py to load the curves and generate a comparison figure

  • Detailed description (Chinese blog post): RL Practice (3): Cliff Walking [Q-Learning & Sarsa & Variants]
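
For reference, the core difference between the on-policy Sarsa update and the off-policy Q-Learning update compared in this demo can be sketched as follows. This is a generic illustration with assumed variable names and table shape, not the project's implementation.

```python
import numpy as np

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy Sarsa: bootstrap from the action actually taken in the next state."""
    td_target = r + gamma * Q[s_next, a_next]
    Q[s, a] += alpha * (td_target - Q[s, a])

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Off-policy Q-Learning: bootstrap from the greedy action in the next state."""
    td_target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])

# Usage with a tabular Q of shape (n_states, n_actions), e.g. a 4x12 cliff-walking grid:
Q = np.zeros((48, 4))
sarsa_update(Q, s=0, a=1, r=-1.0, s_next=1, a_next=2)
q_learning_update(Q, s=0, a=1, r=-1.0, s_next=1)
```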

5. [Gym Custom] Rolling Ball (DQN & Double DQN & Dueling DQN)

  • Project type: Compare the performance of DQN, Double DQN and Dueling DQN

  • Brief introduction: The experiment was conducted in a custom Rolling-Ball environment based on gym. The Rolling-Ball environment is similar to a Maze2d environment: imagine a rolling ball on a two-dimensional plane; applying horizontal and vertical forces makes the ball move under the resulting acceleration. When the ball hits the edge of the plane it undergoes a fully elastic collision, and we want the ball to reach the target position as quickly as possible under the applied forces (a rough sketch of these dynamics follows this demo's description)

  • Detailed description (Chinese blog post): RL Practice (4): 2D Rolling Ball Environment [DQN & Double DQN & Dueling DQN]
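
A rough sketch of the dynamics described above, assuming a simple Euler integration step. The state layout, plane size, force scale, and reward shaping are assumptions for illustration and do not describe the project's actual environment.

```python
import numpy as np

class RollingBallSketch:
    """Toy 2D rolling-ball dynamics: state = (x, y, vx, vy), action = (fx, fy)."""
    def __init__(self, size=10.0, mass=1.0, dt=0.1, target=(8.0, 8.0)):
        self.size, self.mass, self.dt = size, mass, dt
        self.target = np.array(target, dtype=float)
        self.reset()

    def reset(self):
        self.pos = np.zeros(2)
        self.vel = np.zeros(2)
        return np.concatenate([self.pos, self.vel])

    def step(self, force):
        # the applied horizontal/vertical forces accelerate the ball
        self.vel += np.asarray(force, dtype=float) / self.mass * self.dt
        self.pos += self.vel * self.dt
        # fully elastic collision with the plane edges: clamp position, flip velocity
        for i in range(2):
            if self.pos[i] < 0.0 or self.pos[i] > self.size:
                self.pos[i] = np.clip(self.pos[i], 0.0, self.size)
                self.vel[i] = -self.vel[i]
        dist = np.linalg.norm(self.pos - self.target)
        reward = -dist                     # pushes the agent to reach the target fast
        done = dist < 0.5
        return np.concatenate([self.pos, self.vel]), reward, done, {}
```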

6. [Gym Custom] Rolling Ball (REINFORCE and Actor-Critic)

  • Project type: Compare the performance of REINFORCE and Actor-Critic, two of the simplest policy gradient RL methods (their update targets are sketched after this demo's description)
  • Brief introduction: The experiment was conducted in the same custom Rolling-Ball environment as Demo 5 above
  • Detailed description (Chinese blog post): RL Practice (5): 2D Rolling Ball Environment [REINFORCE & Actor-Critic]
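
A minimal PyTorch-style sketch of how the two update targets differ (Monte Carlo return for REINFORCE vs. bootstrapped TD target for Actor-Critic). The function signatures and the choice of framework are assumptions for illustration, not taken from the project code.

```python
import torch

def reinforce_loss(log_probs, rewards, gamma=0.99):
    """REINFORCE: weight each log-prob by the Monte Carlo return from that step on."""
    returns, g = [], 0.0
    for r in reversed(rewards):                  # discounted returns, computed backwards
        g = r + gamma * g
        returns.append(g)
    returns = torch.tensor(list(reversed(returns)))
    return -(torch.stack(log_probs) * returns).sum()

def actor_critic_loss(log_prob, value, reward, next_value, gamma=0.99):
    """One-step Actor-Critic: replace the Monte Carlo return with a TD target."""
    td_target = reward + gamma * next_value.detach()
    td_error = td_target - value
    actor_loss = -log_prob * td_error.detach()   # policy gradient weighted by TD error
    critic_loss = td_error.pow(2)                # value regression toward the TD target
    return actor_loss + critic_loss
```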

7. [Gym] CartPole-V0 (REINFORCE with baseline and A2C)

8. [Gym] CartPole-V0 (PPO)

...to be continued
