Deep RL Algorithms #4

Open
4 of 5 tasks
GeorgeS2019 opened this issue Jun 13, 2023 · 0 comments

GeorgeS2019 commented Jun 13, 2023

Current status:

RL Algorithms

  • A2C
  • DQN

Model Free (TorchSharp)

  • QLearning

    It learns the value of an action in a particular state. The algorithm uses a table to store the value of each state-action pair and updates that table from the reward the agent receives. The goal of Q-learning is to find the optimal policy for the agent (a minimal update sketch appears after this list).

  • Cross Entropy

    • It is a Monte Carlo method for optimizing an agent's policy: episodes are sampled with the current policy, the highest-reward ("elite") episodes are kept, and the policy is trained to reproduce the actions taken in them by minimizing the cross-entropy between the policy's predicted action distribution and the elite actions (see the sketch after this list).
    • More generally, the cross-entropy method is a (Monte Carlo) stochastic optimization algorithm that can be used to solve optimization problems where the objective function is difficult to evaluate directly.
  • Cross-Entropy Guided Policy (CGP) learning

    It is a general Q-function and policy training method that can be combined with most deep Q-learning methods. It demonstrates improved training stability across runs, hyperparameter combinations, and tasks, while avoiding the computational expense of a sample-based policy at inference time.
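
As a concrete illustration of the tabular Q-learning item above, here is a minimal sketch in Python. The environment, hyperparameters, and use of Gymnasium are illustrative assumptions and are not part of the TorchSharp implementation tracked by this issue.

    import numpy as np
    import gymnasium as gym

    # Illustrative hyperparameters (assumed, not taken from this repo)
    alpha, gamma, epsilon, episodes = 0.1, 0.99, 0.1, 5000

    env = gym.make("FrozenLake-v1")   # small discrete state/action space
    Q = np.zeros((env.observation_space.n, env.action_space.n))

    for _ in range(episodes):
        state, _ = env.reset()
        done = False
        while not done:
            # epsilon-greedy action selection from the Q-table
            if np.random.rand() < epsilon:
                action = env.action_space.sample()
            else:
                action = int(np.argmax(Q[state]))
            next_state, reward, terminated, truncated, _ = env.step(action)
            done = terminated or truncated
            # temporal-difference update of the state-action value
            Q[state, action] += alpha * (
                reward + gamma * np.max(Q[next_state]) - Q[state, action]
            )
            state = next_state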

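For the cross-entropy bullet, the sketch below shows one cross-entropy-method iteration: keep the elite (highest-reward) episodes and fit the policy to the actions taken in them. The network sizes, the percentile, and the shape of the `episodes` input are illustrative assumptions; the rollout code that produces the episodes is omitted.

    import numpy as np
    import torch
    import torch.nn as nn

    # Toy policy network: observation -> action logits (sizes are assumptions)
    policy = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
    optimizer = torch.optim.Adam(policy.parameters(), lr=1e-2)
    loss_fn = nn.CrossEntropyLoss()

    def cem_iteration(episodes, percentile=70):
        """One cross-entropy-method step.

        `episodes` is a list of (total_reward, states, actions) tuples
        produced by rolling out the current policy.
        """
        rewards = np.array([ep[0] for ep in episodes])
        threshold = np.percentile(rewards, percentile)

        # Keep only the elite (highest-reward) episodes
        elite_states, elite_actions = [], []
        for total, states, actions in episodes:
            if total >= threshold:
                elite_states.extend(states)
                elite_actions.extend(actions)

        # Fit the policy to the elite actions: minimize the cross-entropy
        # between the predicted action distribution and the actions taken
        obs = torch.as_tensor(np.array(elite_states), dtype=torch.float32)
        acts = torch.as_tensor(elite_actions, dtype=torch.long)
        optimizer.zero_grad()
        loss = loss_fn(policy(obs), acts)
        loss.backward()
        optimizer.step()
        return threshold, loss.item()
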
SB3 RL algorithm comparisons

Also look into

BaseAlgorithm

upon which the state-of-the-art RL algorithms depend.

RL Algorithms


These algorithms are classified into TWO groups,

on-policy or off-policy, both of which inherit from BaseAlgorithm:

    """
    The base of RL algorithms

    :param policy: The policy model to use (MlpPolicy, CnnPolicy, ...)
    :param env: The environment to learn from
                (if registered in Gym, can be str. Can be None for loading trained models)
    :param learning_rate: learning rate for the optimizer,
        it can be a function of the current progress remaining (from 1 to 0)
    :param policy_kwargs: Additional arguments to be passed to the policy on creation
    :param stats_window_size: Window size for the rollout logging, specifying the number of episodes to average
        the reported success rate, mean episode length, and mean reward over
    :param tensorboard_log: the log location for tensorboard (if None, no logging)
    :param verbose: Verbosity level: 0 for no output, 1 for info messages (such as device or wrappers used), 2 for
        debug messages
    :param device: Device on which the code should run.
        By default, it will try to use a Cuda compatible device and fallback to cpu
        if it is not possible.
    :param support_multi_env: Whether the algorithm supports training
        with multiple environments (as in A2C)
    :param monitor_wrapper: When creating an environment, whether to wrap it
        or not in a Monitor wrapper.
    :param seed: Seed for the pseudo random generators
    :param use_sde: Whether to use generalized State Dependent Exploration (gSDE)
        instead of action noise exploration (default: False)
    :param sde_sample_freq: Sample a new noise matrix every n steps when using gSDE
        Default: -1 (only sample at the beginning of the rollout)
    :param supported_action_spaces: The action spaces supported by the algorithm.
    """
xin-pu added a commit that referenced this issue Jul 7, 2023
…Original Reinforce Learning. (Learn by one episode a time)
xin-pu added a commit that referenced this issue Jul 7, 2023
xin-pu added a commit that referenced this issue Jul 7, 2023