Deep RL Algorithms #4

Open
4 of 5 tasks
GeorgeS2019 opened this issue Jun 13, 2023 · 0 comments

GeorgeS2019 commented Jun 13, 2023

Current status:

RL Algorithms

  • A2C
  • DQN

Model Free (TorchSharp)

  • QLearning

    It learns the value of an action in a particular state. The algorithm uses a table to store the value of each state-action pair and updates that table from the reward the agent receives. The goal of Q-learning is to find the optimal policy for the agent (a minimal update sketch appears after this list).

  • Cross Entropy

    • It is a Monte Carlo method for optimizing an agent's policy: episodes are sampled with the current policy, the highest-reward ("elite") episodes are kept, and the policy is trained to reproduce the actions taken in them by minimizing the cross-entropy between the policy's predicted action distribution and the elite actions (see the sketch after this list).
    • More generally, the cross-entropy method is a (Monte Carlo) stochastic optimization algorithm that can be used to solve optimization problems where the objective function is difficult to evaluate directly.
  • Cross-Entropy Guided Policy (CGP) learning

    It is a general Q-function and policy training method that can be combined with most deep Q-learning methods. It demonstrates improved training stability across runs, hyperparameter combinations, and tasks, while avoiding the computational expense of a sample-based policy at inference time.
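
As a concrete illustration of the tabular Q-learning item above, here is a minimal sketch in Python. The environment, hyperparameters, and use of Gymnasium are illustrative assumptions and are not part of the TorchSharp implementation tracked by this issue.

    import numpy as np
    import gymnasium as gym

    # Illustrative hyperparameters (assumed, not taken from this repo)
    alpha, gamma, epsilon, episodes = 0.1, 0.99, 0.1, 5000

    env = gym.make("FrozenLake-v1")   # small discrete state/action space
    Q = np.zeros((env.observation_space.n, env.action_space.n))

    for _ in range(episodes):
        state, _ = env.reset()
        done = False
        while not done:
            # epsilon-greedy action selection from the Q-table
            if np.random.rand() < epsilon:
                action = env.action_space.sample()
            else:
                action = int(np.argmax(Q[state]))
            next_state, reward, terminated, truncated, _ = env.step(action)
            done = terminated or truncated
            # temporal-difference update of the state-action value
            Q[state, action] += alpha * (
                reward + gamma * np.max(Q[next_state]) - Q[state, action]
            )
            state = next_state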

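For the cross-entropy bullet, the sketch below shows one cross-entropy-method iteration: keep the elite (highest-reward) episodes and fit the policy to the actions taken in them. The network sizes, the percentile, and the shape of the `episodes` input are illustrative assumptions; the rollout code that produces the episodes is omitted.

    import numpy as np
    import torch
    import torch.nn as nn

    # Toy policy network: observation -> action logits (sizes are assumptions)
    policy = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
    optimizer = torch.optim.Adam(policy.parameters(), lr=1e-2)
    loss_fn = nn.CrossEntropyLoss()

    def cem_iteration(episodes, percentile=70):
        """One cross-entropy-method step.

        `episodes` is a list of (total_reward, states, actions) tuples
        produced by rolling out the current policy.
        """
        rewards = np.array([ep[0] for ep in episodes])
        threshold = np.percentile(rewards, percentile)

        # Keep only the elite (highest-reward) episodes
        elite_states, elite_actions = [], []
        for total, states, actions in episodes:
            if total >= threshold:
                elite_states.extend(states)
                elite_actions.extend(actions)

        # Fit the policy to the elite actions: minimize the cross-entropy
        # between the predicted action distribution and the actions taken
        obs = torch.as_tensor(np.array(elite_states), dtype=torch.float32)
        acts = torch.as_tensor(elite_actions, dtype=torch.long)
        optimizer.zero_grad()
        loss = loss_fn(policy(obs), acts)
        loss.backward()
        optimizer.step()
        return threshold, loss.item()
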
SB3 RL algorithm comparisons

Also look into

BaseAlgorithm

upon which the state-of-the-art RL algorithms depend.

RL Algorithms


These algorithms are classified into TWO groups,

on-policy or off-policy, both of which inherit from BaseAlgorithm:

    """
    The base of RL algorithms

    :param policy: The policy model to use (MlpPolicy, CnnPolicy, ...)
    :param env: The environment to learn from
                (if registered in Gym, can be str. Can be None for loading trained models)
    :param learning_rate: learning rate for the optimizer,
        it can be a function of the current progress remaining (from 1 to 0)
    :param policy_kwargs: Additional arguments to be passed to the policy on creation
    :param stats_window_size: Window size for the rollout logging, specifying the number of episodes to average
        the reported success rate, mean episode length, and mean reward over
    :param tensorboard_log: the log location for tensorboard (if None, no logging)
    :param verbose: Verbosity level: 0 for no output, 1 for info messages (such as device or wrappers used), 2 for
        debug messages
    :param device: Device on which the code should run.
        By default, it will try to use a Cuda compatible device and fallback to cpu
        if it is not possible.
    :param support_multi_env: Whether the algorithm supports training
        with multiple environments (as in A2C)
    :param monitor_wrapper: When creating an environment, whether to wrap it
        or not in a Monitor wrapper.
    :param seed: Seed for the pseudo random generators
    :param use_sde: Whether to use generalized State Dependent Exploration (gSDE)
        instead of action noise exploration (default: False)
    :param sde_sample_freq: Sample a new noise matrix every n steps when using gSDE
        Default: -1 (only sample at the beginning of the rollout)
    :param supported_action_spaces: The action spaces supported by the algorithm.
    """
xin-pu added a commit that referenced this issue Jul 7, 2023
…Original Reinforce Learning. (Learn by one episode a time)
xin-pu added a commit that referenced this issue Jul 7, 2023
xin-pu added a commit that referenced this issue Jul 7, 2023