Algorithms

d3rlpy.algos

d3rlpy provides state-of-the-art offline deep reinforcement learning algorithms, as well as online algorithms built on the same base implementations.

Each algorithm provides its own config class, and you can instantiate the algorithm by specifying the device to use.

import d3rlpy

# instantiate algorithm with CPU
sac = d3rlpy.algos.SACConfig().create(device="cpu:0")
# instantiate algorithm with GPU
sac = d3rlpy.algos.SACConfig().create(device="cuda:0")
# instantiate algorithm with the 2nd GPU
sac = d3rlpy.algos.SACConfig().create(device="cuda:1")

You can also find advanced use cases in the examples directory.
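As a quick end-to-end illustration, the following is a minimal sketch of offline training and inference with one of the algorithms below; the dataset, algorithm choice and hyperparameters are placeholders rather than recommended settings, and the exact signatures are documented in the class references.

import d3rlpy

# load a built-in offline dataset (pendulum is used purely as an example)
dataset, env = d3rlpy.datasets.get_pendulum()

# configure and instantiate an algorithm
cql = d3rlpy.algos.CQLConfig().create(device="cuda:0")

# offline training
cql.fit(
    dataset,
    n_steps=10000,
    n_steps_per_epoch=1000,
)

# greedy action for a single observation (predict expects a batch dimension)
observation, _ = env.reset()
action = cql.predict(observation.reshape(1, -1))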

Base

LearnableBase

The base class of all algorithms.

d3rlpy.base.LearnableBase

Q-learning

QLearningAlgoBase

The base class of Q-learning algorithms.

d3rlpy.algos.QLearningAlgoBase
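All Q-learning algorithms listed below share this interface, so inference and serialization look the same across them. The snippet below is a rough sketch of that shared surface, assuming a continuous-control setup with batched NumPy inputs; the shapes and file name are illustrative only.

import d3rlpy
import numpy as np

# build networks without training, just to demonstrate the shared interface
_, env = d3rlpy.datasets.get_pendulum()
sac = d3rlpy.algos.SACConfig().create(device="cpu:0")
sac.build_with_env(env)

# batched observations and actions (shapes follow the pendulum environment)
observations = np.random.random((2, 3)).astype(np.float32)
actions = np.random.random((2, 1)).astype(np.float32)

# greedy actions from the policy
greedy_actions = sac.predict(observations)

# Q-value estimates for observation-action pairs
values = sac.predict_value(observations, actions)

# save and reload the full algorithm
sac.save("sac.d3")
sac2 = d3rlpy.load_learnable("sac.d3")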

BC

d3rlpy.algos.BCConfig

d3rlpy.algos.BC

DiscreteBC

d3rlpy.algos.DiscreteBCConfig

d3rlpy.algos.DiscreteBC

NFQ

d3rlpy.algos.NFQConfig

d3rlpy.algos.NFQ

DQN

d3rlpy.algos.DQNConfig

d3rlpy.algos.DQN

DoubleDQN

d3rlpy.algos.DoubleDQNConfig

d3rlpy.algos.DoubleDQN

DDPG

d3rlpy.algos.DDPGConfig

d3rlpy.algos.DDPG

TD3

d3rlpy.algos.TD3Config

d3rlpy.algos.TD3

SAC

d3rlpy.algos.SACConfig

d3rlpy.algos.SAC

DiscreteSAC

d3rlpy.algos.DiscreteSACConfig

d3rlpy.algos.DiscreteSAC

BCQ

d3rlpy.algos.BCQConfig

d3rlpy.algos.BCQ

DiscreteBCQ

d3rlpy.algos.DiscreteBCQConfig

d3rlpy.algos.DiscreteBCQ

BEAR

d3rlpy.algos.BEARConfig

d3rlpy.algos.BEAR

CRR

d3rlpy.algos.CRRConfig

d3rlpy.algos.CRR

CQL

d3rlpy.algos.CQLConfig

d3rlpy.algos.CQL

DiscreteCQL

d3rlpy.algos.DiscreteCQLConfig

d3rlpy.algos.DiscreteCQL

AWAC

d3rlpy.algos.AWACConfig

d3rlpy.algos.AWAC

PLAS

d3rlpy.algos.PLASConfig

d3rlpy.algos.PLAS

PLAS+P

d3rlpy.algos.PLASWithPerturbationConfig

d3rlpy.algos.PLASWithPerturbation

TD3+BC

d3rlpy.algos.TD3PlusBCConfig

d3rlpy.algos.TD3PlusBC

IQL

d3rlpy.algos.IQLConfig

d3rlpy.algos.IQL

RandomPolicy

d3rlpy.algos.RandomPolicyConfig

d3rlpy.algos.RandomPolicy

DiscreteRandomPolicy

d3rlpy.algos.DiscreteRandomPolicyConfig

d3rlpy.algos.DiscreteRandomPolicy

Decision Transformer

Decision Transformer-based algorithms usually require tricky interaction code for evaluation. In d3rlpy, these algorithms provide an as_stateful_wrapper method so that you can easily integrate them into your system.

import d3rlpy

dataset, env = d3rlpy.datasets.get_pendulum()

dt = d3rlpy.algos.DecisionTransformerConfig().create(device="cuda:0")

# offline training
dt.fit(
    dataset,
    n_steps=100000,
    n_steps_per_epoch=1000,
    eval_env=env,
    eval_target_return=0,  # specify target environment return
)

# wrap as stateful actor for interaction
actor = dt.as_stateful_wrapper(target_return=0)

# interaction
observation, _ = env.reset()  # Gymnasium reset() returns (observation, info)
reward = 0.0
while True:
    action = actor.predict(observation, reward)
    observation, reward, done, truncated, _ = env.step(action)
    if done or truncated:
        break

# reset history
actor.reset()

TransformerAlgoBase

d3rlpy.algos.TransformerAlgoBase

Decision Transformer

d3rlpy.algos.DecisionTransformerConfig

d3rlpy.algos.DecisionTransformer