d3rlpy.algos
d3rlpy provides state-of-the-art offline deep reinforcement learning algorithms, as well as online algorithms built on the same base implementations.
Each algorithm provides its config class, and you can instantiate it by specifying the device to use.
import d3rlpy
# instantiate algorithm with CPU
sac = d3rlpy.algos.SACConfig().create(device="cpu:0")
# instantiate algorithm with GPU
sac = d3rlpy.algos.SACConfig().create(device="cuda:0")
# instantiate algorithm with the 2nd GPU
sac = d3rlpy.algos.SACConfig().create(device="cuda:1")
You can also find advanced use cases in the examples directory.
The base class of all algorithms.
d3rlpy.base.LearnableBase
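LearnableBase defines the shared configuration and serialization interface. As a minimal sketch (the file name is illustrative), a built algorithm can be saved to a single file and restored later with d3rlpy.load_learnable:

import d3rlpy

dataset, env = d3rlpy.datasets.get_cartpole()
dqn = d3rlpy.algos.DQNConfig().create(device="cpu:0")
dqn.build_with_dataset(dataset)  # initialize networks before saving
# save config and parameters together in a single file
dqn.save("dqn_model.d3")
# restore without re-specifying the config
dqn2 = d3rlpy.load_learnable("dqn_model.d3")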
The base class of Q-learning algorithms.
d3rlpy.algos.QLearningAlgoBase
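All of the Q-learning algorithms listed below share this interface for training and inference. A minimal sketch of the common calls, with illustrative step counts and the 3-dimensional observations of the built-in pendulum dataset:

import numpy as np
import d3rlpy

dataset, env = d3rlpy.datasets.get_pendulum()
sac = d3rlpy.algos.SACConfig().create(device="cpu:0")
# train from the dataset
sac.fit(dataset, n_steps=10000, n_steps_per_epoch=1000)
# greedy actions for a batch of observations
observations = np.random.random((10, 3)).astype(np.float32)
actions = sac.predict(observations)
# estimated action-values for observation-action pairs
values = sac.predict_value(observations, actions)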
d3rlpy.algos.BCConfig
d3rlpy.algos.BC
d3rlpy.algos.DiscreteBCConfig
d3rlpy.algos.DiscreteBC
d3rlpy.algos.NFQConfig
d3rlpy.algos.NFQ
d3rlpy.algos.DQNConfig
d3rlpy.algos.DQN
d3rlpy.algos.DoubleDQNConfig
d3rlpy.algos.DoubleDQN
d3rlpy.algos.DDPGConfig
d3rlpy.algos.DDPG
d3rlpy.algos.TD3Config
d3rlpy.algos.TD3
d3rlpy.algos.SACConfig
d3rlpy.algos.SAC
d3rlpy.algos.DiscreteSACConfig
d3rlpy.algos.DiscreteSAC
d3rlpy.algos.BCQConfig
d3rlpy.algos.BCQ
d3rlpy.algos.DiscreteBCQConfig
d3rlpy.algos.DiscreteBCQ
d3rlpy.algos.BEARConfig
d3rlpy.algos.BEAR
d3rlpy.algos.CRRConfig
d3rlpy.algos.CRR
d3rlpy.algos.CQLConfig
d3rlpy.algos.CQL
d3rlpy.algos.DiscreteCQLConfig
d3rlpy.algos.DiscreteCQL
d3rlpy.algos.AWACConfig
d3rlpy.algos.AWAC
d3rlpy.algos.PLASConfig
d3rlpy.algos.PLAS
d3rlpy.algos.PLASWithPerturbationConfig
d3rlpy.algos.PLASWithPerturbation
d3rlpy.algos.TD3PlusBCConfig
d3rlpy.algos.TD3PlusBC
d3rlpy.algos.IQLConfig
d3rlpy.algos.IQL
d3rlpy.algos.RandomPolicyConfig
d3rlpy.algos.RandomPolicy
d3rlpy.algos.DiscreteRandomPolicyConfig
d3rlpy.algos.DiscreteRandomPolicy
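For instance, an offline algorithm from the list above trains on a logged dataset in a few lines. A minimal sketch using CQL with the built-in pendulum dataset; the step counts are illustrative, and periodic evaluation rollouts use d3rlpy.metrics.EnvironmentEvaluator:

import d3rlpy

dataset, env = d3rlpy.datasets.get_pendulum()
cql = d3rlpy.algos.CQLConfig().create(device="cuda:0")
# offline training with periodic evaluation rollouts
cql.fit(
    dataset,
    n_steps=100000,
    n_steps_per_epoch=1000,
    evaluators={"environment": d3rlpy.metrics.EnvironmentEvaluator(env)},
)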
Decision Transformer-based algorithms usually require tricky interaction code for evaluation. In d3rlpy, those algorithms provide an as_stateful_wrapper method that makes it easy to integrate them into your system.
import d3rlpy
dataset, env = d3rlpy.datasets.get_pendulum()
dt = d3rlpy.algos.DecisionTransformerConfig().create(device="cuda:0")
# offline training
dt.fit(
    dataset,
    n_steps=100000,
    n_steps_per_epoch=1000,
    eval_env=env,
    eval_target_return=0,  # specify target environment return
)
# wrap as stateful actor for interaction
actor = dt.as_stateful_wrapper(target_return=0)
# interaction
observation, _ = env.reset()  # Gymnasium reset returns (observation, info)
reward = 0.0
while True:
    action = actor.predict(observation, reward)
    observation, reward, done, truncated, _ = env.step(action)
    if done or truncated:
        break
# reset history
actor.reset()
d3rlpy.algos.TransformerAlgoBase
d3rlpy.algos.DecisionTransformerConfig
d3rlpy.algos.DecisionTransformer