.. module:: d3rlpy.algos
d3rlpy provides state-of-the-art offline deep reinforcement learning algorithms as well as online algorithms built on the same base implementations.
Each algorithm comes with a config class, and you can instantiate the algorithm by specifying the device to use:
.. code-block:: python

   import d3rlpy

   # instantiate algorithm with CPU
   sac = d3rlpy.algos.SACConfig().create(device="cpu:0")

   # instantiate algorithm with GPU
   sac = d3rlpy.algos.SACConfig().create(device="cuda:0")

   # instantiate algorithm with the 2nd GPU
   sac = d3rlpy.algos.SACConfig().create(device="cuda:1")
You can also find advanced use cases in the ``examples`` directory.
The base class of all algorithms.
.. autoclass:: d3rlpy.base.LearnableBase
   :members:
   :show-inheritance:
The base class of Q-learning algorithms.
.. autoclass:: d3rlpy.algos.QLearningAlgoBase
   :members:
   :show-inheritance:
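As background for the algorithms listed below, here is a minimal tabular sketch (not d3rlpy code) of the Bellman backup that the deep Q-learning family approximates with neural networks; ``q_learning_update``, its parameters, and the dictionary-based Q-table are illustrative names, not part of the d3rlpy API:

.. code-block:: python

   from collections import defaultdict

   def q_learning_update(q, s, a, r, s_next, n_actions, lr=0.1, gamma=0.99):
       # temporal-difference target, bootstrapped from the best next-state action:
       # r + gamma * max_a' Q(s', a')
       target = r + gamma * max(q[(s_next, a2)] for a2 in range(n_actions))
       # move the current estimate toward the target by a small learning rate
       q[(s, a)] += lr * (target - q[(s, a)])
       return q[(s, a)]

   q = defaultdict(float)  # Q-values default to 0.0
   q_learning_update(q, s=0, a=1, r=1.0, s_next=2, n_actions=2)

The deep variants below replace the table with a function approximator and stabilize training with replay buffers and target networks, but the underlying update is the same.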
.. autoclass:: d3rlpy.algos.BCConfig
   :members:
   :show-inheritance:

.. autoclass:: d3rlpy.algos.BC
   :members:
   :show-inheritance:

.. autoclass:: d3rlpy.algos.DiscreteBCConfig
   :members:
   :show-inheritance:

.. autoclass:: d3rlpy.algos.DiscreteBC
   :members:
   :show-inheritance:

.. autoclass:: d3rlpy.algos.NFQConfig
   :members:
   :show-inheritance:

.. autoclass:: d3rlpy.algos.NFQ
   :members:
   :show-inheritance:

.. autoclass:: d3rlpy.algos.DQNConfig
   :members:
   :show-inheritance:

.. autoclass:: d3rlpy.algos.DQN
   :members:
   :show-inheritance:

.. autoclass:: d3rlpy.algos.DoubleDQNConfig
   :members:
   :show-inheritance:

.. autoclass:: d3rlpy.algos.DoubleDQN
   :members:
   :show-inheritance:
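To illustrate what distinguishes Double DQN from vanilla DQN, here is a rough sketch (not d3rlpy code) of the two target computations: Double DQN selects the next action with the online network but evaluates it with the target network, which reduces the max-operator's overestimation bias. The function names are illustrative only:

.. code-block:: python

   def dqn_target(reward, q_target_next, gamma=0.99):
       # vanilla DQN: max over the target network's own estimates
       return reward + gamma * max(q_target_next)

   def double_dqn_target(reward, q_online_next, q_target_next, gamma=0.99):
       # select the action with the online network ...
       best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
       # ... but evaluate it with the target network
       return reward + gamma * q_target_next[best_action]

   # the two targets differ whenever the networks disagree on the best action
   dqn_target(1.0, [0.5, 2.0])                      # 1.0 + 0.99 * 2.0 = 2.98
   double_dqn_target(1.0, [3.0, 0.0], [0.5, 2.0])   # 1.0 + 0.99 * 0.5 = 1.495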
.. autoclass:: d3rlpy.algos.DDPGConfig
   :members:
   :show-inheritance:

.. autoclass:: d3rlpy.algos.DDPG
   :members:
   :show-inheritance:

.. autoclass:: d3rlpy.algos.TD3Config
   :members:
   :show-inheritance:

.. autoclass:: d3rlpy.algos.TD3
   :members:
   :show-inheritance:

.. autoclass:: d3rlpy.algos.SACConfig
   :members:
   :show-inheritance:

.. autoclass:: d3rlpy.algos.SAC
   :members:
   :show-inheritance:

.. autoclass:: d3rlpy.algos.DiscreteSACConfig
   :members:
   :show-inheritance:

.. autoclass:: d3rlpy.algos.DiscreteSAC
   :members:
   :show-inheritance:

.. autoclass:: d3rlpy.algos.BCQConfig
   :members:
   :show-inheritance:

.. autoclass:: d3rlpy.algos.BCQ
   :members:
   :show-inheritance:

.. autoclass:: d3rlpy.algos.DiscreteBCQConfig
   :members:
   :show-inheritance:

.. autoclass:: d3rlpy.algos.DiscreteBCQ
   :members:
   :show-inheritance:

.. autoclass:: d3rlpy.algos.BEARConfig
   :members:
   :show-inheritance:

.. autoclass:: d3rlpy.algos.BEAR
   :members:
   :show-inheritance:

.. autoclass:: d3rlpy.algos.CRRConfig
   :members:
   :show-inheritance:

.. autoclass:: d3rlpy.algos.CRR
   :members:
   :show-inheritance:

.. autoclass:: d3rlpy.algos.CQLConfig
   :members:
   :show-inheritance:

.. autoclass:: d3rlpy.algos.CQL
   :members:
   :show-inheritance:
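The core idea of CQL is a conservative regularizer on top of the standard TD loss: it pushes down a soft-maximum of Q-values over all actions while pushing back up the Q-value of the action observed in the dataset, discouraging overestimation of out-of-distribution actions. The following is a rough discrete-action sketch of that penalty term (not d3rlpy code; ``cql_penalty`` is an illustrative name):

.. code-block:: python

   import math

   def cql_penalty(q_values, dataset_action):
       # soft-maximum over all actions (logsumexp) penalizes inflated Q-values
       logsumexp = math.log(sum(math.exp(q) for q in q_values))
       # minus the Q-value of the action actually seen in the dataset
       return logsumexp - q_values[dataset_action]

   # the penalty is small when the dataset action already dominates,
   # and large when an out-of-distribution action has the highest value
   cql_penalty([1.0, 5.0], dataset_action=1)  # small
   cql_penalty([5.0, 1.0], dataset_action=1)  # large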
.. autoclass:: d3rlpy.algos.DiscreteCQLConfig
   :members:
   :show-inheritance:

.. autoclass:: d3rlpy.algos.DiscreteCQL
   :members:
   :show-inheritance:

.. autoclass:: d3rlpy.algos.AWACConfig
   :members:
   :show-inheritance:

.. autoclass:: d3rlpy.algos.AWAC
   :members:
   :show-inheritance:

.. autoclass:: d3rlpy.algos.PLASConfig
   :members:
   :show-inheritance:

.. autoclass:: d3rlpy.algos.PLAS
   :members:
   :show-inheritance:

.. autoclass:: d3rlpy.algos.PLASWithPerturbationConfig
   :members:
   :show-inheritance:

.. autoclass:: d3rlpy.algos.PLASWithPerturbation
   :members:
   :show-inheritance:

.. autoclass:: d3rlpy.algos.TD3PlusBCConfig
   :members:
   :show-inheritance:

.. autoclass:: d3rlpy.algos.TD3PlusBC
   :members:
   :show-inheritance:

.. autoclass:: d3rlpy.algos.IQLConfig
   :members:
   :show-inheritance:

.. autoclass:: d3rlpy.algos.IQL
   :members:
   :show-inheritance:

.. autoclass:: d3rlpy.algos.RandomPolicyConfig
   :members:
   :show-inheritance:

.. autoclass:: d3rlpy.algos.RandomPolicy
   :members:
   :show-inheritance:

.. autoclass:: d3rlpy.algos.DiscreteRandomPolicyConfig
   :members:
   :show-inheritance:

.. autoclass:: d3rlpy.algos.DiscreteRandomPolicy
   :members:
   :show-inheritance:
Decision Transformer-based algorithms usually require tricky interaction code for evaluation.
In d3rlpy, these algorithms provide an ``as_stateful_wrapper`` method so that you can easily integrate them into your system:
.. code-block:: python

   import d3rlpy

   dataset, env = d3rlpy.datasets.get_pendulum()

   dt = d3rlpy.algos.DecisionTransformerConfig().create(device="cuda:0")

   # offline training
   dt.fit(
       dataset,
       n_steps=100000,
       n_steps_per_epoch=1000,
       eval_env=env,
       eval_target_return=0,  # specify target environment return
   )

   # wrap as stateful actor for interaction
   actor = dt.as_stateful_wrapper(target_return=0)

   # interaction (env.reset() returns an (observation, info) tuple)
   observation, reward = env.reset()[0], 0.0
   while True:
       action = actor.predict(observation, reward)
       observation, reward, done, truncated, _ = env.step(action)
       if done or truncated:
           break

   # reset history
   actor.reset()
.. autoclass:: d3rlpy.algos.TransformerAlgoBase
   :members:
   :show-inheritance:

.. autoclass:: d3rlpy.algos.DecisionTransformerConfig
   :members:
   :show-inheritance:

.. autoclass:: d3rlpy.algos.DecisionTransformer
   :members:
   :show-inheritance:

.. autoclass:: d3rlpy.algos.DiscreteDecisionTransformerConfig
   :members:
   :show-inheritance:

.. autoclass:: d3rlpy.algos.DiscreteDecisionTransformer
   :members:
   :show-inheritance:
``TransformerActionSampler`` is an interface for sampling actions from Decision Transformer outputs.
The default action-sampler is used unless you explicitly specify one:
.. code-block:: python

   import d3rlpy

   dataset, env = d3rlpy.datasets.get_pendulum()

   dt = d3rlpy.algos.DecisionTransformerConfig().create(device="cuda:0")

   # offline training
   dt.fit(
       dataset,
       n_steps=100000,
       n_steps_per_epoch=1000,
       eval_env=env,
       eval_target_return=0,
       # manually specify action-sampler
       eval_action_sampler=d3rlpy.algos.IdentityTransformerActionSampler(),
   )

   # wrap as stateful actor for interaction with manually specified action-sampler
   actor = dt.as_stateful_wrapper(
       target_return=0,
       action_sampler=d3rlpy.algos.IdentityTransformerActionSampler(),
   )
.. autosummary::
   :toctree: generated/
   :nosignatures:

   d3rlpy.algos.TransformerActionSampler
   d3rlpy.algos.IdentityTransformerActionSampler
   d3rlpy.algos.SoftmaxTransformerActionSampler
   d3rlpy.algos.GreedyTransformerActionSampler
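As a rough illustration of the greedy and softmax sampling strategies named above (this is a standalone sketch, not d3rlpy's implementation; the function names and the ``temperature`` parameter are illustrative): greedy sampling picks the argmax of the policy logits, while softmax sampling draws from a temperature-scaled categorical distribution over them.

.. code-block:: python

   import math
   import random

   def greedy_action(logits):
       # deterministic: always the highest-scoring action
       return max(range(len(logits)), key=lambda a: logits[a])

   def softmax_action(logits, temperature=1.0, rng=random):
       scaled = [l / temperature for l in logits]
       m = max(scaled)  # subtract the max for numerical stability
       exps = [math.exp(s - m) for s in scaled]
       total = sum(exps)
       probs = [e / total for e in exps]
       # stochastic: sample an action index proportionally to probs
       return rng.choices(range(len(probs)), weights=probs)[0]

   logits = [0.1, 2.0, -1.0]
   greedy_action(logits)   # always action 1
   softmax_action(logits)  # usually action 1, occasionally another

A lower temperature makes softmax sampling behave more like the greedy sampler; a higher temperature makes it closer to uniform.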