Off-Policy Evaluation
=====================

d3rlpy.ope

Off-policy evaluation (OPE) is a method to estimate the performance of a trained policy using only offline datasets.

import d3rlpy

# prepare the trained algorithm
cql = d3rlpy.load_learnable("model.d3")

# dataset to evaluate with
dataset, env = d3rlpy.datasets.get_pendulum()

# off-policy evaluation algorithm
fqe = d3rlpy.ope.FQE(algo=cql, config=d3rlpy.ope.FQEConfig())

# train estimators to evaluate the trained policy
fqe.fit(
   dataset,
   n_steps=100000,
   evaluators={
      'init_value': d3rlpy.metrics.InitialStateValueEstimationEvaluator(),
      'soft_opc': d3rlpy.metrics.SoftOPCEvaluator(return_threshold=-300),
   },
)

Note that the metrics reported during fitting evaluate the trained policy, not the FQE estimator itself.
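Beyond the metrics logged during fitting, the fitted estimator can also be queried directly. The following is a minimal sketch (not part of the original docs) that scores the policy by averaging predicted Q-values over the initial states of the dataset episodes; it assumes the ``cql`` and ``fqe`` objects from the example above.

import numpy as np

# initial observation of each episode in the dataset
initial_observations = np.array(
   [episode.observations[0] for episode in dataset.episodes]
)

# actions the trained policy would take at the initial states
actions = cql.predict(initial_observations)

# FQE's estimate of the discounted return from the initial states
estimated_values = fqe.predict_value(initial_observations, actions)
print("estimated initial state value:", estimated_values.mean())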

For continuous control algorithms

d3rlpy.ope.FQE

For discrete control algorithms

d3rlpy.ope.DiscreteFQE
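For the discrete case, the workflow mirrors the continuous example above. The following is a minimal sketch, assuming a trained discrete-control policy saved as ``dqn_model.d3`` (a hypothetical file name) and the CartPole dataset; the ``return_threshold`` value is illustrative.

import d3rlpy

# load a trained discrete-control policy (hypothetical file name)
dqn = d3rlpy.load_learnable("dqn_model.d3")

# offline dataset for a discrete-action environment
dataset, env = d3rlpy.datasets.get_cartpole()

# discrete variant of Fitted Q Evaluation
fqe = d3rlpy.ope.DiscreteFQE(algo=dqn, config=d3rlpy.ope.FQEConfig())

# train estimators to evaluate the trained policy
fqe.fit(
   dataset,
   n_steps=100000,
   evaluators={
      'init_value': d3rlpy.metrics.InitialStateValueEstimationEvaluator(),
      # illustrative threshold for CartPole returns
      'soft_opc': d3rlpy.metrics.SoftOPCEvaluator(return_threshold=180),
   },
)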