Off-Policy Evaluation
=====================

d3rlpy.ope

Off-policy evaluation (OPE) is a method to estimate the performance of a trained policy using only offline datasets.

import d3rlpy

# prepare the trained algorithm
cql = d3rlpy.load_learnable("model.d3")

# dataset to evaluate with
dataset, env = d3rlpy.datasets.get_pendulum()

# off-policy evaluation algorithm
fqe = d3rlpy.ope.FQE(algo=cql, config=d3rlpy.ope.FQEConfig())

# train estimators to evaluate the trained policy
fqe.fit(
   dataset,
   n_steps=100000,
   evaluators={
      'init_value': d3rlpy.metrics.InitialStateValueEstimationEvaluator(),
      'soft_opc': d3rlpy.metrics.SoftOPCEvaluator(return_threshold=-300),
   },
)

Note that the metrics reported during fitting evaluate the trained policy, not the FQE estimator itself.
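Beyond the metrics logged during fitting, the fitted estimator can also be queried directly. The following is a minimal sketch (not part of the original docs) that scores the policy by averaging predicted Q-values over the initial states of the dataset episodes; it assumes the ``cql`` and ``fqe`` objects from the example above.

import numpy as np

# initial observation of each episode in the dataset
initial_observations = np.array(
   [episode.observations[0] for episode in dataset.episodes]
)

# actions the trained policy would take at the initial states
actions = cql.predict(initial_observations)

# FQE's estimate of the discounted return from the initial states
estimated_values = fqe.predict_value(initial_observations, actions)
print("estimated initial state value:", estimated_values.mean())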

For continuous control algorithms

d3rlpy.ope.FQE

For discrete control algorithms

d3rlpy.ope.DiscreteFQE
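For the discrete case, the workflow mirrors the continuous example above. The following is a minimal sketch, assuming a trained discrete-control policy saved as ``dqn_model.d3`` (a hypothetical file name) and the CartPole dataset; the ``return_threshold`` value is illustrative.

import d3rlpy

# load a trained discrete-control policy (hypothetical file name)
dqn = d3rlpy.load_learnable("dqn_model.d3")

# offline dataset for a discrete-action environment
dataset, env = d3rlpy.datasets.get_cartpole()

# discrete variant of Fitted Q Evaluation
fqe = d3rlpy.ope.DiscreteFQE(algo=dqn, config=d3rlpy.ope.FQEConfig())

# train estimators to evaluate the trained policy
fqe.fit(
   dataset,
   n_steps=100000,
   evaluators={
      'init_value': d3rlpy.metrics.InitialStateValueEstimationEvaluator(),
      # illustrative threshold for CartPole returns
      'soft_opc': d3rlpy.metrics.SoftOPCEvaluator(return_threshold=180),
   },
)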