Metrics

d3rlpy provides evaluators (scoring functions) for offline Q-learning-based training. See the Logging section to learn how the computed metrics are written to files.

import d3rlpy

dataset, env = d3rlpy.datasets.get_cartpole()
# use the first 10 episodes as test data
test_episodes = dataset.episodes[:10]

dqn = d3rlpy.algos.DQNConfig().create()

dqn.fit(
    dataset,
    n_steps=100000,
    evaluators={
        'td_error': d3rlpy.metrics.TDErrorEvaluator(test_episodes),
        'value_scale': d3rlpy.metrics.AverageValueEstimationEvaluator(test_episodes),
        'environment': d3rlpy.metrics.EnvironmentEvaluator(env),
    },
)
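
Each evaluator is a plain callable that maps an algorithm and a dataset to a float, so it can also be run outside of fit(). A minimal sketch, reusing the dqn, dataset, env, and test_episodes objects from above after training:

# evaluators follow EvaluatorProtocol: __call__(algo, dataset) -> float
td_error = d3rlpy.metrics.TDErrorEvaluator(test_episodes)(dqn, dataset)
env_return = d3rlpy.metrics.EnvironmentEvaluator(env)(dqn, dataset)
print(td_error, env_return)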

You can also implement your own metrics.

class CustomEvaluator(d3rlpy.metrics.EvaluatorProtocol):
    def __call__(self, algo: d3rlpy.algos.QLearningAlgoBase, dataset: d3rlpy.dataset.ReplayBuffer) -> float:
        # do some evaluation and return a scalar metric
        ...
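
For example, a custom evaluator that reports the mean value of the greedy actions over held-out episodes could look like the following sketch. The predict and predict_value methods and Episode.observations are standard d3rlpy APIs; the class name MeanGreedyValueEvaluator is just illustrative.

import numpy as np

class MeanGreedyValueEvaluator(d3rlpy.metrics.EvaluatorProtocol):
    def __init__(self, episodes):
        self._episodes = episodes

    def __call__(self, algo, dataset) -> float:
        values = []
        for episode in self._episodes:
            # value of the greedy action at each observation
            actions = algo.predict(episode.observations)
            values.append(algo.predict_value(episode.observations, actions))
        return float(np.mean(np.concatenate(values)))

dqn.fit(
    dataset,
    n_steps=100000,
    evaluators={'mean_greedy_value': MeanGreedyValueEvaluator(test_episodes)},
)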

d3rlpy.metrics.TDErrorEvaluator

Returns average TD error.
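
Roughly, this is the mean squared temporal-difference error over transitions in the given episodes (the exact reduction may differ in the implementation):

\mathbb{E}_{(s_t, a_t, r_{t+1}, s_{t+1}) \sim D}\big[(Q_\theta(s_t, a_t) - r_{t+1} - \gamma \max_{a} Q_\theta(s_{t+1}, a))^2\big]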

d3rlpy.metrics.DiscountedSumOfAdvantageEvaluator

Returns average of discounted sum of advantage.
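
A standard formulation of this quantity measures the advantage of the dataset action against the greedy action value (the exact baseline used in the implementation may differ slightly):

\mathbb{E}_{\tau \sim D}\Big[\sum_{t} \gamma^{t} A(s_t, a_t)\Big], \qquad A(s_t, a_t) = Q_\theta(s_t, a_t) - \max_{a} Q_\theta(s_t, a)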

d3rlpy.metrics.AverageValueEstimationEvaluator

Returns average value estimation.

d3rlpy.metrics.InitialStateValueEstimationEvaluator

Returns mean estimated action-values at the initial states.
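
That is, the value of the greedy action at the first observation of each episode, averaged over episodes:

\mathbb{E}_{s_0 \sim D}\big[Q_\theta(s_0, \pi(s_0))\big]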

d3rlpy.metrics.SoftOPCEvaluator

Returns Soft Off-Policy Classification metrics.
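
Soft OPC (Irpan et al., 2019) contrasts Q-values on successful experience against Q-values on all experience. A sketch of the definition, where success is typically decided by an episode-return threshold passed to the evaluator:

\mathbb{E}_{(s, a) \sim D_{\mathrm{success}}}\big[Q_\theta(s, a)\big] - \mathbb{E}_{(s, a) \sim D}\big[Q_\theta(s, a)\big]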

d3rlpy.metrics.ContinuousActionDiffEvaluator

Returns squared difference of actions between algorithm and dataset.
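
In other words, how far the policy's actions are from the dataset actions on average (lower means closer to the behavior policy):

\mathbb{E}_{(s, a) \sim D}\big[\lVert a - \pi_\theta(s) \rVert^2\big]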

d3rlpy.metrics.DiscreteActionMatchEvaluator

Returns percentage of identical actions between algorithm and dataset.
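
That is, the fraction of dataset transitions where the greedy action agrees with the logged action:

\mathbb{E}_{(s, a) \sim D}\big[\mathbb{1}\{a = \arg\max_{a'} Q_\theta(s, a')\}\big]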

d3rlpy.metrics.EnvironmentEvaluator

Returns average episode return obtained by rolling out the policy in the given environment.

d3rlpy.metrics.CompareContinuousActionDiffEvaluator

Returns squared difference of actions between the algorithm and a base algorithm.

d3rlpy.metrics.CompareDiscreteActionMatchEvaluator

Returns percentage of identical actions between the algorithm and a base algorithm.