Metrics

d3rlpy provides evaluators (scoring functions) for offline Q-learning-based training. See the Logging section to learn how the computed metrics are written to files.

import d3rlpy

dataset, env = d3rlpy.datasets.get_cartpole()
# use the first 10 episodes as test data
test_episodes = dataset.episodes[:10]

dqn = d3rlpy.algos.DQNConfig().create()

dqn.fit(
    dataset,
    n_steps=100000,
    evaluators={
        'td_error': d3rlpy.metrics.TDErrorEvaluator(test_episodes),
        'value_scale': d3rlpy.metrics.AverageValueEstimationEvaluator(test_episodes),
        'environment': d3rlpy.metrics.EnvironmentEvaluator(env),
    },
)
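
Each evaluator is a plain callable that maps an algorithm and a dataset to a float, so it can also be run outside of fit(). A minimal sketch, reusing the dqn, dataset, env, and test_episodes objects from above after training:

# evaluators follow EvaluatorProtocol: __call__(algo, dataset) -> float
td_error = d3rlpy.metrics.TDErrorEvaluator(test_episodes)(dqn, dataset)
env_return = d3rlpy.metrics.EnvironmentEvaluator(env)(dqn, dataset)
print(td_error, env_return)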

You can also implement your own metrics.

class CustomEvaluator(d3rlpy.metrics.EvaluatorProtocol):
    def __call__(self, algo: d3rlpy.algos.QLearningAlgoBase, dataset: d3rlpy.dataset.ReplayBuffer) -> float:
        # do some evaluation and return a scalar metric
        ...
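
For example, a custom evaluator that reports the mean value of the greedy actions over held-out episodes could look like the following sketch. The predict and predict_value methods and Episode.observations are standard d3rlpy APIs; the class name MeanGreedyValueEvaluator is just illustrative.

import numpy as np

class MeanGreedyValueEvaluator(d3rlpy.metrics.EvaluatorProtocol):
    def __init__(self, episodes):
        self._episodes = episodes

    def __call__(self, algo, dataset) -> float:
        values = []
        for episode in self._episodes:
            # value of the greedy action at each observation
            actions = algo.predict(episode.observations)
            values.append(algo.predict_value(episode.observations, actions))
        return float(np.mean(np.concatenate(values)))

dqn.fit(
    dataset,
    n_steps=100000,
    evaluators={'mean_greedy_value': MeanGreedyValueEvaluator(test_episodes)},
)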

d3rlpy.metrics.TDErrorEvaluator

Returns average TD error.
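
Roughly, this is the mean squared temporal-difference error over transitions in the given episodes (the exact reduction may differ in the implementation):

\mathbb{E}_{(s_t, a_t, r_{t+1}, s_{t+1}) \sim D}\big[(Q_\theta(s_t, a_t) - r_{t+1} - \gamma \max_{a} Q_\theta(s_{t+1}, a))^2\big]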

d3rlpy.metrics.DiscountedSumOfAdvantageEvaluator

Returns average of discounted sum of advantage.
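
A standard formulation of this quantity measures the advantage of the dataset action against the greedy action value (the exact baseline used in the implementation may differ slightly):

\mathbb{E}_{\tau \sim D}\Big[\sum_{t} \gamma^{t} A(s_t, a_t)\Big], \qquad A(s_t, a_t) = Q_\theta(s_t, a_t) - \max_{a} Q_\theta(s_t, a)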

d3rlpy.metrics.AverageValueEstimationEvaluator

Returns average value estimation.

d3rlpy.metrics.InitialStateValueEstimationEvaluator

Returns mean estimated action-values at the initial states.
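
That is, the value of the greedy action at the first observation of each episode, averaged over episodes:

\mathbb{E}_{s_0 \sim D}\big[Q_\theta(s_0, \pi(s_0))\big]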

d3rlpy.metrics.SoftOPCEvaluator

Returns Soft Off-Policy Classification metrics.
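
Soft OPC (Irpan et al., 2019) contrasts Q-values on successful experience against Q-values on all experience. A sketch of the definition, where success is typically decided by an episode-return threshold passed to the evaluator:

\mathbb{E}_{(s, a) \sim D_{\mathrm{success}}}\big[Q_\theta(s, a)\big] - \mathbb{E}_{(s, a) \sim D}\big[Q_\theta(s, a)\big]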

d3rlpy.metrics.ContinuousActionDiffEvaluator

Returns squared difference of actions between algorithm and dataset.
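
In other words, how far the policy's actions are from the dataset actions on average (lower means closer to the behavior policy):

\mathbb{E}_{(s, a) \sim D}\big[\lVert a - \pi_\theta(s) \rVert^2\big]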

d3rlpy.metrics.DiscreteActionMatchEvaluator

Returns percentage of identical actions between algorithm and dataset.
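
That is, the fraction of dataset transitions where the greedy action agrees with the logged action:

\mathbb{E}_{(s, a) \sim D}\big[\mathbb{1}\{a = \arg\max_{a'} Q_\theta(s, a')\}\big]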

d3rlpy.metrics.EnvironmentEvaluator

Returns average episode return obtained by rolling out the policy in the given environment.

d3rlpy.metrics.CompareContinuousActionDiffEvaluator

Returns squared difference of actions between the algorithm and a base algorithm.

d3rlpy.metrics.CompareDiscreteActionMatchEvaluator

Returns percentage of identical actions between the algorithm and a base algorithm.