d3rlpy.metrics.ContinuousActionDiffEvaluator
- class d3rlpy.metrics.ContinuousActionDiffEvaluator(episodes=None)[source]
Returns the squared difference of actions between the algorithm and the dataset.
This metric indicates how different the greedy policy is from the given episodes in a continuous action space. If the given episodes are near-optimal, a smaller action difference is better.
\[\mathbb{E}_{s_t, a_t \sim D} [(a_t - \pi_\phi(s_t))^2]\]
- Parameters
episodes – Optional evaluation episodes. If not given, the dataset used during training is used.
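The evaluator is typically passed to fit() through the evaluators dictionary, so the action difference is logged every epoch. A minimal usage sketch, assuming the v2-style d3rlpy API and the built-in pendulum dataset (swap in your own algorithm and data):

```python
import d3rlpy

# load a continuous-action dataset (pendulum ships with d3rlpy)
dataset, env = d3rlpy.datasets.get_pendulum()

sac = d3rlpy.algos.SACConfig().create(device="cpu")

sac.fit(
    dataset,
    n_steps=10000,
    evaluators={
        # logged under the key "action_diff"
        "action_diff": d3rlpy.metrics.ContinuousActionDiffEvaluator(),
    },
)
```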
Methods
- __call__(algo, dataset)[source]
Computes the metric.
- Parameters
algo (d3rlpy.interface.QLearningAlgoProtocol) – Q-learning algorithm.
dataset (d3rlpy.dataset.replay_buffer.ReplayBuffer) – ReplayBuffer.
- Returns
Computed metric.
- Return type
float
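Conceptually, the computation reduces to a mean squared error between the dataset actions and the greedy actions predicted by the policy. The following is a minimal sketch of the formula above, not the library's exact implementation (algo.predict stands in for the greedy policy \(\pi_\phi\); the exact reduction over action dimensions may differ):

```python
import numpy as np

def continuous_action_diff(algo, observations, actions):
    # E_{s_t, a_t ~ D}[(a_t - pi(s_t))^2], averaged over transitions
    predicted = algo.predict(observations)  # greedy actions pi(s_t)
    return float(np.mean((actions - predicted) ** 2))
```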