d3rlpy.metrics.ContinuousActionDiffEvaluator¶
- class d3rlpy.metrics.ContinuousActionDiffEvaluator(episodes=None)[source]¶
Returns the mean squared difference between dataset actions and the algorithm's greedy actions.
This metric indicates how much the greedy policy deviates from the given episodes in a continuous action space. If the given episodes are near-optimal, a smaller action difference indicates better performance.
\[\mathbb{E}_{s_t, a_t \sim D} [(a_t - \pi_\phi (s_t))^2]\]
- Parameters:
episodes – Optional evaluation episodes. If not given, the dataset used in training will be used.
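As a usage sketch (assuming the d3rlpy v2 API; the pendulum dataset and the "action_diff" key are illustrative choices, not requirements), the evaluator is typically passed to fit() so the metric is logged at each evaluation interval:

```python
import d3rlpy

# A continuous-action dataset/environment pair (illustrative choice).
dataset, env = d3rlpy.datasets.get_pendulum()

sac = d3rlpy.algos.SACConfig().create()
sac.fit(
    dataset,
    n_steps=10000,
    evaluators={
        # Logged under the key "action_diff" during training.
        "action_diff": d3rlpy.metrics.ContinuousActionDiffEvaluator(),
    },
)
```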
Methods
- __call__(algo, dataset)[source]¶
Computes metrics.
- Parameters:
algo (QLearningAlgoProtocol) – Q-learning algorithm.
dataset (ReplayBufferBase) – Replay buffer to compute the metric over.
- Returns:
Computed metric value.
- Return type:
float
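For intuition, the expectation above reduces to a simple mean over sampled transitions. The following is a minimal NumPy sketch, not the library's implementation; observations, actions, and predict_fn are hypothetical stand-ins for the dataset batches and the algorithm's greedy-action prediction:

```python
import numpy as np

def continuous_action_diff(predict_fn, observations, actions):
    # Greedy actions from the policy for each observed state.
    predicted = predict_fn(observations)
    # Mean squared difference against the dataset actions.
    return float(np.mean((actions - predicted) ** 2))
```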