d3rlpy.metrics.CompareContinuousActionDiffEvaluator

class d3rlpy.metrics.CompareContinuousActionDiffEvaluator(base_algo, episodes=None)

Action difference between algorithms.

This metric measures how much the actions chosen by the evaluated algorithm differ from those chosen by a base algorithm in a continuous action space. If the base algorithm is near-optimal, a smaller action difference is better.

\[\mathbb{E}_{s_t \sim D} [(\pi_{\phi_1}(s_t) - \pi_{\phi_2}(s_t))^2]\]
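
As a rough illustration of the expectation above, the following pure-Python sketch averages the squared action difference over a handful of states. The function `mean_squared_action_diff` and the toy policies `policy_a` and `policy_b` are hypothetical names introduced here, not part of d3rlpy. The sketch also assumes that, for multi-dimensional actions, the squared differences are summed over action dimensions before averaging over states.

```python
from statistics import fmean
from typing import Callable, List, Sequence

# A policy maps a state vector to an action vector (toy stand-in type).
Policy = Callable[[Sequence[float]], Sequence[float]]


def mean_squared_action_diff(
    policy_1: Policy, policy_2: Policy, states: Sequence[Sequence[float]]
) -> float:
    """Average over states of the squared distance between two policies' actions."""
    per_state: List[float] = []
    for state in states:
        action_1 = policy_1(state)
        action_2 = policy_2(state)
        # Assumption: sum squared differences across action dimensions.
        per_state.append(sum((a - b) ** 2 for a, b in zip(action_1, action_2)))
    return fmean(per_state)


# Hypothetical toy policies with a 2-dimensional action.
def policy_a(state: Sequence[float]) -> List[float]:
    return [0.5 * state[0], 1.0]


def policy_b(state: Sequence[float]) -> List[float]:
    # Same as policy_a, but the first action component is shifted by 1.0.
    return [0.5 * state[0] + 1.0, 1.0]


states = [[0.0], [2.0], [4.0]]
score = mean_squared_action_diff(policy_a, policy_b, states)
print(score)
```

Comparing a policy with itself gives zero, which matches the reading that smaller values mean the two algorithms behave more alike on the sampled states.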
Parameters:
  • base_algo – Target algorithm to compare with.

  • episodes – Optional evaluation episodes. If not given, the dataset used in training is used.

Methods

__call__(algo, dataset)

Computes the metric.

Parameters:
  • algo (QLearningAlgoProtocol) – Q-learning algorithm.

  • dataset (ReplayBufferBase) – ReplayBuffer.

Returns:

Computed metric value.

Return type:

float
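
The call pattern described above (construct the evaluator with a base algorithm and optional episodes, then call it with the algorithm under evaluation and a dataset) can be sketched with a small stand-in class. `ToyActionDiffEvaluator` is hypothetical and is not the d3rlpy implementation. It uses plain callables as policies and nested lists as episodes, and it assumes that explicitly passed episodes take precedence over the episodes supplied at call time.

```python
from statistics import fmean


class ToyActionDiffEvaluator:
    """Hypothetical stand-in that mirrors the evaluator's call pattern."""

    def __init__(self, base_policy, episodes=None):
        self._base_policy = base_policy
        self._episodes = episodes

    def __call__(self, policy, dataset_episodes):
        # Assumption: fall back to the dataset's episodes when none were given.
        episodes = self._episodes if self._episodes is not None else dataset_episodes
        diffs = []
        for episode in episodes:
            for state in episode:
                action = policy(state)
                base_action = self._base_policy(state)
                diffs.append(sum((a - b) ** 2 for a, b in zip(action, base_action)))
        return fmean(diffs)


# Toy policies: the candidate doubles the state, the base returns it unchanged,
# so the squared action difference at state s is s ** 2.
def base_policy(state):
    return [state[0]]


def candidate_policy(state):
    return [2.0 * state[0]]


train_episodes = [[[0.0], [1.0]], [[2.0]]]  # two episodes, three states in total
held_out_episodes = [[[5.0]]]               # one episode with a single state

# Without explicit episodes, the states passed at call time are used.
value_on_training = ToyActionDiffEvaluator(base_policy)(candidate_policy, train_episodes)

# With explicit episodes, those are used instead.
value_on_held_out = ToyActionDiffEvaluator(base_policy, episodes=held_out_episodes)(
    candidate_policy, train_episodes
)
```

In both cases the result is a single float, matching the return type listed above.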