d3rlpy.metrics.CompareContinuousActionDiffEvaluator
- class d3rlpy.metrics.CompareContinuousActionDiffEvaluator(*args, **kwds)
Action difference between algorithms.
This metric indicates how different the two algorithms are in a continuous action space. If the algorithm being compared against is near-optimal, a smaller action difference is better.
\[\mathbb{E}_{s_t \sim D} [(\pi_{\phi_1}(s_t) - \pi_{\phi_2}(s_t))^2]\]
- Parameters
base_algo – Target algorithm to compare with.
episodes – Optional evaluation episodes. If not given, the dataset used in training is used.
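The expectation above can be illustrated with a minimal NumPy sketch. It is not the d3rlpy implementation: the function name `mean_squared_action_diff` and the two stand-in policies are hypothetical, and the sketch assumes that squared differences are summed over action dimensions before averaging over states.

```python
import numpy as np


def mean_squared_action_diff(policy_a, policy_b, observations):
    """Average squared action difference between two policies over a batch of states."""
    actions_a = np.asarray(policy_a(observations), dtype=np.float64)
    actions_b = np.asarray(policy_b(observations), dtype=np.float64)
    # Assumption: sum squared differences over action dimensions,
    # then average over the sampled states s_t ~ D.
    per_state = ((actions_a - actions_b) ** 2).sum(axis=1)
    return float(per_state.mean())


# Hypothetical stand-in policies: linear maps from 3-D observations to 2-D actions.
rng = np.random.default_rng(0)
observations = rng.normal(size=(100, 3))
weights = rng.normal(size=(3, 2))


def policy_1(obs):
    return obs @ weights


def policy_2(obs):
    # Same policy shifted by a constant 0.1 in every action dimension.
    return obs @ weights + 0.1


print(mean_squared_action_diff(policy_1, policy_1, observations))  # identical policies give 0.0
print(mean_squared_action_diff(policy_1, policy_2, observations))  # close to 2 * 0.1**2 = 0.02
```

Under these assumptions, a value near zero means the two policies choose almost the same actions on the sampled states, which is desirable only when the reference policy is itself near-optimal.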
Methods
- __call__(algo, dataset)
Computes metrics.
- Parameters
algo (d3rlpy.interface.QLearningAlgoProtocol) – Q-learning algorithm.
dataset (d3rlpy.dataset.replay_buffer.ReplayBuffer) – ReplayBuffer.
- Returns
Computed metrics.
- Return type