d3rlpy.metrics.scorer.continuous_action_diff_scorer

d3rlpy.metrics.scorer.continuous_action_diff_scorer(algo, episodes, window_size=1024)[source]

Returns the negative squared difference between the algorithm's actions and the dataset actions.

This metric indicates how much the greedy policy deviates from the given episodes in a continuous action space. If the given episodes are near-optimal, a smaller action difference is better.

\[\mathbb{E}_{s_t, a_t \sim D} [(a_t - \pi_\phi (s_t))^2]\]
Parameters:
  • algo (d3rlpy.algos.base.AlgoBase) – algorithm.
  • episodes (list(d3rlpy.dataset.Episode)) – list of episodes.
  • window_size (int) – mini-batch size used for computation.
Returns:

negative mean squared action difference (negated so that greater is better, following the scorer convention).

Return type:

float
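The computation above can be sketched in plain NumPy. This is a simplified, hypothetical re-implementation that ignores mini-batching; `predict` stands in for the algorithm's `predict` method, and the helper name is illustrative, not part of the d3rlpy API:

```python
import numpy as np

def continuous_action_diff(predict, observations, actions):
    """Negative mean squared action difference (simplified sketch).

    predict: callable mapping a batch of observations to greedy actions,
             standing in for ``algo.predict`` (assumption, not the real API).
    observations: array of shape (N, obs_dim) sampled from the dataset.
    actions: array of shape (N, action_dim) sampled from the dataset.
    """
    # Squared difference summed over action dimensions for each sample.
    diffs = ((actions - predict(observations)) ** 2).sum(axis=1)
    # Negate the mean so that a smaller difference yields a larger score.
    return -float(np.mean(diffs))

# Usage with a dummy policy that always outputs zeros:
observations = np.random.rand(10, 3)
actions = np.ones((10, 2))
score = continuous_action_diff(lambda obs: np.zeros((len(obs), 2)),
                               observations, actions)
# Each sample differs by 1 in each of 2 action dimensions -> score is -2.0
```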