d3rlpy.metrics.scorer.continuous_action_diff_scorer
d3rlpy.metrics.scorer.continuous_action_diff_scorer(algo, episodes)
Returns the squared difference between the actions chosen by the algorithm and the actions recorded in the dataset.
This metric indicates how far the greedy policy deviates from the given episodes in a continuous action space. If the given episodes are near-optimal, a smaller action difference is better.
\[\mathbb{E}_{s_t, a_t \sim D} [(a_t - \pi_\phi (s_t))^2]\]
- Parameters
algo (d3rlpy.metrics.scorer.AlgoProtocol) – algorithm.
episodes (List[d3rlpy.dataset.Episode]) – list of episodes.
- Returns
negative mean squared action difference (negated so that a higher score indicates a closer match to the dataset actions).
- Return type