d3rlpy.metrics.scorer.continuous_action_diff_scorer

d3rlpy.metrics.scorer.continuous_action_diff_scorer(algo, episodes)

Returns the squared difference between the actions predicted by the algorithm and the actions recorded in the dataset.

This metric indicates how far the greedy policy deviates from the actions in the given episodes in a continuous action space. If the given episodes are near-optimal, a smaller action difference is better.
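
To make the quantity concrete, here is a minimal sketch of a squared action difference in plain Python. It is not the library's implementation: `mean_squared_action_diff` and `toy_policy` are hypothetical names, the policy is a stand-in for a trained algorithm's greedy prediction, and summing over action dimensions before averaging over transitions is an assumption. Any sign convention the library applies to the returned value is not covered here.

```python
from statistics import mean


def mean_squared_action_diff(policy, observations, actions):
    """Average, over transitions, of the squared action difference.

    Assumption: the squared difference is summed over action dimensions
    and then averaged over all transitions.
    """
    diffs = []
    for obs, dataset_action in zip(observations, actions):
        predicted = policy(obs)  # greedy action for this observation
        diffs.append(
            sum((a - p) ** 2 for a, p in zip(dataset_action, predicted))
        )
    return mean(diffs)


def toy_policy(obs):
    # Hypothetical stand-in for a trained algorithm's greedy policy.
    return [0.5 * x for x in obs]


# Two transitions with 2-dimensional observations and actions.
observations = [[1.0, 2.0], [0.0, -2.0]]
actions = [[0.5, 1.0], [0.0, 0.0]]

score = mean_squared_action_diff(toy_policy, observations, actions)
print(score)
```

The first transition matches the policy exactly and contributes nothing. The second differs in one action dimension, so the score grows only where the policy departs from the recorded behaviour.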