d3rlpy.metrics.scorer.average_value_estimation_scorer

d3rlpy.metrics.scorer.average_value_estimation_scorer(algo, episodes, window_size=1024)

Returns the average value estimation in negative scale (the mean estimate multiplied by -1).

This metric indicates the scale of the Q-function estimates. If the average value estimation is too large, the Q functions are overestimating action-values, which may cause training to fail. The quantity being averaged is:

\[\mathbb{E}_{s_t \sim D} [ \max_a Q_\theta (s_t, a)]\]
Parameters:
  • algo (d3rlpy.algos.base.AlgoBase) – algorithm.
  • episodes (list(d3rlpy.dataset.Episode)) – list of episodes.
  • window_size (int) – mini-batch size used when computing the estimate.
Returns:
  negative average value estimation.
Return type:
  float
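
The computation described above can be illustrated with a minimal NumPy sketch. It is not the library's implementation: `ToyAlgo`, `negative_average_value`, and the tabular Q values are hypothetical names and data introduced only for illustration, and each episode is represented here as a plain list of discrete state indices rather than a `d3rlpy.dataset.Episode`. The sketch assumes the algorithm object exposes `predict` (greedy action) and `predict_value` (Q value of a state-action pair).

```python
import numpy as np


class ToyAlgo:
    """Hypothetical stand-in for a trained algorithm (not part of d3rlpy)."""

    def __init__(self, q_table):
        # q_table[s, a] holds Q(s, a) for discrete states and actions.
        self.q_table = np.asarray(q_table, dtype=float)

    def predict(self, observations):
        # Greedy action for each state: argmax_a Q(s, a).
        return self.q_table[observations].argmax(axis=1)

    def predict_value(self, observations, actions):
        # Q value of each (state, action) pair.
        return self.q_table[observations, actions]


def negative_average_value(algo, episodes, window_size=1024):
    """Sketch: negative mean of Q(s, greedy action) over all states."""
    values = []
    for observations in episodes:
        observations = np.asarray(observations)
        # Walk through the episode in mini-batches of at most window_size steps.
        for start in range(0, len(observations), window_size):
            batch = observations[start:start + window_size]
            actions = algo.predict(batch)
            values.extend(algo.predict_value(batch, actions).tolist())
    # Negate so that a smaller average estimate gives a larger score.
    return -float(np.mean(values))


# Illustrative data: 3 states, 2 actions.
q_table = [[1.0, 2.0], [0.5, 0.0], [3.0, 4.0]]
episodes = [[0, 1], [2, 2, 0]]  # each episode is a list of state indices
score = negative_average_value(ToyAlgo(q_table), episodes, window_size=2)
print(score)  # -2.5
```

In this sketch `window_size` only controls how many steps are evaluated per batch, so changing it does not change the result.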