d3rlpy.metrics.scorer.average_value_estimation_scorer

d3rlpy.metrics.scorer.average_value_estimation_scorer(algo, episodes)

Returns the average value estimation in negative scale (that is, negated).

This metric indicates the scale of the Q-function estimates. If the average value estimation is too large, the Q functions are overestimating action-values, which may cause training to fail.

\[\mathbb{E}_{s_t \sim D} [ \max_a Q_\theta (s_t, a)]\]
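The expectation above can be estimated from a finite sample of states by taking, for each state, the largest Q-value over actions and then averaging. The sketch below assumes a discrete action space and takes the Q-values as a plain table (one row per sampled state, one column per action); `average_max_q` is a hypothetical helper for illustration, not part of d3rlpy, and it returns the positive estimate before any negation.

```python
from typing import Sequence


def average_max_q(q_values: Sequence[Sequence[float]]) -> float:
    """Estimate E_{s_t ~ D}[max_a Q(s_t, a)] from per-state Q-values.

    Assumption: discrete actions, so row i holds Q(s_i, a) for every action a.
    """
    if not q_values:
        raise ValueError("need at least one sampled state")
    # Greedy (maximal) action-value for each sampled state.
    state_values = [max(row) for row in q_values]
    # Monte Carlo average over the sampled states.
    return sum(state_values) / len(state_values)


q = [[1.0, 3.0], [2.0, 0.5], [4.0, 4.5]]
estimate = average_max_q(q)  # row maxima 3.0, 2.0, 4.5 -> mean ≈ 3.167
```

A steadily growing value of this estimate during training, well beyond the range of plausible returns, is the overestimation symptom described above.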
Parameters
  • algo (d3rlpy.metrics.scorer.AlgoProtocol) – algorithm.

  • episodes (List[d3rlpy.dataset.Episode]) – list of episodes.

Returns

The negated average value estimation.

Return type

float
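A toy sketch of how a scorer with this `(algo, episodes)` signature might compute its result: pick the algorithm's greedy action for each observation, evaluate the Q-value of that action, average over all observations in all episodes, and negate. `ToyEpisode`, `ToyAlgo`, and the `predict` / `predict_value` method names are assumptions made for illustration; they are not the actual d3rlpy implementation or the `AlgoProtocol` definition. Using the policy's chosen action in place of an explicit max over actions is one way to approximate max_a when actions cannot be enumerated; for a greedy discrete policy the two coincide.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class ToyEpisode:
    """Hypothetical stand-in for an episode: just a list of observations."""
    observations: List[float]


class ToyAlgo:
    """Hypothetical algorithm with Q(s, a) = q_fn(s, a) over discrete actions."""

    def __init__(self, q_fn: Callable[[float, int], float], actions: Sequence[int]):
        self.q_fn = q_fn
        self.actions = list(actions)

    def predict(self, observations: Sequence[float]) -> List[int]:
        # Greedy action: argmax_a Q(s, a) for each observation.
        return [max(self.actions, key=lambda a: self.q_fn(s, a)) for s in observations]

    def predict_value(self, observations: Sequence[float], actions: Sequence[int]) -> List[float]:
        # Q(s, a) for each (observation, action) pair.
        return [self.q_fn(s, a) for s, a in zip(observations, actions)]


def toy_average_value_estimation_scorer(algo: ToyAlgo, episodes: Sequence[ToyEpisode]) -> float:
    values: List[float] = []
    for episode in episodes:
        actions = algo.predict(episode.observations)
        values.extend(algo.predict_value(episode.observations, actions))
    # Average value estimation, returned in negative scale.
    return -sum(values) / len(values)


algo = ToyAlgo(lambda s, a: s * a, actions=[-1, 1])
episodes = [ToyEpisode([1.0, 2.0]), ToyEpisode([-3.0])]
score = toy_average_value_estimation_scorer(algo, episodes)
# greedy values are 1.0, 2.0, 3.0 -> mean 2.0 -> score -2.0
```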