d3rlpy.metrics.scorer.value_estimation_std_scorer

d3rlpy.metrics.scorer.value_estimation_std_scorer(algo, episodes)

Returns the standard deviation of value estimation (negated, so that a higher score means lower deviation).

This metric indicates how confident the Q functions are for the given episodes. It is more accurate when bootstrap is enabled and the algorithm uses a larger n_critics. A large standard deviation of value estimation suggests that the Q functions are overfitting to the training set.

\[\mathbb{E}_{s_t \sim D, a \sim \text{argmax}_a Q_\theta(s_t, a)} [Q_{\text{std}}(s_t, a)]\]

where \(Q_{\text{std}}(s, a)\) is the standard deviation of the action-value estimates across the ensemble of Q functions.
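The expectation above can be illustrated with a small standalone sketch. It is not the library's implementation: the helper name negative_ensemble_std, the representation of critics as plain Python callables, and the finite list of candidate actions are all hypothetical, and the sketch assumes the greedy action is chosen by the ensemble mean and that the population standard deviation is used.

```python
from statistics import fmean, pstdev
from typing import Callable, Sequence

# Hypothetical critic type: maps (state, action) to a scalar Q estimate.
Critic = Callable[[float, int], float]


def negative_ensemble_std(
    states: Sequence[float],
    actions: Sequence[int],
    critics: Sequence[Critic],
) -> float:
    """Negated mean of the ensemble standard deviation at greedy actions."""
    stds = []
    for s in states:
        # Assumption: the greedy action maximizes the ensemble-mean Q value.
        greedy = max(actions, key=lambda a: fmean([q(s, a) for q in critics]))
        # Q_std(s, a): spread of the critics' estimates for that action.
        stds.append(pstdev([q(s, greedy) for q in critics]))
    # Negate so that a higher score corresponds to lower disagreement.
    return -fmean(stds)


# Two toy critics that always disagree by a constant offset of 2.
critics = [lambda s, a: s + a, lambda s, a: s + a + 2.0]
score = negative_ensemble_std(states=[0.0, 1.0], actions=[0, 1], critics=critics)
```

With a single critic, or with critics that agree exactly, the score in this sketch is zero, which is why the metric is only informative when the ensemble has more than one member.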

Parameters
  • algo (d3rlpy.metrics.scorer.AlgoProtocol) – algorithm.

  • episodes (List[d3rlpy.dataset.Episode]) – list of episodes.

Returns

negative standard deviation.

Return type

float