d3rlpy.metrics.scorer.value_estimation_std_scorer

d3rlpy.metrics.scorer.value_estimation_std_scorer(algo, episodes, window_size=1024)

Returns the standard deviation of the value estimation in negative scale. The score is negated, so a higher score means a smaller standard deviation.

This metric indicates how confident the Q functions are about the given episodes. It becomes more accurate with bootstrapping enabled and a larger n_critics in the algorithm. A large standard deviation of the value estimation is a sign that the Q functions are overfitting to the training set.

\[\mathbb{E}_{s_t \sim D, a \sim \text{argmax}_a Q_\theta(s_t, a)} [Q_{\text{std}}(s_t, a)]\]

where \(Q_{\text{std}}(s, a)\) is the standard deviation of the action-value estimates across the ensemble of Q functions.
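The expectation above can be approximated by averaging over every state in the given episodes. The following is a minimal pure-Python sketch, not d3rlpy's implementation. It assumes discrete actions and represents each ensemble member as a plain callable (the QFunction alias and the value_estimation_std function are hypothetical). It also assumes the greedy action is taken from the ensemble-mean Q and that the population standard deviation (ddof=0) is used.

```python
from statistics import fmean, pstdev
from typing import Callable, List, Sequence

# Hypothetical type: an ensemble member maps one observation to a list of
# action values, one entry per discrete action.
QFunction = Callable[[Sequence[float]], List[float]]


def value_estimation_std(
    q_ensemble: Sequence[QFunction],
    episodes: Sequence[Sequence[Sequence[float]]],
) -> float:
    """Negative mean ensemble std of Q at the greedy action (sketch only)."""
    stds: List[float] = []
    for episode in episodes:
        for obs in episode:
            # Action values from every member: n_critics lists of n_actions values.
            values = [q(obs) for q in q_ensemble]
            n_actions = len(values[0])
            # Assumption: argmax_a Q_theta is taken over the ensemble-mean Q.
            mean_q = [fmean([v[a] for v in values]) for a in range(n_actions)]
            action = max(range(n_actions), key=mean_q.__getitem__)
            # Population std of the ensemble's estimates for the greedy action.
            stds.append(pstdev([v[action] for v in values]))
    # Negated so that a higher score means lower ensemble disagreement.
    return -fmean(stds)
```

For example, with two critics that always output [1.0, 3.0] and [1.0, 5.0], the greedy action is the second one, its estimates 3.0 and 5.0 have a standard deviation of 1.0, and the score is -1.0.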

Parameters:
  • algo (d3rlpy.algos.base.AlgoBase) – algorithm to evaluate.
  • episodes (list(d3rlpy.dataset.Episode)) – list of episodes.
  • window_size (int) – mini-batch size used when computing the value estimates.
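window_size bounds how many transitions are evaluated at once. The chunking can be sketched with a hypothetical helper, iter_windows, which is not part of d3rlpy's public API. Assuming the per-transition standard deviations from all windows are pooled before averaging, as the formula suggests, the window size affects memory use and speed but not the resulting score.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def iter_windows(transitions: Sequence[T], window_size: int = 1024) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most window_size items (hypothetical helper)."""
    if window_size < 1:
        raise ValueError("window_size must be a positive integer")
    for start in range(0, len(transitions), window_size):
        # Each chunk would correspond to one batched Q-function evaluation.
        yield list(transitions[start:start + window_size])
```

Splitting five transitions with window_size=2 yields the chunks [0, 1], [2, 3] and [4]; the last chunk is simply shorter.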