d3rlpy.metrics.scorer.value_estimation_std_scorer
d3rlpy.metrics.scorer.value_estimation_std_scorer(algo, episodes, window_size=1024)
Returns the standard deviation of value estimation (on a negative scale, i.e. negated).
This metric indicates how confident the Q-functions are for the given episodes. It is more accurate with bootstrap enabled and a larger n_critics in the algorithm. A large standard deviation of value estimation suggests that the Q-functions are overfitting to the training set.
\[\mathbb{E}_{s_t \sim D, a \sim \text{argmax}_a Q_\theta(s_t, a)} [Q_{\text{std}}(s_t, a)]\]
where \(Q_{\text{std}}(s, a)\) is the standard deviation of the action-value estimates over the ensemble of Q-functions.
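The formula can be illustrated with a small standalone sketch that does not use d3rlpy. It assumes discrete actions, a greedy action chosen under the ensemble-mean Q-value, and the population standard deviation; the function name `value_estimation_std` and the nested-list layout of `q_ensemble` are hypothetical choices made for this illustration, and the library's own computation may differ in these details.

```python
from statistics import fmean, pstdev


def value_estimation_std(q_ensemble):
    """Negated mean ensemble std of Q at the greedy action.

    q_ensemble[k][i][j] is the Q-value from critic k for state i and
    action j (hypothetical layout, for illustration only).
    """
    n_states = len(q_ensemble[0])
    n_actions = len(q_ensemble[0][0])
    stds = []
    for i in range(n_states):
        # Ensemble-mean Q-value of each action in state i.
        mean_q = [
            fmean(critic[i][j] for critic in q_ensemble)
            for j in range(n_actions)
        ]
        # Greedy action under the ensemble mean (an assumption of this sketch).
        greedy = max(range(n_actions), key=mean_q.__getitem__)
        # Spread of the critics' estimates for that action.
        stds.append(pstdev(critic[i][greedy] for critic in q_ensemble))
    # Negated, so a larger score means the critics agree more closely.
    return -fmean(stds)


# Two critics, two states, two actions.
critic_a = [[1.0, 0.0], [0.0, 2.0]]
critic_b = [[3.0, 0.0], [0.0, 2.0]]
score = value_estimation_std([critic_a, critic_b])
```

In this toy input the critics disagree on the greedy action in the first state and agree exactly in the second, so the score is negative rather than zero.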
Parameters:
- algo (d3rlpy.algos.base.AlgoBase) – algorithm.
- episodes (list(d3rlpy.dataset.Episode)) – list of episodes.
- window_size (int) – mini-batch size used when computing the estimates.
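One plausible reading of `window_size` is that each episode is processed in consecutive mini-batches of at most that many steps, and the per-step standard deviations are pooled before averaging. The sketch below illustrates that reading only. `iter_windows`, `score_episodes`, and `std_fn` are hypothetical names, and `std_fn` stands in for whatever the algorithm uses to produce a per-observation standard deviation.

```python
def iter_windows(observations, window_size=1024):
    """Yield consecutive slices of at most window_size observations."""
    for start in range(0, len(observations), window_size):
        yield observations[start:start + window_size]


def score_episodes(episodes, std_fn, window_size=1024):
    """Pool per-step stds over all episodes and return the negated mean.

    episodes is a list of observation sequences; std_fn maps a batch of
    observations to one std per observation (hypothetical stand-in).
    """
    total_stds = []
    for observations in episodes:
        for batch in iter_windows(observations, window_size):
            total_stds.extend(std_fn(batch))
    return -sum(total_stds) / len(total_stds)


# Toy usage: each "observation" is a number that doubles as its own std.
toy_episodes = [[1.0, 2.0, 3.0], [4.0, 5.0]]
toy_score = score_episodes(toy_episodes, lambda batch: list(batch), window_size=2)
```

Under this reading, `window_size` changes only how much is evaluated at once, not the resulting score, so it can be lowered to reduce memory use.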