d3rlpy.metrics.scorer.value_estimation_std_scorer
- d3rlpy.metrics.scorer.value_estimation_std_scorer(algo, episodes)
Returns the standard deviation of the value estimates (negated).
This metric indicates how confident the Q functions are for the given episodes. It is more accurate with bootstrap enabled and a larger n_critics in the algorithm. A large standard deviation of the value estimates suggests that the Q functions are overfitting to the training set.
\[\mathbb{E}_{s_t \sim D, a \sim \text{argmax}_a Q_\theta(s_t, a)} [Q_{\text{std}}(s_t, a)]\]
where \(Q_{\text{std}}(s, a)\) is the standard deviation of the action-value estimates across the ensemble of Q functions.
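The expectation above can be illustrated with a minimal NumPy sketch. This is not d3rlpy's implementation: the helper name `negative_value_std`, the array layout `(n_critics, n_states, n_actions)`, the choice of the greedy action from the ensemble-mean Q-values, and the use of the population standard deviation are all assumptions made for illustration.

```python
import numpy as np


def negative_value_std(q_values: np.ndarray) -> float:
    """Hypothetical helper. q_values has shape (n_critics, n_states, n_actions)."""
    # Assumption: the greedy action is taken from the ensemble-mean Q-values.
    mean_q = q_values.mean(axis=0)    # (n_states, n_actions)
    greedy = mean_q.argmax(axis=1)    # (n_states,)
    # Disagreement between ensemble members for every (state, action) pair.
    # Assumption: population standard deviation (NumPy default, ddof=0).
    std_q = q_values.std(axis=0)      # (n_states, n_actions)
    # Keep the spread at the greedy action only, then average over states.
    std_at_greedy = std_q[np.arange(std_q.shape[0]), greedy]
    # Negate so that a larger score means closer agreement.
    return float(-std_at_greedy.mean())


# Toy ensemble with made-up numbers: 2 critics, 2 states, 2 actions.
q = np.array([
    [[1.0, 2.0], [3.0, 0.0]],
    [[1.0, 4.0], [5.0, 0.0]],
])
score = negative_value_std(q)
```

In this toy case the two critics differ by 2 at each greedy action, so the standard deviation is 1 per state and the score is -1. An ensemble whose members agree exactly would score 0, the best possible value on the negated scale.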
- Parameters
algo (d3rlpy.metrics.scorer.AlgoProtocol) – algorithm.
episodes (List[d3rlpy.dataset.Episode]) – list of episodes.
- Returns
negative standard deviation.
- Return type