d3rlpy.metrics.scorer.average_value_estimation_scorer
d3rlpy.metrics.scorer.average_value_estimation_scorer(algo, episodes)
Returns the average value estimation, negated (i.e., in negative scale).
This metric indicates the scale of the Q-function estimates. If the average value estimation is too large, the Q functions are overestimating action-values, which may cause training to fail.
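The arithmetic behind this metric (take the greedy action-value at every dataset state, average over all states, then negate) can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not d3rlpy's implementation: the Q-function is a hypothetical callable mapping a batch of observations to a matrix of action-values, and each episode is simplified to a plain array of observations instead of a `d3rlpy.dataset.Episode`.

```python
from typing import Callable, Sequence

import numpy as np

# Hypothetical stand-in for a trained Q-function (NOT the d3rlpy API):
# maps observations of shape (T, obs_dim) to action-values of shape (T, n_actions).
QFunction = Callable[[np.ndarray], np.ndarray]


def negative_average_value_estimation(
    q_func: QFunction, episodes: Sequence[np.ndarray]
) -> float:
    """Return -E_{s_t ~ D}[max_a Q(s_t, a)] pooled over every state of every episode."""
    max_values = []
    for observations in episodes:
        q_values = q_func(observations)          # shape (T, n_actions)
        max_values.append(q_values.max(axis=1))  # greedy action-value per state
    # Pool all states from all episodes, average them, then negate.
    return -float(np.mean(np.concatenate(max_values)))


# Usage with a toy linear Q-function: Q(s, .) = s @ W (hypothetical weights).
W = np.array([[1.0, 0.0], [0.0, 2.0]])


def linear_q(observations: np.ndarray) -> np.ndarray:
    return observations @ W


episodes = [
    np.array([[1.0, 1.0], [2.0, 0.0]]),  # max action-values: 2.0, 2.0
    np.array([[0.0, 3.0]]),              # max action-value: 6.0
]
score = negative_average_value_estimation(linear_q, episodes)
print(score)  # mean of [2, 2, 6] is 10/3, negated
```

The states are pooled before averaging, so longer episodes contribute proportionally more, which matches an expectation over states drawn from the dataset D rather than an average of per-episode means.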
\[\mathbb{E}_{s_t \sim D} [ \max_a Q_\theta (s_t, a)]\]
- Parameters
algo (d3rlpy.metrics.scorer.AlgoProtocol) – algorithm.
episodes (List[d3rlpy.dataset.Episode]) – list of episodes.
- Returns
negative average value estimation.
- Return type