d3rlpy.metrics.scorer.average_value_estimation_scorer

d3rlpy.metrics.scorer.average_value_estimation_scorer(algo, episodes)

Returns the average value estimation on a negative scale (that is, negated).

This metric indicates the scale of the Q-function estimates. If the average value estimation is too large, the Q functions are likely overestimating action-values, which may cause training to fail.

\[\mathbb{E}_{s_t \sim D} [ \max_a Q_\theta (s_t, a)]\]
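The expectation above can be illustrated with a minimal NumPy sketch. This is not the library's implementation: the tabular `q_table`, the toy `episodes` (sequences of state indices), and the helper `negative_average_value` are all hypothetical names introduced here for illustration. It assumes discrete states and actions so that the maximum over actions is a row-wise maximum.

```python
import numpy as np

# Hypothetical tabular Q function: 3 states x 2 actions (illustrative values only).
q_table = np.array([
    [1.0, 2.0],
    [0.5, 0.0],
    [3.0, 4.0],
])

# Hypothetical dataset: each episode is a sequence of visited state indices.
episodes = [np.array([0, 1]), np.array([2, 0])]


def negative_average_value(q_table, episodes):
    """Return -E_{s ~ D}[max_a Q(s, a)] for a tabular Q function (sketch)."""
    values = []
    for states in episodes:
        # Greedy value max_a Q(s, a) for every state visited in this episode.
        values.extend(q_table[states].max(axis=1).tolist())
    # Negate the mean, matching the "negative scale" described above.
    return -float(np.mean(values))


score = negative_average_value(q_table, episodes)
print(score)
```

In this toy setup the greedy values are 2.0, 0.5, 4.0, and 2.0, so the mean is 2.125 and the negated score is -2.125. A larger (less negative) score corresponds to smaller value estimates.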

Parameters

algo (d3rlpy.metrics.scorer.AlgoProtocol) – algorithm.
episodes (List[d3rlpy.dataset.Episode]) – list of episodes.

Returns

negative average value estimation.

Return type

float
