d3rlpy.metrics.scorer.initial_state_value_estimation_scorer

d3rlpy.metrics.scorer.initial_state_value_estimation_scorer(algo, episodes)

Returns mean estimated action-values at the initial states.

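This metric indicates how much return the trained policy is expected to obtain when it is deployed from the initial states in the dataset. A larger estimated value suggests that the trained policy will achieve higher returns. Because the value comes from the learned Q-function rather than from actual rollouts, it is only an estimate.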

\[\mathbb{E}_{s_0 \sim D} [Q(s_0, \pi(s_0))]\]

Parameters
  • algo (d3rlpy.metrics.scorer.AlgoProtocol) – trained algorithm that provides the policy and the action-value estimates.

  • episodes (List[d3rlpy.dataset.Episode]) – list of episodes whose initial states are evaluated.

Returns

Mean estimated action-value at the initial states.

Return type

float
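
Example

The expectation above can be illustrated with a minimal, library-free sketch. It is not the d3rlpy implementation: ToyEpisode, ToyAlgo, and initial_state_value are hypothetical names introduced here, and the sketch assumes only that the algorithm exposes predict (the policy π) and predict_value (the estimate Q), and that each episode exposes its observations with the initial state s_0 first.

```python
from statistics import fmean
from typing import List, Sequence


class ToyEpisode:
    """Hypothetical stand-in for an episode; only observations are needed."""

    def __init__(self, observations: Sequence[Sequence[float]]):
        self.observations = list(observations)


class ToyAlgo:
    """Hypothetical stand-in for a trained algorithm (assumed interface)."""

    def predict(self, x: List[Sequence[float]]) -> List[float]:
        # Toy policy pi(s): act with the sign of the first state feature.
        return [1.0 if obs[0] >= 0 else -1.0 for obs in x]

    def predict_value(
        self, x: List[Sequence[float]], action: List[float]
    ) -> List[float]:
        # Toy Q(s, a) = a * s[0], one value per (state, action) pair.
        return [a * obs[0] for obs, a in zip(x, action)]


def initial_state_value(algo: ToyAlgo, episodes: List[ToyEpisode]) -> float:
    """Average Q(s_0, pi(s_0)) over the first observation of each episode."""
    values = []
    for episode in episodes:
        s0 = episode.observations[0]          # initial state of the episode
        a0 = algo.predict([s0])               # pi(s_0), batch of size one
        q0 = algo.predict_value([s0], a0)     # Q(s_0, pi(s_0))
        values.append(q0[0])
    return fmean(values)


episodes = [
    ToyEpisode([[2.0, 0.0], [1.0, 0.0]]),
    ToyEpisode([[-4.0, 1.0], [0.5, 1.0]]),
]
score = initial_state_value(ToyAlgo(), episodes)
print(score)
```

With these toy definitions the two initial states have estimated values 2.0 and 4.0, so the score is their mean, 3.0. Only the first observation of each episode contributes; later observations are ignored.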