d3rlpy.metrics.InitialStateValueEstimationEvaluator

class d3rlpy.metrics.InitialStateValueEstimationEvaluator(*args, **kwds)

Returns mean estimated action-values at the initial states.

This metric estimates how much return the trained policy would obtain when deployed from the initial states. A larger estimated value suggests that the trained policy is expected to achieve higher returns.

\[\mathbb{E}_{s_0 \sim D} [Q(s_0, \pi(s_0))]\]
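
The expectation above is an average of Q(s_0, π(s_0)) over the initial states in the data. A minimal sketch of that arithmetic in plain Python follows; the helper `initial_state_value`, the toy policy, and the toy Q-function are hypothetical names made up for illustration and are not part of d3rlpy.

```python
from statistics import fmean


def initial_state_value(initial_states, policy, q_function):
    """Average Q(s_0, pi(s_0)) over the given initial states."""
    values = [q_function(s0, policy(s0)) for s0 in initial_states]
    return fmean(values)


# Toy stand-ins (assumptions for illustration, not d3rlpy objects).
def toy_policy(state):
    # Always picks the same action.
    return 1.0


def toy_q_function(state, action):
    # Linear action-value, chosen only to make the arithmetic easy to follow.
    return 2.0 * state + action


initial_states = [0.0, 1.0, 2.0]
estimate = initial_state_value(initial_states, toy_policy, toy_q_function)
print(estimate)
```

Only the first state of each episode contributes, so the result depends on the learned Q-function and policy, not on the rewards recorded in the dataset.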

References

Paine et al., Hyperparameter Selection for Offline Reinforcement Learning

Parameters: episodes – Optional evaluation episodes. If not given, the dataset used in training is used instead.

Methods

__call__(algo, dataset)

Computes the metric.

Parameters

algo (d3rlpy.interface.QLearningAlgoProtocol) – Q-learning algorithm.
dataset (d3rlpy.dataset.replay_buffer.ReplayBuffer) – ReplayBuffer.

Returns

Computed metric.

Return type

float
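
To illustrate the calling convention described above, here is a rough, self-contained sketch of an evaluator that takes optional episodes at construction time and is then called with an algorithm and a dataset. Everything in it is a stand-in: `ToyAlgo`, `ToyDataset`, and `InitialStateValueSketch` are hypothetical classes, episodes are modeled as plain lists of scalar observations, and the `predict` / `predict_value` method names are an assumption about what the algorithm object exposes. It is not the library's implementation.

```python
from statistics import fmean


class ToyAlgo:
    """Hypothetical stand-in for a Q-learning algorithm (assumed interface)."""

    def predict(self, observation):
        # Greedy action for the observation; constant in this toy example.
        return 1.0

    def predict_value(self, observation, action):
        # Estimated action-value Q(s, a); linear in this toy example.
        return 2.0 * observation + action


class ToyDataset:
    """Hypothetical stand-in for a replay buffer holding episodes."""

    def __init__(self, episodes):
        self.episodes = episodes


class InitialStateValueSketch:
    """Sketch of the evaluator's shape: optional episodes, then __call__."""

    def __init__(self, episodes=None):
        self._episodes = episodes

    def __call__(self, algo, dataset):
        # Fall back to the dataset's episodes when none were supplied.
        if self._episodes is not None:
            episodes = self._episodes
        else:
            episodes = dataset.episodes
        values = []
        for episode in episodes:
            s0 = episode[0]  # only the initial state of each episode is used
            values.append(algo.predict_value(s0, algo.predict(s0)))
        return float(fmean(values))


train = ToyDataset([[0.0, 0.5], [2.0, 2.5]])
held_out = [[4.0, 4.5]]

on_train = InitialStateValueSketch()(ToyAlgo(), train)
on_held_out = InitialStateValueSketch(episodes=held_out)(ToyAlgo(), train)
print(on_train, on_held_out)
```

Passing explicit episodes lets the same trained algorithm be scored on held-out data, while omitting them scores it on the data it was trained on.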