d3rlpy.metrics.AverageValueEstimationEvaluator
- class d3rlpy.metrics.AverageValueEstimationEvaluator(episodes=None)
Returns average value estimation.
This metric indicates the scale of the Q-function estimates. A very large average value estimate suggests that the Q functions are overestimating action-values, which may cause training to fail.
\[\mathbb{E}_{s_t \sim D} \left[ \max_a Q_\theta (s_t, a) \right]\]
- Parameters:
episodes – Optional evaluation episodes. If not given, the dataset used in training is used instead.
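The expectation above can be illustrated with a minimal pure-Python sketch. It is not the library's implementation: the helper `average_value_estimation`, the toy Q-table, and the state and action names are all hypothetical stand-ins, and it assumes a small discrete action set so the maximum over actions can be taken by enumeration.

```python
from typing import Callable, Hashable, Sequence


def average_value_estimation(
    states: Sequence[Hashable],
    actions: Sequence[int],
    q_function: Callable[[Hashable, int], float],
) -> float:
    """Return the mean over states of max_a Q(s, a).

    Hypothetical helper for illustration only; it mirrors the formula,
    not the evaluator's actual code.
    """
    if not states:
        raise ValueError("states must be non-empty")
    total = 0.0
    for state in states:
        # Greedy value of this state: the largest Q-value over all actions.
        total += max(q_function(state, action) for action in actions)
    return total / len(states)


# Toy Q-table with made-up values, keyed by (state, action).
q_table = {
    ("s0", 0): 1.0,
    ("s0", 1): 3.0,
    ("s1", 0): 2.0,
    ("s1", 1): 0.5,
}


def toy_q(state: Hashable, action: int) -> float:
    """Look up the made-up Q-value for a (state, action) pair."""
    return q_table[(state, action)]


# The greedy values are 3.0 for s0 and 2.0 for s1, so the average is 2.5.
average = average_value_estimation(["s0", "s1"], [0, 1], toy_q)
print(average)  # 2.5
```

If this number grows far beyond the returns actually present in the dataset over the course of training, that is the overestimation signal described above.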
Methods
- __call__(algo, dataset)
Computes metrics.
- Parameters:
algo (QLearningAlgoProtocol) – Q-learning algorithm.
dataset (ReplayBufferBase) – ReplayBuffer.
- Returns:
Computed metrics.
- Return type: