d3rlpy.metrics.AverageValueEstimationEvaluator

class d3rlpy.metrics.AverageValueEstimationEvaluator(episodes=None)

Returns the average value estimate.

This metric indicates the scale of the Q-function estimates. If the average value estimate is too large, the Q functions are overestimating action-values, which can cause training to fail.

\[\mathbb{E}_{s_t \sim D} [ \max_a Q_\theta (s_t, a)]\]
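As a rough illustration of the formula above, the following sketch computes the mean over states of the maximum Q-value over actions with NumPy. The `q_values` table and the `average_max_q` helper are hypothetical and only stand in for the output of a trained Q function over a discrete action space; they are not part of d3rlpy.

```python
import numpy as np


def average_max_q(q_values: np.ndarray) -> float:
    """Mean over states of the max over actions (hypothetical helper)."""
    # q_values has shape (n_states, n_actions); take max_a Q(s, a) per state,
    # then average over the sampled states.
    return float(q_values.max(axis=1).mean())


# Made-up Q-values for 4 states and 3 actions, for illustration only.
q_values = np.array(
    [
        [1.0, 2.0, 0.5],
        [0.0, -1.0, 3.0],
        [4.0, 4.0, 1.0],
        [-2.0, -0.5, -1.0],
    ]
)

avg = average_max_q(q_values)
print(avg)  # per-state maxima are 2.0, 3.0, 4.0, -0.5, so the mean is 2.125
```

A value that keeps growing across training epochs, well beyond the returns actually present in the data, is the kind of signal this metric is meant to surface.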
Parameters:

episodes – Optional evaluation episodes. If not given, the dataset used in training is used.

Methods

__call__(algo, dataset)

Computes the metric.

Parameters:
  • algo (QLearningAlgoProtocol) – Q-learning algorithm.

  • dataset (ReplayBufferBase) – ReplayBuffer.

Returns:

Computed metrics.

Return type:

float
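The callable form can be pictured with a minimal sketch like the one below. It is not the library's implementation: `ToyQAlgo` and `average_value_estimation` are hypothetical names, and the sketch assumes only that the algorithm exposes `predict` (greedy actions) and `predict_value` (Q-values for given state-action pairs). A plain array of observations stands in for the replay buffer.

```python
import numpy as np


class ToyQAlgo:
    """Hypothetical stand-in for a trained Q-learning algorithm."""

    def __init__(self, weights):
        # Linear Q function: Q(s, .) = s @ W, with W of shape (obs_dim, n_actions).
        self._w = np.asarray(weights, dtype=float)

    def _q(self, observations):
        return np.asarray(observations, dtype=float) @ self._w

    def predict(self, observations):
        # Greedy action per observation.
        return self._q(observations).argmax(axis=1)

    def predict_value(self, observations, actions):
        # Q-value of the given action for each observation.
        q = self._q(observations)
        return q[np.arange(len(q)), actions]


def average_value_estimation(algo, observations, batch_size=2):
    """Average Q-value of the greedy action, evaluated in mini-batches."""
    values = []
    for start in range(0, len(observations), batch_size):
        batch = observations[start:start + batch_size]
        actions = algo.predict(batch)
        values.append(algo.predict_value(batch, actions))
    return float(np.concatenate(values).mean())


algo = ToyQAlgo([[1.0, 0.0], [0.0, 2.0]])
observations = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])

score = average_value_estimation(algo, observations)
print(score)
```

Evaluating the greedy action's value is equivalent to taking the maximum over actions for a discrete action space, which is why the sketch never enumerates actions explicitly.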