d3rlpy.metrics.TDErrorEvaluator
- class d3rlpy.metrics.TDErrorEvaluator(*args, **kwds)
Returns average TD error.
This metric indicates how much the Q functions overfit to the training set. A large TD error suggests that the Q functions are overfitting.
\[\mathbb{E}_{s_t, a_t, r_{t+1}, s_{t+1} \sim D} [(Q_\theta (s_t, a_t) - r_{t+1} - \gamma \max_a Q_\theta (s_{t+1}, a))^2]\]
- Parameters
episodes – Optional evaluation episodes. If not given, the dataset used in training is used.
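The expectation above can be illustrated with a minimal pure-Python sketch. It is not the library's implementation: the tabular `q_table`, the `average_td_error` helper, and the toy transitions are all hypothetical, and the sketch assumes a small discrete action set so that the max over actions can be enumerated directly.

```python
from typing import Dict, List, Tuple

# Hypothetical tabular Q function: maps (state, action) to a value estimate.
QTable = Dict[Tuple[int, int], float]
# One transition: (s_t, a_t, r_{t+1}, s_{t+1}).
Transition = Tuple[int, int, float, int]


def average_td_error(
    q: QTable,
    transitions: List[Transition],
    actions: List[int],
    gamma: float,
) -> float:
    """Mean squared TD error over the given transitions."""
    total = 0.0
    for s, a, r, s_next in transitions:
        # Bootstrapped target: r_{t+1} + gamma * max_a Q(s_{t+1}, a).
        target = r + gamma * max(q[(s_next, b)] for b in actions)
        # Squared difference between the current estimate and the target.
        total += (q[(s, a)] - target) ** 2
    return total / len(transitions)


# Toy data (illustrative values only): two states, two actions.
q_table = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 0.5, (1, 1): 0.0}
data = [(0, 1, 1.0, 1), (1, 0, 0.0, 0)]
td = average_td_error(q_table, data, actions=[0, 1], gamma=0.5)
print(td)
```

Comparing this value on the training transitions with the value on held-out episodes is one way to read the metric: a held-out error much larger than the training error is consistent with the overfitting described above.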
Methods
- __call__(algo, dataset)
Computes metrics.
- Parameters
algo (d3rlpy.interface.QLearningAlgoProtocol) – Q-learning algorithm.
dataset (d3rlpy.dataset.replay_buffer.ReplayBuffer) – ReplayBuffer.
- Returns
Computed metrics.
- Return type