d3rlpy.metrics.scorer.td_error_scorer
d3rlpy.metrics.scorer.td_error_scorer(algo, episodes, window_size=1024)

Returns the average TD error on a negative scale, so that a higher score means a smaller error.
This metric suggests how much the Q functions overfit the training set: a large TD error indicates that the Q functions are overfitting.
\[\mathbb{E}_{s_t, a_t, r_{t+1}, s_{t+1} \sim D} \left[ \left( Q_\theta (s_t, a_t) - (r_{t+1} + \gamma \max_a Q_\theta (s_{t+1}, a)) \right)^2 \right]\]

Parameters:
- algo (d3rlpy.algos.base.AlgoBase) – algorithm.
- episodes (list(d3rlpy.dataset.Episode)) – list of episodes.
- window_size (int) – mini-batch size used when computing the TD errors.
Returns: negative average TD error.
Return type: float
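The formula above can be illustrated with a minimal plain-Python sketch. This is a hypothetical helper (`negative_mean_td_error` is not part of d3rlpy) and not the library's implementation: it assumes Q(s_t, a_t) and max_a Q(s_{t+1}, a) have already been computed for each transition, and the optional `terminals` mask that drops the bootstrap term at episode ends is an assumed convention not shown in the formula.

```python
from typing import List, Optional, Sequence


def negative_mean_td_error(
    q_values: Sequence[float],           # Q(s_t, a_t) per transition
    rewards: Sequence[float],            # r_{t+1} per transition
    next_max_q_values: Sequence[float],  # max_a Q(s_{t+1}, a) per transition
    gamma: float = 0.99,
    terminals: Optional[Sequence[float]] = None,  # assumption: 1.0 marks an episode end
    window_size: int = 1024,
) -> float:
    """Hypothetical sketch: -mean((Q - (r + gamma * max_a Q'))**2)."""
    n = len(q_values)
    if n == 0 or not (len(rewards) == len(next_max_q_values) == n):
        raise ValueError("inputs must be non-empty and of equal length")
    if terminals is None:
        terminals = [0.0] * n
    if len(terminals) != n:
        raise ValueError("terminals must match the other inputs in length")

    squared_errors: List[float] = []
    # Walk the transitions in windows of `window_size`, mirroring mini-batching.
    # Windowing only bounds the work done per step; it does not change the mean.
    for start in range(0, n, window_size):
        for i in range(start, min(start + window_size, n)):
            # Bootstrapped target; the (assumed) terminal flag removes the next-state term.
            target = rewards[i] + gamma * next_max_q_values[i] * (1.0 - terminals[i])
            squared_errors.append((q_values[i] - target) ** 2)

    # Negate so that a higher score corresponds to a smaller TD error.
    return -sum(squared_errors) / n


if __name__ == "__main__":
    # Targets are 1.0 and 1.0, squared errors 0.0 and 1.0, so the score is -0.5.
    print(negative_mean_td_error([1.0, 2.0], [0.5, 1.0], [1.0, 0.0], gamma=0.5))
```

Because the score is negated, a perfect fit yields 0 and any mismatch between Q(s_t, a_t) and its bootstrapped target pushes the score below zero.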