d3rlpy.metrics.scorer.td_error_scorer

d3rlpy.metrics.scorer.td_error_scorer(algo, episodes)

Returns the average TD error, negated (so a larger TD error gives a more negative score).

This metric indicates how much the Q functions overfit to the training set. A large TD error suggests that the Q functions are overfitting.

\[\mathbb{E}_{s_t, a_t, r_{t+1}, s_{t+1} \sim D} [(Q_\theta (s_t, a_t) - r_{t+1} - \gamma \max_a Q_\theta (s_{t+1}, a))^2]\]

Parameters
  • algo (d3rlpy.metrics.scorer.AlgoProtocol) – algorithm to evaluate.

  • episodes (List[d3rlpy.dataset.Episode]) – list of episodes over which the TD error is averaged.

Returns

negative average TD error.

Return type

float
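
Example

The following is a minimal NumPy sketch of the quantity in the formula above, not the library's implementation. It assumes a small tabular Q function (`q_table`), a hand-written list of `(s, a, r, s_next)` transitions, and a hypothetical helper named `negative_td_error`. It also ignores terminal-state handling, which a full implementation would need to consider.

```python
import numpy as np


def negative_td_error(q_table, transitions, gamma=0.99):
    """Negative mean squared TD error (hypothetical helper, illustration only).

    q_table[s, a] stands in for Q_theta(s, a); transitions is a list of
    (s, a, r, s_next) tuples with integer states and actions.
    """
    s, a, r, s_next = (np.asarray(col) for col in zip(*transitions))
    # TD target: r_{t+1} + gamma * max_a Q(s_{t+1}, a)
    target = r + gamma * q_table[s_next].max(axis=1)
    # Squared TD error for each transition
    squared_error = (q_table[s, a] - target) ** 2
    # Negate the mean so that a larger error yields a more negative score
    return -float(squared_error.mean())


# Toy data (assumed for illustration): 2 states, 2 actions
q_table = np.array([[1.0, 2.0],
                    [0.0, 3.0]])
transitions = [(0, 1, 1.0, 1), (1, 0, 0.0, 0)]

score = negative_td_error(q_table, transitions, gamma=0.5)
print(score)  # -0.625
```

Here the two squared TD errors are 0.25 and 1.0, so the mean is 0.625 and the returned score is its negation.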