d3rlpy.metrics.scorer.dynamics_reward_prediction_error_scorer

d3rlpy.metrics.scorer.dynamics_reward_prediction_error_scorer(dynamics, episodes, window_size=1024)

Returns the mean squared error (MSE) of reward prediction, negated so that larger values are better.

This metric indicates how well the dynamics model generalizes to test sets. If the MSE is large (that is, the returned score is strongly negative), the dynamics model is likely overfitting.

\[\mathbb{E}_{s_t, a_t, r_{t+1} \sim D} [(r_{t+1} - r')^2]\]

where \(r' \sim T(s_t, a_t)\).
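The computation above can be sketched in plain Python. This is an illustrative sketch only, not the library's implementation: `negative_reward_mse`, `toy_model`, and the tuple-based episode layout are hypothetical stand-ins, and the reward predictor is assumed to be a callable that takes a batch of observations and actions and returns one predicted reward per step.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical episode layout: one (observation, action, next_reward) tuple per step.
Transition = Tuple[float, float, float]
RewardPredictor = Callable[[Sequence[float], Sequence[float]], List[float]]


def negative_reward_mse(
    predict_reward: RewardPredictor,
    episodes: Sequence[Sequence[Transition]],
    window_size: int = 1024,
) -> float:
    """Return the negated mean squared reward-prediction error over all steps."""
    squared_errors: List[float] = []
    for episode in episodes:
        # Process each episode in mini-batches of at most window_size steps.
        for start in range(0, len(episode), window_size):
            window = episode[start:start + window_size]
            observations = [step[0] for step in window]
            actions = [step[1] for step in window]
            rewards = [step[2] for step in window]
            predictions = predict_reward(observations, actions)
            squared_errors.extend(
                (r - p) ** 2 for r, p in zip(rewards, predictions)
            )
    # Negate so that a higher score means a better model.
    return -sum(squared_errors) / len(squared_errors)


def toy_model(observations: Sequence[float], actions: Sequence[float]) -> List[float]:
    """Hypothetical predictor: the reward is guessed as observation + action."""
    return [o + a for o, a in zip(observations, actions)]


episodes = [
    [(1.0, 1.0, 2.0), (2.0, 0.0, 3.0)],
    [(0.0, 0.0, 1.0)],
]
score = negative_reward_mse(toy_model, episodes, window_size=2)
print(score)  # squared errors are 0, 1, 1, so the score is -2/3
```

Because the score is negated, a perfect predictor scores 0 and worse predictors score further below zero, which lets the value be used wherever a higher-is-better metric is expected.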

Parameters:
  • dynamics (d3rlpy.dynamics.base.DynamicsBase) – dynamics model.
  • episodes (list(d3rlpy.dataset.Episode)) – list of episodes.
  • window_size (int) – mini-batch size used when computing the error.
Returns:
  negative mean squared error.
Return type:
  float