d3rlpy.metrics.scorer.dynamics_reward_prediction_error_scorer
d3rlpy.metrics.scorer.dynamics_reward_prediction_error_scorer(dynamics, episodes)
Returns MSE of reward prediction (in negative scale).
This metric indicates how well the dynamics model generalizes to test data. A large MSE suggests that the dynamics model is overfitting.
\[\mathbb{E}_{s_t, a_t, r_{t+1} \sim D} [(r_{t+1} - r')^2]\]
where \(r' \sim T(s_t, a_t)\).
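The quantity above can be sketched in plain Python to show what "negative scale" means in practice. This is only an illustration of the formula, not the d3rlpy implementation: `negative_reward_mse`, `toy_predict_reward`, and the `(observation, action, reward)` tuple layout are hypothetical names and assumptions introduced here.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical transition layout: (observation, action, observed reward).
Transition = Tuple[Sequence[float], Sequence[float], float]


def negative_reward_mse(
    predict_reward: Callable[[Sequence[float], Sequence[float]], float],
    transitions: Sequence[Transition],
) -> float:
    """Return the negated mean squared error of the predicted rewards.

    Hypothetical helper, not part of d3rlpy. `predict_reward` stands in
    for the reward output r' of a dynamics model T(s_t, a_t).
    """
    squared_errors = [
        (reward - predict_reward(observation, action)) ** 2
        for observation, action, reward in transitions
    ]
    # Negate the mean so that a higher score means a better model.
    return -sum(squared_errors) / len(squared_errors)


def toy_predict_reward(observation: Sequence[float], action: Sequence[float]) -> float:
    """Toy stand-in model: predicts the sum of all state and action features."""
    return sum(observation) + sum(action)


# Made-up held-out transitions, for illustration only.
transitions = [
    ([0.0, 1.0], [1.0], 2.0),
    ([1.0, 1.0], [0.0], 2.5),
    ([2.0, 0.0], [1.0], 2.0),
]

score = negative_reward_mse(toy_predict_reward, transitions)
print(score)
```

Because the error is negated, the score is never positive, and a value closer to zero indicates better reward prediction on the held-out transitions.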
- Parameters
dynamics (d3rlpy.metrics.scorer.DynamicsProtocol) – dynamics model.
episodes (List[d3rlpy.dataset.Episode]) – list of episodes.
- Returns
negative mean squared error.
- Return type