d3rlpy.metrics.EnvironmentEvaluator

class d3rlpy.metrics.EnvironmentEvaluator(env, n_trials=10, epsilon=0.0)[source]

Evaluates the policy online in a Gym environment.

This metric rolls out the trained policy in the given environment for n_trials episodes and reports the average episode return, taking a random action with probability epsilon. Because it interacts with the live environment rather than the offline dataset, it measures actual online performance.

\[\mathbb{E}_{\pi} \left[ \sum_{t=0}^{T} r_t \right]\]
Parameters:
  • env – Gym environment.

  • n_trials – Number of episodes to evaluate.

  • epsilon – Probability of random action.
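
A minimal usage sketch, assuming the d3rlpy v2 API; the CartPole dataset helper (d3rlpy.datasets.get_cartpole) and the DQN configuration are illustrative choices, not requirements of this class:

import d3rlpy

# Load an offline dataset together with its Gym environment
# (an illustrative dataset choice).
dataset, env = d3rlpy.datasets.get_cartpole()

# Configure an offline Q-learning algorithm (illustrative choice).
dqn = d3rlpy.algos.DQNConfig().create(device="cpu")

# Roll out the current policy for 10 episodes in the environment
# whenever evaluators are invoked during training.
env_evaluator = d3rlpy.metrics.EnvironmentEvaluator(env, n_trials=10)

dqn.fit(
    dataset,
    n_steps=10000,
    evaluators={"environment": env_evaluator},
)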

Methods

__call__(algo, dataset)[source]

Computes metrics.

Parameters:
  • algo (QLearningAlgoProtocol) – Q-learning algorithm.

  • dataset (ReplayBufferBase) – ReplayBuffer.

Returns:

Computed metrics.

Return type:

float
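
The evaluator can also be called directly once the algorithm has been built (for example after fit() in the sketch above). A hedged continuation, assuming the dataset argument is only needed to satisfy the evaluator signature while rollouts use the wrapped environment:

# Average episode return over n_trials rollouts.
score = env_evaluator(dqn, dataset)
print(score)  # a single float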