d3rlpy.metrics.EnvironmentEvaluator¶
- class d3rlpy.metrics.EnvironmentEvaluator(env, n_trials=10, epsilon=0.0)[source]¶
Rollout-based evaluation in the environment.
This metric rolls out the trained policy in the given Gym environment for a number of episodes and reports the average episode return, optionally taking random actions with probability epsilon (epsilon-greedy). Higher values indicate better online performance.
\[\mathbb{E}_{\tau \sim \pi} \left[\sum_t r_t\right]\]
- Parameters:
env – Gym environment.
n_trials – Number of episodes to evaluate.
epsilon – Probability of random action.
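The evaluation loop behind this metric can be sketched in plain Python: roll out `n_trials` episodes, pick the greedy action except with probability `epsilon`, and average the undiscounted episode returns. This is an illustrative sketch, not d3rlpy's internal implementation; the function, the `predict` callable, and the toy environment below are all hypothetical names.

```python
import random


def evaluate_in_environment(env, predict, n_trials=10, epsilon=0.0, seed=0):
    """Average undiscounted return over ``n_trials`` episodes.

    ``predict`` maps an observation to an action; with probability
    ``epsilon`` a random action is taken instead (epsilon-greedy).
    Illustrative sketch only -- not d3rlpy's internals.
    """
    rng = random.Random(seed)
    returns = []
    for _ in range(n_trials):
        obs = env.reset()
        episode_return, done = 0.0, False
        while not done:
            if rng.random() < epsilon:
                action = env.sample_action()  # exploratory random action
            else:
                action = predict(obs)  # greedy action from the policy
            obs, reward, done = env.step(action)
            episode_return += reward
        returns.append(episode_return)
    return sum(returns) / len(returns)


class ToyEnv:
    """Hypothetical two-step environment: reward 1.0 for action 1, else 0.0."""

    def __init__(self):
        self.t = 0

    def reset(self):
        self.t = 0
        return 0

    def sample_action(self):
        return random.randint(0, 1)

    def step(self, action):
        self.t += 1
        reward = 1.0 if action == 1 else 0.0
        return self.t, reward, self.t >= 2


print(evaluate_in_environment(ToyEnv(), predict=lambda obs: 1, n_trials=5))
```

With `epsilon=0.0` the loop is fully deterministic here, so the always-act-1 policy scores the maximum return of 2.0 per episode.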
Methods
- __call__(algo, dataset)[source]¶
Computes metrics.
- Parameters:
algo (QLearningAlgoProtocol) – Q-learning algorithm.
dataset (ReplayBufferBase) – ReplayBuffer.
- Returns:
Computed metrics.
- Return type:
float
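Every d3rlpy evaluator shares the same calling convention documented above: it is invoked with the algorithm and a replay buffer and returns a single float. The sketch below illustrates that shape with stand-in classes; `EvaluatorShape`, `ConstantEvaluator`, and `run_metric` are hypothetical names for illustration only.

```python
from typing import Any, Protocol


class EvaluatorShape(Protocol):
    """Illustrative protocol: called with (algo, dataset), returns a float."""

    def __call__(self, algo: Any, dataset: Any) -> float: ...


class ConstantEvaluator:
    """Toy evaluator used only to demonstrate the calling convention."""

    def __call__(self, algo: Any, dataset: Any) -> float:
        return 0.0


def run_metric(evaluator: EvaluatorShape, algo: Any, dataset: Any) -> float:
    # Any object with the __call__(algo, dataset) -> float shape fits.
    return evaluator(algo, dataset)


print(run_metric(ConstantEvaluator(), algo=None, dataset=None))
```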