d3rlpy.metrics.DiscreteActionMatchEvaluator

class d3rlpy.metrics.DiscreteActionMatchEvaluator(episodes=None)[source]

Returns percentage of identical actions between algorithm and dataset.

This metric suggests how different the greedy policy is from the given episodes in a discrete action space. If the given episodes are near-optimal, a larger percentage is better.

\[\frac{1}{N} \sum_{t=1}^{N} \mathbb{1}\{a_t = \text{argmax}_a Q_\theta (s_t, a)\}\]
Parameters:

episodes – Optional evaluation episodes. If not given, the dataset used in training will be used.

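A minimal usage sketch, assuming the d3rlpy v2 API where evaluators are passed to fit() as a name-to-evaluator mapping; the CartPole dataset, DQN algorithm, and the "action_match" logging name below are illustrative choices, not requirements of this class:

    import d3rlpy

    # Discrete-action dataset and environment (CartPole) for illustration.
    dataset, env = d3rlpy.datasets.get_cartpole()

    dqn = d3rlpy.algos.DQNConfig().create()

    # "action_match" is an arbitrary metric name used for logging.
    dqn.fit(
        dataset,
        n_steps=10000,
        evaluators={
            "action_match": d3rlpy.metrics.DiscreteActionMatchEvaluator(),
        },
    )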
Methods

__call__(algo, dataset)[source]

Computes metrics.

Parameters:
  • algo (QLearningAlgoProtocol) – Q-learning algorithm.

  • dataset (ReplayBufferBase) – ReplayBuffer.

Returns:

Computed metrics.

Return type:

float
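The evaluator can also be called directly on a trained algorithm and a replay buffer. A hedged sketch, continuing from the example above:

    # Direct invocation: compares the greedy policy's actions against
    # the actions stored in the dataset and returns the match score.
    evaluator = d3rlpy.metrics.DiscreteActionMatchEvaluator()
    match_score = evaluator(dqn, dataset)
    print(f"action match: {match_score}")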