d3rlpy.metrics.DiscreteActionMatchEvaluator
- class d3rlpy.metrics.DiscreteActionMatchEvaluator(*args, **kwds)
Returns the percentage of actions that are identical between the algorithm and the dataset, expressed as a fraction in [0, 1].
This metric indicates how much the greedy policy differs from the given episodes in a discrete action space. If the given episodes are near-optimal, a larger value is better.
\[\frac{1}{N} \sum_{t=1}^{N} \mathbb{1}\{a_t = \text{argmax}_a Q_\theta (s_t, a)\}\]
- Parameters
episodes – Optional evaluation episodes. If not given, the dataset used in training is used.
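As a minimal sketch of the formula above, and not d3rlpy's actual implementation, the metric can be reproduced in plain Python. The helper names `greedy_action` and `action_match_fraction`, as well as the Q-values and actions, are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def greedy_action(q_row: Sequence[float]) -> int:
    """Return argmax_a Q(s_t, a) for one step (first index wins on ties)."""
    return max(range(len(q_row)), key=lambda a: q_row[a])


def action_match_fraction(
    q_values: Sequence[Sequence[float]],
    dataset_actions: Sequence[int],
) -> float:
    """Fraction of steps where the greedy action equals the dataset action.

    q_values[t][a] stands for Q_theta(s_t, a); dataset_actions[t] is a_t.
    """
    if len(q_values) != len(dataset_actions):
        raise ValueError("q_values and dataset_actions must have equal length")
    if len(dataset_actions) == 0:
        raise ValueError("at least one step is required")
    # Indicator term of the sum: True where the greedy action matches a_t.
    matches = [
        greedy_action(q_row) == action
        for q_row, action in zip(q_values, dataset_actions)
    ]
    # (1 / N) * number of matching steps.
    return sum(matches) / len(matches)


# Hypothetical Q-values for 4 steps and 3 discrete actions.
q_values = [
    [0.1, 0.9, 0.0],  # greedy action: 1
    [0.5, 0.2, 0.3],  # greedy action: 0
    [0.0, 0.1, 0.8],  # greedy action: 2
    [0.4, 0.6, 0.2],  # greedy action: 1
]
dataset_actions = [1, 0, 0, 1]  # the third step disagrees with the greedy policy

score = action_match_fraction(q_values, dataset_actions)
print(score)  # 0.75
```

Three of the four greedy actions agree with the recorded actions, so the sketch yields 0.75. In the library, the Q-values would come from the trained algorithm rather than a hand-written table.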
Methods
- __call__(algo, dataset)
Computes the metric.
- Parameters
algo (d3rlpy.interface.QLearningAlgoProtocol) – Q-learning algorithm.
dataset (d3rlpy.dataset.replay_buffer.ReplayBuffer) – ReplayBuffer.
- Returns
Computed metric.
- Return type