d3rlpy.metrics.scorer.discrete_action_match_scorer

d3rlpy.metrics.scorer.discrete_action_match_scorer(algo, episodes, window_size=1024)

Returns the percentage of actions on which the algorithm and the dataset agree.

This metric indicates how much the greedy policy differs from the actions recorded in the given episodes in a discrete action space. If the given episodes are near-optimal, a larger percentage is better.

\[\frac{1}{N} \sum_{t=1}^{N} \mathbb{1} \{a_t = \text{argmax}_a Q_\theta (s_t, a)\}\]

Parameters:	
    algo (d3rlpy.algos.base.AlgoBase) – algorithm.
    episodes (list(d3rlpy.dataset.Episode)) – list of episodes.
    window_size (int) – mini-batch size to compute.
Returns:	percentage of identical actions.
Return type:	float
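
The formula above can be illustrated with a minimal NumPy sketch. This is not the d3rlpy implementation: the function name action_match_rate and the precomputed q_values array (a stand-in for Q_θ(s_t, ·)) are hypothetical, and window_size is assumed to split the steps into mini-batches without changing the result. As the formula suggests, the value is an average of 0/1 matches and so lies between 0 and 1.

```python
import numpy as np


def action_match_rate(q_values, dataset_actions, window_size=1024):
    """Fraction of steps where argmax_a Q(s_t, a) equals the dataset action a_t.

    q_values: array of shape (N, num_actions); hypothetical stand-in for
        the Q-function output of a trained algorithm.
    dataset_actions: integer array of shape (N,) with the recorded actions.
    window_size: number of steps evaluated per mini-batch.
    """
    q_values = np.asarray(q_values)
    dataset_actions = np.asarray(dataset_actions).reshape(-1)

    matches = []
    # Evaluate in windows of at most `window_size` steps (mini-batches).
    for start in range(0, len(dataset_actions), window_size):
        stop = start + window_size
        # Greedy action for each step in the window.
        greedy = np.argmax(q_values[start:stop], axis=1)
        matches.extend((greedy == dataset_actions[start:stop]).tolist())

    # Mean of the 0/1 indicators, i.e. (1/N) * sum of matches.
    return float(np.mean(matches))


# Four steps, two actions; the greedy actions are [1, 0, 1, 0].
q = np.array([[0.1, 0.9], [0.8, 0.2], [0.3, 0.7], [0.6, 0.4]])
a = np.array([1, 0, 0, 0])
rate = action_match_rate(q, a, window_size=2)
print(rate)  # 0.75: three of the four steps match
```

In this sketch the window size only controls how many steps are processed at once, so any positive value yields the same score.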