Metrics
d3rlpy provides scoring functions for offline Q-learning-based training. You can also check Logging to understand how to write metrics to files.
import d3rlpy

dataset, env = d3rlpy.datasets.get_cartpole()

# use partial episodes as test data
test_episodes = dataset.episodes[:10]

dqn = d3rlpy.algos.DQNConfig().create()

dqn.fit(
    dataset,
    n_steps=100000,
    evaluators={
        'td_error': d3rlpy.metrics.TDErrorEvaluator(test_episodes),
        'value_scale': d3rlpy.metrics.AverageValueEstimationEvaluator(test_episodes),
        'environment': d3rlpy.metrics.EnvironmentEvaluator(env),
    },
)
You can also implement your own metrics.
class CustomEvaluator(d3rlpy.metrics.EvaluatorProtocol):
    def __call__(
        self,
        algo: d3rlpy.algos.QLearningAlgoBase,
        dataset: d3rlpy.dataset.ReplayBuffer,
    ) -> float:
        # do some evaluation and return a scalar score
        ...
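As a concrete sketch (the class name here is hypothetical; any callable that takes the algorithm and dataset and returns a float satisfies the protocol), an evaluator that reports the average undiscounted return over a fixed set of held-out episodes might look like:

```python
class AverageReturnEvaluator:
    """Hypothetical custom evaluator: mean undiscounted return
    over a fixed set of evaluation episodes."""

    def __init__(self, episodes):
        self._episodes = episodes

    def __call__(self, algo, dataset) -> float:
        # `algo` and `dataset` are unused in this sketch; real
        # evaluators typically query the algorithm, e.g. via
        # algo.predict() or algo.predict_value()
        returns = [float(sum(ep.rewards)) for ep in self._episodes]
        return sum(returns) / len(returns)
```

An instance of this class can then be passed to `fit()` under any key, e.g. `evaluators={'avg_return': AverageReturnEvaluator(test_episodes)}`.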
The built-in evaluators are:

- d3rlpy.metrics.TDErrorEvaluator: Returns average TD error.
- d3rlpy.metrics.DiscountedSumOfAdvantageEvaluator: Returns average of discounted sum of advantage.
- d3rlpy.metrics.AverageValueEstimationEvaluator: Returns average value estimation.
- d3rlpy.metrics.InitialStateValueEstimationEvaluator: Returns mean estimated action-values at the initial states.
- d3rlpy.metrics.SoftOPCEvaluator: Returns Soft Off-Policy Classification metrics.
- d3rlpy.metrics.ContinuousActionDiffEvaluator: Returns squared difference of actions between algorithm and dataset.
- d3rlpy.metrics.DiscreteActionMatchEvaluator: Returns percentage of identical actions between algorithm and dataset.
- d3rlpy.metrics.CompareContinuousActionDiffEvaluator: Action difference between algorithms.
- d3rlpy.metrics.CompareDiscreteActionMatchEvaluator: Action matches between algorithms.
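To make the action-match metric concrete, here is a hypothetical stand-alone illustration of what DiscreteActionMatchEvaluator measures: the fraction of states where the algorithm's greedy action equals the action stored in the dataset (the function name is invented for this sketch and is not part of d3rlpy):

```python
def action_match_rate(predicted_actions, dataset_actions) -> float:
    """Fraction of positions where the predicted discrete action
    equals the action recorded in the dataset."""
    matches = sum(
        1 for p, d in zip(predicted_actions, dataset_actions) if p == d
    )
    return matches / len(dataset_actions)
```

A higher value means the learned policy stays closer to the behavior policy that generated the dataset.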