d3rlpy.dataset.SparseRewardTransitionPicker

class d3rlpy.dataset.SparseRewardTransitionPicker(failure_return, step_reward=0.0)

Sparse reward transition picker.

This class extends BasicTransitionPicker with a special returns_to_go calculation, mainly used in AntMaze environments.

For failure trajectories, this class sets returns_to_go to a constant return value. This avoids inconsistent horizons caused by timeouts.

Parameters:

failure_return (int) – Return value for failure trajectories.
step_reward (float) – Immediate step reward value in sparse reward setting.
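
The behavior described above can be illustrated with a minimal NumPy sketch. It is not the library's implementation: the helper name `sparse_rewards_to_go` is hypothetical, and the sketch assumes that a trajectory counts as a failure when every remaining reward equals `step_reward`.

```python
import numpy as np


def sparse_rewards_to_go(rewards, index, failure_return, step_reward=0.0):
    """Hypothetical helper illustrating the sparse-reward adjustment.

    Assumption: a trajectory is treated as a failure when all rewards
    from ``index`` onward equal ``step_reward``.
    """
    rewards = np.asarray(rewards, dtype=np.float32)
    remaining = rewards[index:]
    if np.all(remaining == step_reward):
        # Failure: replace the variable-length tail with one constant value,
        # so the result no longer depends on when the episode timed out.
        return np.array([failure_return], dtype=remaining.dtype)
    # Success: keep the remaining rewards unchanged.
    return remaining


# A successful episode reaches the goal at the last step.
success = sparse_rewards_to_go([0.0, 0.0, 0.0, 1.0], index=1, failure_return=-5)

# A failed episode only ever sees the step reward before timing out.
failure = sparse_rewards_to_go([0.0, 0.0, 0.0, 0.0], index=1, failure_return=-5)
```

In this sketch, two failed episodes of different lengths yield the same value, which is the consistency the constant return is meant to provide.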

Methods

__call__(episode, index)

Returns transition specified by index.

Parameters:

episode (EpisodeBase) – Episode.
index (int) – Index of the target transition.

Returns:

Transition.

Return type:

Transition