d3rlpy.dataset.SparseRewardTransitionPicker

class d3rlpy.dataset.SparseRewardTransitionPicker(failure_return, step_reward=0.0)

Sparse reward transition picker.

This class extends BasicTransitionPicker to handle a special returns-to-go calculation, used mainly in AntMaze environments.

For failure trajectories, this class replaces the return with a constant value (failure_return). This avoids inconsistent horizons caused by timeouts.
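To see why a constant helps, the sketch below computes a discounted Monte Carlo return from a per-step reward sequence. It assumes a step reward of -1, a discount of 0.99 and a failure return of -100, all chosen only for illustration. Two failure trajectories that are cut off by timeouts at different lengths get different returns even though neither reached the goal, while substituting one constant gives both the same target. The helper names and the rule "every remaining reward equals the step reward means failure" are assumptions made for this sketch, not code taken from d3rlpy.

```python
import numpy as np

GAMMA = 0.99           # assumed discount factor (illustration only)
STEP_REWARD = -1.0     # assumed per-step reward in the sparse setting
FAILURE_RETURN = -100  # assumed constant return for failure trajectories


def discounted_return(rewards: np.ndarray, gamma: float = GAMMA) -> float:
    """Monte Carlo return sum_t gamma**t * r_t of a 1-D reward sequence."""
    discounts = gamma ** np.arange(len(rewards))
    return float(np.sum(discounts * rewards))


def sparse_return(rewards_to_go: np.ndarray) -> float:
    """Return FAILURE_RETURN for failure trajectories, else the MC return.

    Assumption: a trajectory counts as a failure when every remaining
    reward equals the step reward, i.e. the goal reward never appears.
    """
    if np.all(rewards_to_go == STEP_REWARD):
        return float(FAILURE_RETURN)
    return discounted_return(rewards_to_go)


# Two failed episodes truncated by timeouts after 50 and 500 steps.
short_fail = np.full(50, STEP_REWARD)
long_fail = np.full(500, STEP_REWARD)

# The plain discounted returns depend on where the timeout happened.
print(discounted_return(short_fail), discounted_return(long_fail))
# With the constant, both failures share one target.
print(sparse_return(short_fail), sparse_return(long_fail))  # -100.0 -100.0
```

Trajectories that do contain a non-step reward (for example a goal reward of 0 under this assumed scheme) keep their ordinary discounted return.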

Parameters:
  • failure_return (int) – Return value for failure trajectories.

  • step_reward (float) – Immediate reward value received at each step in the sparse-reward setting.
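One possible way to choose failure_return, offered here as an assumption rather than a d3rlpy recommendation, is to use the discounted return of a failure that never ends. An agent that receives the step reward r_s at every step with discount factor γ (the discount is not a parameter of this class and is assumed here) collects a geometric series. Because failure_return is typed int, the result would be rounded.

```latex
G_{\text{fail}} = \sum_{t=0}^{\infty} \gamma^{t} r_s = \frac{r_s}{1-\gamma},
\qquad r_s = -1,\ \gamma = 0.99 \;\Rightarrow\; G_{\text{fail}} = \frac{-1}{0.01} = -100
```

With the default step_reward=0.0, this choice gives a failure return of 0.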

Methods

__call__(episode, index)

Returns the transition specified by index.

Parameters:
  • episode (EpisodeBase) – Episode.

  • index (int) – Index of the target transition.

Returns:

Transition.

Return type:

Transition
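The behaviour of __call__ described above can be sketched as a wrapper that first builds the ordinary transition and then overrides its returns-to-go field for failure trajectories. This is a self-contained mock, not d3rlpy's implementation. ToyTransition, toy_basic_picker and ToySparseRewardPicker are hypothetical stand-ins, the episode is simplified to a plain (L, 1) reward array instead of an EpisodeBase, and the failure test (all remaining rewards equal step_reward) is an assumption.

```python
import dataclasses

import numpy as np


@dataclasses.dataclass(frozen=True)
class ToyTransition:
    """Hypothetical stand-in for Transition, keeping only the fields used here."""
    reward: np.ndarray
    rewards_to_go: np.ndarray  # shape (L, 1): rewards from this step to the end


def toy_basic_picker(rewards: np.ndarray, index: int) -> ToyTransition:
    """Hypothetical base picker: slices the remaining rewards of an episode."""
    return ToyTransition(reward=rewards[index], rewards_to_go=rewards[index:])


class ToySparseRewardPicker:
    """Sketch of the wrapping pattern: delegate, then override returns-to-go."""

    def __init__(self, failure_return: int, step_reward: float = 0.0):
        self._failure_return = failure_return
        self._step_reward = step_reward

    def __call__(self, rewards: np.ndarray, index: int) -> ToyTransition:
        transition = toy_basic_picker(rewards, index)
        # Assumed failure test: no reward other than the step reward remains.
        if np.all(transition.rewards_to_go == self._step_reward):
            constant = np.array([[self._failure_return]], dtype=np.float32)
            # Frozen dataclass, so build a modified copy instead of mutating.
            transition = dataclasses.replace(transition, rewards_to_go=constant)
        return transition


success = np.array([[0.0], [0.0], [1.0]], dtype=np.float32)  # reaches the goal
failure = np.zeros((4, 1), dtype=np.float32)                 # times out
picker = ToySparseRewardPicker(failure_return=-100, step_reward=0.0)

print(picker(success, 0).rewards_to_go.shape)  # (3, 1)
print(picker(failure, 1).rewards_to_go)        # [[-100.]]
```

Delegating to a base picker keeps observations, actions and rewards identical to the basic case; only the returns-to-go field changes, and only for trajectories judged to be failures.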