d3rlpy.dataset.ReplayBuffer

class d3rlpy.dataset.ReplayBuffer(buffer, transition_picker=None, trajectory_slicer=None, writer_preprocessor=None, episodes=None, env=None, observation_signature=None, action_signature=None, reward_signature=None, action_space=None, action_size=None, cache_size=10000)[source]

Replay buffer for experience replay.

This replay buffer implementation is used for both online and offline training in d3rlpy. To determine the shapes of observations, actions, and rewards, at least one of episodes, env, or the explicit signatures must be provided.

from d3rlpy.dataset import FIFOBuffer, ReplayBuffer, Signature

buffer = FIFOBuffer(limit=1000000)

# initialize with pre-collected episodes
replay_buffer = ReplayBuffer(buffer=buffer, episodes=<episodes>)

# initialize with Gym
replay_buffer = ReplayBuffer(buffer=buffer, env=<env>)

# initialize with manually specified signatures
replay_buffer = ReplayBuffer(
    buffer=buffer,
    observation_signature=Signature(dtype=[<dtype>], shape=[<shape>]),
    action_signature=Signature(dtype=[<dtype>], shape=[<shape>]),
    reward_signature=Signature(dtype=[<dtype>], shape=[<shape>]),
)
Parameters
  • buffer (d3rlpy.dataset.BufferProtocol) – Buffer implementation.

  • transition_picker (Optional[d3rlpy.dataset.TransitionPickerProtocol]) – Transition picker implementation for Q-learning-based algorithms. If None is given, BasicTransitionPicker is used by default.

  • trajectory_slicer (Optional[d3rlpy.dataset.TrajectorySlicerProtocol]) – Trajectory slicer implementation for Transformer-based algorithms. If None is given, BasicTrajectorySlicer is used by default.

  • writer_preprocessor (Optional[d3rlpy.dataset.WriterPreprocessProtocol]) – Writer preprocessor implementation. If None is given, BasicWriterPreprocess is used by default.

  • episodes (Optional[Sequence[d3rlpy.dataset.EpisodeBase]]) – List of episodes to initialize replay buffer.

  • env (Optional[GymEnv]) – Gym environment to extract shapes of observations and actions.

  • observation_signature (Optional[d3rlpy.dataset.Signature]) – Signature of observation.

  • action_signature (Optional[d3rlpy.dataset.Signature]) – Signature of action.

  • reward_signature (Optional[d3rlpy.dataset.Signature]) – Signature of reward.

  • action_space (Optional[d3rlpy.constants.ActionSpace]) – Action-space type.

  • action_size (Optional[int]) – Size of action-space. For continuous action-space, this represents dimension of action vectors. For discrete action-space, this represents the number of discrete actions.

  • cache_size (int) – Size of cache to record active episode history used for online training. cache_size needs to be greater than the maximum possible episode length.
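
The buffer argument controls the retention policy rather than the dataset itself. As an illustrative sketch (not d3rlpy's implementation), a FIFO buffer such as FIFOBuffer stores (episode, step) entries and evicts the oldest entry once the limit is reached:

```python
from collections import deque

class MiniFIFOBuffer:
    """Illustrative FIFO buffer: stores (episode, index) pairs and
    drops the oldest entry once the limit is reached."""

    def __init__(self, limit):
        self._items = deque(maxlen=limit)

    def append(self, episode, index):
        self._items.append((episode, index))

    def __len__(self):
        return len(self._items)

    def __getitem__(self, i):
        return self._items[i]

buf = MiniFIFOBuffer(limit=3)
for step in range(5):
    buf.append("episode-0", step)

print(len(buf))  # 3: the oldest two entries were evicted
print(buf[0])    # ('episode-0', 2)
```

Any object satisfying BufferProtocol can be passed in place of FIFOBuffer to implement a different retention policy.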

Methods

append(observation, action, reward)[source]

Appends an observation, action and reward to the buffer.

Parameters
  • observation – Observation.

  • action – Action.

  • reward – Reward.

Return type

None

append_episode(episode)[source]

Appends episode to buffer.

Parameters

episode (d3rlpy.dataset.components.EpisodeBase) – Episode.

Return type

None

clip_episode(terminated)[source]

Clips current episode.

Parameters

terminated (bool) – Flag to represent environmental termination. This flag should be False if the episode is terminated by timeout.

Return type

None
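
The interplay of append and clip_episode can be pictured with a minimal sketch (illustrative bookkeeping only, not d3rlpy's writer): steps accumulate into an active episode until clip_episode seals it, and the terminated flag records whether the episode ended by environmental termination or by timeout:

```python
class MiniEpisodeWriter:
    """Illustrative bookkeeping behind append()/clip_episode():
    steps accumulate into an active episode until clip_episode()
    seals it with a terminated flag."""

    def __init__(self):
        self.episodes = []
        self._active = []

    def append(self, observation, action, reward):
        self._active.append((observation, action, reward))

    def clip_episode(self, terminated):
        # terminated=False marks a timeout cut: the episode ends,
        # but its final state is not an environmental terminal.
        self.episodes.append({"steps": self._active, "terminated": terminated})
        self._active = []

writer = MiniEpisodeWriter()
for t in range(3):
    writer.append(observation=[float(t)], action=0, reward=1.0)
writer.clip_episode(terminated=False)  # cut by timeout, not termination

print(len(writer.episodes))              # 1
print(writer.episodes[0]["terminated"])  # False
```

Passing terminated=False on timeout matters for bootstrapping: value-based algorithms should still bootstrap from the final state of a truncated episode.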

dump(f)[source]

Dumps buffer data.

with open('dataset.h5', 'w+b') as f:
    replay_buffer.dump(f)
Parameters

f (BinaryIO) – IO object to write to.

Return type

None

classmethod from_episode_generator(episode_generator, buffer, transition_picker=None, trajectory_slicer=None, writer_preprocessor=None)[source]

Builds ReplayBuffer from episode generator.

Parameters
  • episode_generator – Episode generator implementation.

  • buffer (d3rlpy.dataset.BufferProtocol) – Buffer implementation.

  • transition_picker (Optional[d3rlpy.dataset.TransitionPickerProtocol]) – Transition picker implementation for Q-learning-based algorithms. If None is given, BasicTransitionPicker is used by default.

  • trajectory_slicer (Optional[d3rlpy.dataset.TrajectorySlicerProtocol]) – Trajectory slicer implementation for Transformer-based algorithms. If None is given, BasicTrajectorySlicer is used by default.

  • writer_preprocessor (Optional[d3rlpy.dataset.WriterPreprocessProtocol]) – Writer preprocessor implementation. If None is given, BasicWriterPreprocess is used by default.
Returns

Replay buffer.

Return type

d3rlpy.dataset.replay_buffer.ReplayBuffer

classmethod load(f, buffer, episode_cls=<class 'd3rlpy.dataset.components.Episode'>, transition_picker=None, trajectory_slicer=None, writer_preprocessor=None)[source]

Builds ReplayBuffer from dumped data.

This method reconstructs a replay buffer dumped by the dump method.

with open('dataset.h5', 'rb') as f:
    replay_buffer = ReplayBuffer.load(f, buffer)
Parameters
  • f (BinaryIO) – IO object to read from.

  • buffer (d3rlpy.dataset.BufferProtocol) – Buffer implementation.

  • episode_cls – Episode class used to reconstruct episodes.

  • transition_picker (Optional[d3rlpy.dataset.TransitionPickerProtocol]) – Transition picker implementation for Q-learning-based algorithms. If None is given, BasicTransitionPicker is used by default.

  • trajectory_slicer (Optional[d3rlpy.dataset.TrajectorySlicerProtocol]) – Trajectory slicer implementation for Transformer-based algorithms. If None is given, BasicTrajectorySlicer is used by default.

  • writer_preprocessor (Optional[d3rlpy.dataset.WriterPreprocessProtocol]) – Writer preprocessor implementation. If None is given, BasicWriterPreprocess is used by default.
Returns

Replay buffer.

Return type

d3rlpy.dataset.replay_buffer.ReplayBuffer
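
dump and load operate on any binary IO object, not only files on disk. As a format-agnostic sketch of the round-trip pattern (d3rlpy's actual on-disk format is not pickle; this only illustrates the serialize-then-reconstruct flow through a BinaryIO object):

```python
import io
import pickle

# a stand-in for episode data held by a replay buffer
episodes = [{"observations": [[0.0], [1.0]], "actions": [0, 1], "rewards": [1.0, 0.0]}]

# dump: write the data to a binary IO object
f = io.BytesIO()
pickle.dump(episodes, f)

# load: rebuild an equivalent object from the same stream
f.seek(0)
restored = pickle.load(f)

print(restored == episodes)  # True
```

An in-memory io.BytesIO works wherever the documentation opens a file with 'w+b' or 'rb', which is convenient for tests.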

sample_trajectory(length)[source]

Samples a partial trajectory.

Parameters

length (int) – Length of partial trajectory.

Returns

Partial trajectory.

Return type

d3rlpy.dataset.components.PartialTrajectory

sample_trajectory_batch(batch_size, length)[source]

Samples a mini-batch of partial trajectories.

Parameters
  • batch_size (int) – Mini-batch size.

  • length (int) – Length of partial trajectories.

Returns

Mini-batch.

Return type

d3rlpy.dataset.mini_batch.TrajectoryMiniBatch
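
Trajectory sampling cuts a fixed-length window out of an episode. A minimal sketch of such a slice ending at a given step, with front padding when the window runs past the episode start (the padding scheme here is an assumption for illustration, not d3rlpy's exact layout):

```python
def slice_trajectory(episode, end_index, length, pad_value=0):
    """Illustrative fixed-length slice ending at end_index (inclusive).
    When the window extends past the episode start, the front is
    padded with pad_value."""
    start = end_index + 1 - length
    pad = max(0, -start)            # how far the window overshoots the start
    window = episode[max(0, start):end_index + 1]
    return [pad_value] * pad + window

episode = [10, 11, 12, 13, 14]
print(slice_trajectory(episode, end_index=3, length=3))  # [11, 12, 13]
print(slice_trajectory(episode, end_index=1, length=4))  # [0, 0, 10, 11]
```

Fixed-length windows are what Transformer-based algorithms consume, which is why the slicer is configured separately from the transition picker.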

sample_transition()[source]

Samples a transition.

Returns

Transition.

Return type

d3rlpy.dataset.components.Transition

sample_transition_batch(batch_size)[source]

Samples a mini-batch of transitions.

Parameters

batch_size (int) – Mini-batch size.

Returns

Mini-batch.

Return type

d3rlpy.dataset.mini_batch.TransitionMiniBatch
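
Sampling transitions uniformly means each (episode, step) pair is equally likely, so longer episodes contribute proportionally more samples. A sketch of that weighting (illustrative only; the function name and flat index are assumptions for this example):

```python
import random

def sample_transition_batch(episodes, batch_size, rng=random):
    """Illustrative uniform sampling over all transitions: build a
    flat index of (episode_id, step) pairs and draw from it."""
    index = [(ep_id, t)
             for ep_id, episode in enumerate(episodes)
             for t in range(len(episode))]
    return [index[rng.randrange(len(index))] for _ in range(batch_size)]

episodes = [[0.0] * 5, [0.0] * 3]   # transition counts 5 and 3
batch = sample_transition_batch(episodes, batch_size=4)
print(len(batch))  # 4
```

In d3rlpy, the transition_picker then turns each sampled index into a concrete Transition (e.g. frame stacking or multi-step returns), which is why it is pluggable.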

size()[source]

Returns number of episodes.

Returns

Number of episodes.

Return type

int

Attributes

buffer

Returns buffer.

Returns

Buffer.

dataset_info

Returns dataset information.

Returns

Dataset information.

episodes

Returns sequence of episodes.

Returns

Sequence of episodes.

trajectory_slicer

Returns trajectory slicer.

Returns

Trajectory slicer.

transition_count

Returns number of transitions.

Returns

Number of transitions.

transition_picker

Returns transition picker.

Returns

Transition picker.