d3rlpy.online.buffers.BatchReplayBuffer

class d3rlpy.online.buffers.BatchReplayBuffer(maxlen, env, episodes=None, create_mask=False, mask_size=1)

Standard Replay Buffer for batch training.

Parameters
  • maxlen (int) – the maximum length of the data stored in the buffer.

  • n_envs (int) – the number of environments.

  • env (gym.Env) – gym-like environment to extract shape information.

  • episodes (list(d3rlpy.dataset.Episode)) – list of episodes used to initialize the buffer.

  • create_mask (bool) – flag to create bootstrapping mask.

  • mask_size (int) – ensemble size for binary mask.
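
The create_mask and mask_size options refer to a bootstrapping mask for ensembles. The sketch below is not d3rlpy's implementation; it only illustrates one common way to build such a binary mask, assuming an independent Bernoulli draw per ensemble member and per sample. The helper name make_bootstrap_mask, the keep_prob parameter, and the output shape are all hypothetical.

```python
import numpy as np


def make_bootstrap_mask(batch_size, mask_size, keep_prob=0.5, seed=0):
    """Return a binary mask of shape (mask_size, batch_size, 1).

    Hypothetical helper: entry [i, j, 0] is 1.0 when ensemble member i
    is allowed to train on sample j, and 0.0 otherwise.
    """
    rng = np.random.default_rng(seed)
    draws = rng.random((mask_size, batch_size, 1))
    return (draws < keep_prob).astype(np.float32)


mask = make_bootstrap_mask(batch_size=32, mask_size=5)
```

With such a mask, each ensemble member's loss can be multiplied by its own slice so that members see different subsets of the same mini-batch.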

Methods

__len__()
Return type

int

append(observations, actions, rewards, terminals, clip_episodes=None)

Append observation, action, reward and terminal flag to buffer.

If the terminal flag is True, Monte-Carlo returns are computed over the entire episode and all of its transitions are appended.

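Parameters
  • observations – observations to append.

  • actions – actions to append.

  • rewards – rewards to append.

  • terminals – terminal flags.

  • clip_episodes – optional flags controlling episode clipping (defaults to None).
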
Return type

None
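
As a rough illustration of the Monte-Carlo returns mentioned above, the sketch below accumulates rewards backwards from the end of a finished episode. It is a generic formulation, not the library's code: the function name monte_carlo_returns and the gamma argument are assumptions, and whether d3rlpy discounts these returns is not stated here.

```python
def monte_carlo_returns(rewards, gamma=1.0):
    """Return the (optionally discounted) return-to-go for every step.

    Hypothetical helper: walks the episode backwards so each step's
    return is its own reward plus gamma times the next step's return.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


episode_rewards = [1.0, 0.0, 2.0]
returns = monte_carlo_returns(episode_rewards, gamma=0.5)
```

This kind of computation needs the whole episode, which is consistent with the returns being computed only once a terminal flag arrives.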

append_episode(episode)

Append Episode object to buffer.

Parameters

episode (d3rlpy.dataset.Episode) – episode.

Return type

None

sample(batch_size, n_frames=1, n_steps=1, gamma=0.99)

Returns sampled mini-batch of transitions.

If the observations are images, you can stack an arbitrary number of frames via n_frames.

buffer.observation_shape == (3, 84, 84)

# stack 4 frames
batch = buffer.sample(batch_size=32, n_frames=4)

batch.observations.shape == (32, 12, 84, 84)
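
The shape arithmetic in this example can be reproduced with plain NumPy. The sketch below concatenates the most recent frames along the channel axis; the helper stack_frames and its zero-padding of short histories are assumptions made for illustration, not a description of the library's internals.

```python
import numpy as np


def stack_frames(history, n_frames):
    """Concatenate the last n_frames arrays of shape (C, H, W) on axis 0.

    Hypothetical helper: when fewer than n_frames frames exist, the
    missing (oldest) slots are filled with zero frames.
    """
    recent = history[-n_frames:]
    pad = [np.zeros_like(recent[0])] * (n_frames - len(recent))
    return np.concatenate(pad + recent, axis=0)


frame = np.ones((3, 84, 84), dtype=np.float32)
full = stack_frames([frame] * 6, n_frames=4)   # enough history
short = stack_frames([frame] * 2, n_frames=4)  # only two frames so far
batch = np.stack([full] * 32)                  # a mini-batch of 32 samples
```

Here 4 frames of 3 channels each give 12 channels per sample, matching the (32, 12, 84, 84) shape shown above.
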
Parameters
  • batch_size (int) – mini-batch size.

  • n_frames (int) – the number of frames to stack for image observation.

  • n_steps (int) – the number of steps N between an observation and the next observation used in the N-step return.

  • gamma (float) – discount factor used in N-step return calculation.

Returns

mini-batch.

Return type

d3rlpy.dataset.TransitionMiniBatch
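
To make the roles of n_steps and gamma concrete, the sketch below computes a generic N-step discounted reward sum. It is a textbook formulation under the assumption that rewards are discounted by gamma raised to the step offset; the function name n_step_return is hypothetical and the code is not taken from d3rlpy.

```python
def n_step_return(rewards, gamma=0.99):
    """Sum the rewards of N consecutive steps, discounting step k by gamma**k.

    Hypothetical helper: len(rewards) plays the role of n_steps.
    """
    return sum((gamma ** k) * r for k, r in enumerate(rewards))


rewards = [1.0, 1.0, 1.0]                # n_steps = 3
ret = n_step_return(rewards, gamma=0.5)  # 1 + 0.5 + 0.25
bootstrap_discount = 0.5 ** len(rewards) # weight applied to the value at step N
```

In N-step learning the value estimate at the observation N steps ahead is typically weighted by gamma to the power N and added to this sum.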

size()

Returns the number of appended elements in buffer.

Returns

the number of elements in buffer.

Return type

int

to_mdp_dataset()

Convert replay data into static dataset.

The length of the dataset can be longer than the length of the replay buffer because this conversion is done by tracing Transition objects.

Returns

MDPDataset object.

Return type

d3rlpy.dataset.MDPDataset
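
The remark about the dataset being longer than the buffer can be illustrated with a toy model. The sketch below assumes each transition keeps a link to its predecessor, so tracing from a transition still held in a bounded queue can reach transitions that were already evicted. ToyTransition, its prev field, and trace_back are hypothetical names, not d3rlpy classes.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyTransition:
    step: int
    prev: Optional["ToyTransition"] = None  # assumed link to the previous step


def trace_back(transition):
    """Follow prev links back to the first transition, oldest first."""
    chain = []
    node = transition
    while node is not None:
        chain.append(node)
        node = node.prev
    return list(reversed(chain))


buffer = deque(maxlen=3)  # bounded queue: keeps only the 3 newest items
prev = None
for step in range(5):
    current = ToyTransition(step=step, prev=prev)
    buffer.append(current)
    prev = current

recovered = trace_back(buffer[-1])
```

The queue holds 3 transitions, yet tracing from the newest one recovers all 5 steps of the episode, which is the effect the note above describes.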

Attributes

transitions

Returns a FIFO queue of transitions.

Returns

FIFO queue of transitions.

Return type

d3rlpy.online.buffers.FIFOQueue