d3rlpy.online.buffers.ReplayBuffer

class d3rlpy.online.buffers.ReplayBuffer(maxlen, env, episodes=None)[source]

Standard Replay Buffer.

Parameters
  • maxlen (int) – the maximum number of transitions to store in the buffer.

  • env (gym.Env) – gym-like environment used to extract shape information.

  • episodes (list(d3rlpy.dataset.Episode)) – list of episodes to initialize the buffer with.
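
A minimal construction sketch; CartPole-v0 is an arbitrary illustrative choice, since the buffer only reads shape information from env:

import gym

from d3rlpy.online.buffers import ReplayBuffer

# any gym-like environment works; only shape information is extracted
env = gym.make('CartPole-v0')

# keep at most 100000 transitions
buffer = ReplayBuffer(maxlen=100000, env=env)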

Methods

__len__()
Return type

int

append(observation, action, reward, terminal)[source]

Append an observation, action, reward and terminal flag to the buffer.

If the terminal flag is True, Monte-Carlo returns are computed over the entire episode and all of its transitions are appended.

Parameters
  • observation (numpy.ndarray) – observation.

  • action (numpy.ndarray or int) – action.

  • reward (float) – reward.

  • terminal (bool or float) – terminal flag.

Return type

None
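
A minimal interaction-loop sketch, assuming the pre-0.26 gym API in which reset() returns an observation and step() returns a 4-tuple:

import gym

from d3rlpy.online.buffers import ReplayBuffer

env = gym.make('CartPole-v0')
buffer = ReplayBuffer(maxlen=100000, env=env)

observation = env.reset()
for _ in range(1000):
    # random actions purely for illustration
    action = env.action_space.sample()
    next_observation, reward, terminal, _ = env.step(action)
    buffer.append(observation, action, reward, terminal)
    # start a fresh episode once the previous one terminates
    observation = env.reset() if terminal else next_observation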

append_episode(episode)[source]

Append Episode object to buffer.

Parameters

episode (d3rlpy.dataset.Episode) – episode.

Return type

None
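
A sketch of seeding the buffer with logged episodes, using the bundled d3rlpy.datasets.get_cartpole helper; passing episodes= at construction is equivalent:

from d3rlpy.datasets import get_cartpole
from d3rlpy.online.buffers import ReplayBuffer

# get_cartpole returns a logged MDPDataset together with its environment
dataset, env = get_cartpole()

buffer = ReplayBuffer(maxlen=100000, env=env)

# append previously collected episodes one by one
for episode in dataset.episodes:
    buffer.append_episode(episode)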

sample(batch_size, n_frames=1, n_steps=1, gamma=0.99)[source]

Returns a sampled mini-batch of transitions.

If the observation is an image, you can stack an arbitrary number of frames via n_frames.

buffer.observation_shape == (3, 84, 84)

# stack 4 frames
batch = buffer.sample(batch_size=32, n_frames=4)

# 3 channels x 4 stacked frames = 12 channels
batch.observations.shape == (32, 12, 84, 84)

Parameters
  • batch_size (int) – mini-batch size.

  • n_frames (int) – the number of frames to stack for image observation.

  • n_steps (int) – the number of steps before the next observation, used for N-step return calculation.

  • gamma (float) – discount factor used in N-step return calculation.

Returns

mini-batch.

Return type

d3rlpy.dataset.TransitionMiniBatch
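
A usage sketch for N-step sampling, seeding the buffer from get_cartpole as above; the batch size, n_steps and gamma values are illustrative:

from d3rlpy.datasets import get_cartpole
from d3rlpy.online.buffers import ReplayBuffer

dataset, env = get_cartpole()

# initialize the buffer directly from logged episodes
buffer = ReplayBuffer(maxlen=100000, env=env, episodes=dataset.episodes)

# mini-batch with 3-step returns discounted by gamma
batch = buffer.sample(batch_size=32, n_steps=3, gamma=0.99)

# __len__ and size() report the number of stored transitions
print(len(buffer), buffer.size())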

size()[source]

Returns the number of appended transitions in the buffer.

Returns

the number of transitions in the buffer.

Return type

int