d3rlpy.online.buffers.ReplayBuffer

class d3rlpy.online.buffers.ReplayBuffer(maxlen, env, as_tensor=False, device=None)[source]

Standard Replay Buffer.

Parameters:
  • maxlen (int) – the maximum number of stored transitions.
  • env (gym.Env) – gym-like environment to extract shape information from.
  • as_tensor (bool) – flag to hold observations as torch.Tensor.
  • device (d3rlpy.gpu.Device or int) – gpu device or device id for tensors.
Attributes:
  • prev_observation (numpy.ndarray) – previously appended observation.
  • prev_action (numpy.ndarray or int) – previously appended action.
  • prev_reward (float) – previously appended reward.
  • prev_transition (d3rlpy.dataset.Transition) – previously appended transition.
  • transitions (collections.deque) – list of transitions.
  • observation_shape (tuple) – observation shape.
  • action_size (int) – action size.
  • as_tensor (bool) – flag to hold observations as torch.Tensor.
  • device (d3rlpy.gpu.Device) – gpu device.
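
As a minimal construction sketch (assuming a standard gym environment; the environment name below is just an example):

import gym

from d3rlpy.online.buffers import ReplayBuffer

# the environment supplies observation_shape and action_size
env = gym.make('CartPole-v0')

# keep up to 100000 of the most recent transitions
buffer = ReplayBuffer(maxlen=100000, env=env)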

Methods

__len__()[source]
append(observation, action, reward, terminal)[source]

Append an observation, action, reward and terminal flag to the buffer.

If the terminal flag is True, Monte-Carlo returns will be computed over the entire episode and all of the episode's transitions will be appended.

Parameters:
  • observation (numpy.ndarray) – observation.
  • action (numpy.ndarray or int) – action.
  • reward (float) – reward.
  • terminal (bool) – terminal flag.
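
Continuing the construction sketch above, a minimal collection loop might look like the following; the random policy and the exact pairing of reward and terminal with each step are illustrative assumptions, not the library's prescribed convention.

observation = env.reset()

while True:
    # a random policy stands in for a real agent
    action = env.action_space.sample()
    # old gym API: step returns (observation, reward, done, info)
    next_observation, reward, terminal, _ = env.step(action)
    buffer.append(observation, action, reward, terminal)
    observation = next_observation
    if terminal:
        break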
sample(batch_size, n_frames=1)[source]

Returns a sampled mini-batch of transitions.

If the observations are images, you can stack an arbitrary number of frames via n_frames.

buffer.observation_shape == (3, 84, 84)

# stack 4 frames
batch = buffer.sample(batch_size=32, n_frames=4)

batch.observations.shape == (32, 12, 84, 84)
Parameters:
  • batch_size (int) – mini-batch size.
  • n_frames (int) – the number of frames to stack for image observation.
Returns: mini-batch.
Return type: d3rlpy.dataset.TransitionMiniBatch
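
As a hedged sketch of consuming a sampled mini-batch (the attribute names below follow d3rlpy.dataset.TransitionMiniBatch; verify them against your installed version):

batch = buffer.sample(batch_size=32)

# the mini-batch exposes one array per transition element
observations = batch.observations
actions = batch.actions
rewards = batch.rewards
next_observations = batch.next_observations
terminals = batch.terminals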

size()[source]

Returns the number of appended elements in the buffer.

Returns: the number of elements in the buffer.
Return type: int
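
Presumably __len__ mirrors size(), so the built-in len() reports the same count (an assumption based on the standard sequence protocol):

n = buffer.size()

# assumed equivalent via __len__
assert len(buffer) == n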