d3rlpy.online.buffers.ReplayBuffer

class d3rlpy.online.buffers.ReplayBuffer(maxlen, env, episodes=None)[source]

Standard Replay Buffer.

Parameters:
  • maxlen (int) – the maximum number of transitions to store.
  • env (gym.Env) – gym-like environment used to extract shape information.
  • episodes (list(d3rlpy.dataset.Episode)) – list of episodes to initialize the buffer with.
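
A minimal construction sketch, assuming a standard Gym environment; the environment name and maxlen value are illustrative, not prescribed by this page:

import gym

from d3rlpy.online.buffers import ReplayBuffer

# any Gym environment works; CartPole-v0 is arbitrary
env = gym.make('CartPole-v0')

# keep up to 100000 transitions
buffer = ReplayBuffer(maxlen=100000, env=env)
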
prev_observation

previously appended observation.

Type: numpy.ndarray
prev_action

previously appended action.

Type: numpy.ndarray or int
prev_reward

previously appended reward.

Type: float
prev_transition

previously appended transition.

Type: d3rlpy.dataset.Transition
transitions

list of transitions.

Type: collections.deque
observation_shape

observation shape.

Type: tuple
action_size

action size.

Type: int

Methods

__len__()[source]
append(observation, action, reward, terminal)[source]

Append observation, action, reward and terminal flag to the buffer.

If the terminal flag is True, Monte-Carlo returns are computed over the entire episode and all of its transitions are appended at once.

Parameters:
  • observation (numpy.ndarray) – observation.
  • action (numpy.ndarray or int) – action.
  • reward (float) – reward.
  • terminal (bool) – terminal flag.
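
A sketch of a single-episode collection loop, assuming the classic Gym step API and the env/buffer from the construction example above; the random policy and the reward/terminal alignment shown here are assumptions, and d3rlpy's own training loop may pair them differently:

# collect one episode with a random policy (illustrative)
observation, reward, terminal = env.reset(), 0.0, False
while not terminal:
    action = env.action_space.sample()
    buffer.append(observation, action, reward, terminal)
    observation, reward, terminal, _ = env.step(action)

# a final call with terminal=True lets the buffer finalize the episode
buffer.append(observation, action, reward, terminal)
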
append_episode(episode)[source]

Append an Episode object to the buffer.

Parameters: episode (d3rlpy.dataset.Episode) – episode.
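
A sketch of seeding the buffer from logged data, assuming a d3rlpy MDPDataset saved at a hypothetical path dataset.h5; note that the episodes constructor argument accepts the same objects directly:

from d3rlpy.dataset import MDPDataset

# 'dataset.h5' is a hypothetical file path
dataset = MDPDataset.load('dataset.h5')
for episode in dataset.episodes:
    buffer.append_episode(episode)
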
sample(batch_size, n_frames=1)[source]

Returns a sampled mini-batch of transitions.

If the observation is an image, you can stack an arbitrary number of frames via n_frames.

buffer.observation_shape == (3, 84, 84)

# stack 4 frames
batch = buffer.sample(batch_size=32, n_frames=4)

batch.observations.shape == (32, 12, 84, 84)
Parameters:
  • batch_size (int) – mini-batch size.
  • n_frames (int) – the number of frames to stack for image observations.
Returns:

mini-batch.

Return type:

d3rlpy.dataset.TransitionMiniBatch
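
A short usage sketch; batch.observations is confirmed by the frame-stacking example above, while the actions and rewards attribute names are assumptions about TransitionMiniBatch:

batch = buffer.sample(batch_size=32)

# observations is shown above; actions/rewards are assumed attributes
observations = batch.observations
actions = batch.actions
rewards = batch.rewards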

size()[source]

Returns the number of appended elements in the buffer.

Returns: the number of elements in the buffer.
Return type: int
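
A trivial usage sketch; that len(buffer) matches size() via __len__ is an assumption, not stated on this page:

n = buffer.size()
assert n == len(buffer)  # assumed: __len__ mirrors size()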