d3rlpy.online.buffers.BatchReplayBuffer

class d3rlpy.online.buffers.BatchReplayBuffer(maxlen, env, episodes=None)

Standard replay buffer for batch training, where transitions are collected from several environments at once.

Parameters

maxlen (int) – the maximum number of transitions the buffer can hold.
env (gym.Env) – gym-like environment used to extract shape information.
episodes (list(d3rlpy.dataset.Episode)) – list of episodes used to initialize the buffer.

Methods

__len__()

Return type: int

append(observations, actions, rewards, terminals, clip_episodes=None)

Appends a batch of observations, actions, rewards and terminal flags, one entry per environment, to the buffer.

When a terminal flag is True, Monte-Carlo returns are computed over the entire episode and all of the episode's transitions are appended.

Parameters

observations (numpy.ndarray) – batch of observations.
actions (numpy.ndarray) – batch of actions.
rewards (numpy.ndarray) – batch of rewards.
terminals (numpy.ndarray) – batch of terminal flags.
clip_episodes (Optional[numpy.ndarray]) – flags to clip the current episodes. If None, episodes are clipped based on terminals.

Return type: None
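The bookkeeping described for append() (per-environment episodes, clipping, Monte-Carlo returns once an episode ends) can be sketched in plain Python. This is not d3rlpy's implementation: the class ToyBatchEpisodeCollector is hypothetical, and the discount factor gamma used for the returns is an assumption, since append() itself takes no gamma argument.

```python
import numpy as np


class ToyBatchEpisodeCollector:
    """Hypothetical sketch: buffers transitions per environment and flushes an
    episode, with its Monte-Carlo returns, once that episode is clipped."""

    def __init__(self, n_envs, gamma=0.99):
        # assumption: a discount factor for the returns (append() takes none)
        self.gamma = gamma
        # one list of (observation, action, reward) tuples per environment
        self.pending = [[] for _ in range(n_envs)]
        # flushed transitions as (observation, action, reward, mc_return)
        self.stored = []

    def append(self, observations, actions, rewards, terminals, clip_episodes=None):
        # documented default: clip an episode wherever it terminated
        if clip_episodes is None:
            clip_episodes = terminals
        for i in range(len(observations)):
            self.pending[i].append((observations[i], actions[i], float(rewards[i])))
            if clip_episodes[i]:
                self._flush(i)

    def _flush(self, i):
        episode = self.pending[i]
        # G_t = r_t + gamma * G_{t+1}, accumulated from the last step backwards
        returns = []
        g = 0.0
        for _, _, reward in reversed(episode):
            g = reward + self.gamma * g
            returns.append(g)
        returns.reverse()
        for (obs, action, reward), g_t in zip(episode, returns):
            self.stored.append((obs, action, reward, g_t))
        self.pending[i] = []


collector = ToyBatchEpisodeCollector(n_envs=2, gamma=0.5)
obs = np.zeros((2, 3), dtype=np.float32)
collector.append(obs, np.array([0, 1]), np.array([1.0, 1.0]), np.array([0, 0]))
collector.append(obs, np.array([0, 1]), np.array([1.0, 2.0]), np.array([1, 0]))
# env 0 terminated: its two transitions were stored with returns 1.5 and 1.0;
# env 1 is still running, so its two steps remain pending
```

Keeping one pending list per environment is what lets episodes from different environments end at different times without mixing their transitions.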

append_episode(episode)

Appends an Episode object to the buffer.

Parameters: episode (d3rlpy.dataset.Episode) – episode.
Return type: None

sample(batch_size, n_frames=1, n_steps=1, gamma=0.99)

Returns a sampled mini-batch of transitions.

If observations are images, you can stack an arbitrary number of frames via n_frames.

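# image observations of shape (channels, height, width)
buffer.observation_shape == (3, 84, 84)

# stack 4 frames
batch = buffer.sample(batch_size=32, n_frames=4)

# frames are stacked along the channel axis: 3 channels x 4 frames = 12
batch.observations.shape == (32, 12, 84, 84)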
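To see where the 12 comes from, here is a minimal NumPy sketch that stacks four (3, 84, 84) frames along the channel axis and then batches 32 such samples. The helper stack_frames is hypothetical, and the oldest-frame-first ordering is an assumption rather than documented behaviour.

```python
import numpy as np


def stack_frames(frames):
    """Hypothetical helper: concatenate n frames of shape (C, H, W) along the
    channel axis, giving shape (C * n, H, W). Oldest-first order is assumed."""
    return np.concatenate(frames, axis=0)


n_frames, batch_size = 4, 32
# frame k is filled with the value k so the stacking order stays visible
frames = [np.full((3, 84, 84), k, dtype=np.uint8) for k in range(n_frames)]
sample = stack_frames(frames)                    # (12, 84, 84)
batch = np.stack([sample] * batch_size, axis=0)  # (32, 12, 84, 84)
print(batch.shape)  # (32, 12, 84, 84)
```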

Parameters

batch_size (int) – mini-batch size.
n_frames (int) – the number of frames to stack for image observations.
n_steps (int) – the number of steps before the next observation.
gamma (float) – discount factor used in N-step return calculation.

Returns: mini-batch.
Return type: d3rlpy.dataset.TransitionMiniBatch
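The n_steps and gamma parameters point to the usual N-step target, r_t + gamma * r_{t+1} + ... + gamma^(n-1) * r_{t+n-1} + gamma^n * V(s_{t+n}), cut short when the episode terminates. The function below is a hypothetical sketch of that standard formula, not the buffer's internal code; it returns the discounted reward sum, the discount to apply to the bootstrapped value, and the index of the observation to bootstrap from.

```python
def n_step_return(rewards, terminals, t, n_steps, gamma):
    """Hypothetical sketch of a standard N-step return starting at step t.

    Returns (ret, discount, k): the target is ret + discount * V(s_k), and
    discount is 0.0 when the episode terminates inside the window.
    """
    ret, discount, k = 0.0, 1.0, t
    while k < len(rewards) and k - t < n_steps:
        ret += discount * rewards[k]
        discount *= gamma
        k += 1
        if terminals[k - 1]:
            # no bootstrapping past a terminal state
            return ret, 0.0, k
    return ret, discount, k


# no terminal within 3 steps: 1 + 0.5 + 0.25, bootstrapped with 0.5 ** 3
print(n_step_return([1.0, 1.0, 1.0, 1.0], [0, 0, 0, 0], t=0, n_steps=3, gamma=0.5))
# (1.75, 0.125, 3)
# terminal after the second step: the window is cut short
print(n_step_return([1.0, 1.0, 1.0, 1.0], [0, 1, 0, 0], t=0, n_steps=3, gamma=0.5))
# (1.5, 0.0, 2)
```

With n_steps=1 this reduces to the ordinary one-step target r_t + gamma * V(s_{t+1}).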

size()

Returns the number of elements appended to the buffer.

Returns: the number of elements in the buffer.
Return type: int

to_mdp_dataset()

Converts the replay data into a static dataset.

The resulting dataset can be longer than the replay buffer because the conversion is done by tracing Transition objects.

Returns: MDPDataset object.
Return type: d3rlpy.dataset.MDPDataset
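One way tracing can make the dataset outgrow the buffer: if each transition keeps references to its neighbours, transitions already evicted from the bounded queue stay reachable by following those links. The sketch below assumes such prev/next links (the hypothetical Link class stands in for Transition) and uses collections.deque as the bounded queue; it illustrates the idea rather than d3rlpy's conversion code.

```python
from collections import deque


class Link:
    """Hypothetical stand-in for a transition that knows its neighbours."""

    def __init__(self, index):
        self.index = index
        self.prev = None
        self.next = None


def trace_episode(node):
    """Follow prev links to the episode start, then next links to its end."""
    while node.prev is not None:
        node = node.prev
    indices = []
    while node is not None:
        indices.append(node.index)
        node = node.next
    return indices


# one 5-step episode as a doubly linked chain
nodes = [Link(i) for i in range(5)]
for a, b in zip(nodes, nodes[1:]):
    a.next, b.prev = b, a

queue = deque(nodes, maxlen=3)  # the bounded queue keeps only the 3 newest
del nodes  # drop the external list; only the queue and the links remain
print(len(queue))               # 3
print(trace_episode(queue[0]))  # [0, 1, 2, 3, 4]: evicted steps are still reachable
```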

Attributes

transitions

Returns a FIFO queue of transitions.

Returns: FIFO queue of transitions.
Return type: d3rlpy.online.buffers.FIFOQueue
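A fixed-capacity FIFO queue of this kind can be sketched as a ring buffer, where each append overwrites the oldest slot once the queue is full. ToyFIFOQueue is a hypothetical stand-in; d3rlpy's FIFOQueue may be implemented differently.

```python
class ToyFIFOQueue:
    """Hypothetical fixed-capacity FIFO: once full, each append overwrites
    the oldest stored item."""

    def __init__(self, maxlen):
        self.maxlen = maxlen
        self.buffer = [None] * maxlen
        self.cursor = 0  # slot the next append will write to
        self.size = 0

    def append(self, item):
        self.buffer[self.cursor] = item
        self.cursor = (self.cursor + 1) % self.maxlen
        self.size = min(self.size + 1, self.maxlen)

    def __len__(self):
        return self.size

    def __getitem__(self, i):
        # index 0 is the oldest element still stored
        if not 0 <= i < self.size:
            raise IndexError(i)
        start = (self.cursor - self.size) % self.maxlen
        return self.buffer[(start + i) % self.maxlen]


queue = ToyFIFOQueue(maxlen=3)
for item in range(5):
    queue.append(item)
print(len(queue))                             # 3
print([queue[i] for i in range(len(queue))])  # [2, 3, 4]
```

A ring buffer keeps append and indexing O(1) without shifting elements, which suits uniform random sampling by index.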