d3rlpy.online.buffers.ReplayBuffer
- class d3rlpy.online.buffers.ReplayBuffer(maxlen, env=None, episodes=None, create_mask=False, mask_size=1)
Standard Replay Buffer.
- Parameters
maxlen (int) – the maximum length of the buffer.
env (gym.Env) – gym-like environment to extract shape information.
episodes (list(d3rlpy.dataset.Episode)) – list of episodes to initialize buffer.
create_mask (bool) – flag to create bootstrapping mask.
mask_size (int) – ensemble size for binary mask.
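The create_mask and mask_size parameters can be pictured with a minimal sketch. It assumes that each stored transition receives one random binary flag per ensemble member; the helper make_bootstrap_mask and its keep_prob argument are hypothetical and are not part of d3rlpy.

```python
import random


def make_bootstrap_mask(mask_size, rng, keep_prob=0.5):
    """Draw one 0/1 flag per ensemble member (hypothetical helper)."""
    return [1 if rng.random() < keep_prob else 0 for _ in range(mask_size)]


rng = random.Random(0)

# One mask per stored transition; mask_size matches the ensemble size.
masks = [make_bootstrap_mask(mask_size=3, rng=rng) for _ in range(5)]

# Under this assumption, ensemble member k only learns from the
# transitions whose k-th flag is 1.
member = 1
visible = [i for i, mask in enumerate(masks) if mask[member] == 1]
```

Under this reading, a mask_size of 1 corresponds to a single learner that sees one flag per transition.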
Methods
- append(observation, action, reward, terminal, clip_episode=None)
Append observation, action, reward and terminal flag to buffer.
If the terminal flag is True, Monte-Carlo returns are computed over the entire episode and all of its transitions are appended.
- Parameters
observation (numpy.ndarray) – observation.
action (numpy.ndarray) – action.
reward (float) – reward.
terminal (float) – terminal flag.
clip_episode (Optional[bool]) – flag to clip the current episode. If None, the episode is clipped based on terminal.
- Return type
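The Monte-Carlo return mentioned for append can be illustrated with a short sketch. It computes the discounted return-to-go for every step once the episode has ended. The function name monte_carlo_returns and the gamma argument are assumptions made for illustration; the text above does not say how the library discounts the returns.

```python
def monte_carlo_returns(rewards, gamma=0.99):
    """Return the discounted return-to-go for every step of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the return of the step after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# A finished three-step episode (hypothetical rewards).
episode_rewards = [1.0, 0.0, 2.0]
returns = monte_carlo_returns(episode_rewards, gamma=0.5)
```

The backward pass needs the rewards of the complete episode, which is consistent with the transitions being appended only once the terminal flag arrives.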
- append_episode(episode)
Append Episode object to buffer.
- Parameters
episode (d3rlpy.dataset.Episode) – episode.
- Return type
- sample(batch_size, n_frames=1, n_steps=1, gamma=0.99)
Returns sampled mini-batch of transitions.
If the observations are images, you can stack an arbitrary number of frames via n_frames.
    buffer.observation_shape == (3, 84, 84)
    # stack 4 frames
    batch = buffer.sample(batch_size=32, n_frames=4)
    batch.observations.shape == (32, 12, 84, 84)
- Parameters
- Returns
mini-batch.
- Return type
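The shape change produced by n_frames can be reproduced with a small pure-Python sketch. It assumes that frames are concatenated along the channel axis with the oldest frame first; the helpers stack_frames, shape_of and blank_frame are hypothetical and only mimic the shapes quoted above.

```python
from collections import deque


def stack_frames(frames):
    """Concatenate frames along the channel axis: n x (C, H, W) -> (n*C, H, W)."""
    return [channel for frame in frames for channel in frame]


def shape_of(array):
    """Shape of a nested (channels, height, width) list."""
    return (len(array), len(array[0]), len(array[0][0]))


def blank_frame(channels, height, width, value=0):
    """Build a constant (channels, height, width) frame as nested lists."""
    return [[[value] * width for _ in range(height)] for _ in range(channels)]


n_frames = 4

# Keep only the most recent n_frames observations.
window = deque(maxlen=n_frames)
for step in range(6):
    window.append(blank_frame(3, 84, 84, value=step))

# Four (3, 84, 84) frames become one (12, 84, 84) observation.
stacked = stack_frames(window)
stacked_shape = shape_of(stacked)
```

With a batch size of 32, stacking 32 such observations gives the (32, 12, 84, 84) shape shown in the example for sample.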
- size()
Returns the number of appended elements in buffer.
- Returns
the number of elements in buffer.
- Return type
- to_mdp_dataset()
Convert replay data into static dataset.
The dataset can be longer than the replay buffer because the conversion traces Transition objects.
- Returns
MDPDataset object.
- Return type
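Why tracing can yield more data than the buffer holds is easier to see in a sketch. It assumes that each transition keeps links to its neighbours within the episode, so transitions that were already evicted from the queue remain reachable. The Transition class here is a hypothetical stand-in, not the d3rlpy class, and link_episode and trace_episode are illustrative helpers.

```python
from collections import deque


class Transition:
    """Hypothetical stand-in: one step linked to its neighbours."""

    def __init__(self, step):
        self.step = step
        self.prev_transition = None
        self.next_transition = None


def link_episode(length):
    """Create `length` transitions and link them in order."""
    transitions = [Transition(i) for i in range(length)]
    for before, after in zip(transitions, transitions[1:]):
        before.next_transition = after
        after.prev_transition = before
    return transitions


def trace_episode(transition):
    """Follow the links to recover every transition of the episode."""
    head = transition
    while head.prev_transition is not None:
        head = head.prev_transition
    traced = []
    while head is not None:
        traced.append(head)
        head = head.next_transition
    return traced


# The buffer keeps only the 3 newest transitions of a 5-step episode.
buffer = deque(maxlen=3)
for transition in link_episode(5):
    buffer.append(transition)

# Tracing from any buffered transition recovers all 5 steps.
traced = trace_episode(buffer[0])
```

In this sketch the buffer holds 3 transitions while the traced episode holds 5, which mirrors the length difference described for to_mdp_dataset.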
Attributes
- transitions
Returns a FIFO queue of transitions.
- Returns
FIFO queue of transitions.
- Return type
d3rlpy.online.buffers.FIFOQueue
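The first-in, first-out behaviour of the transitions queue can be sketched with a fixed-capacity ring buffer. SimpleFIFOQueue is a hypothetical class written for illustration and makes no claim about how d3rlpy.online.buffers.FIFOQueue is implemented.

```python
class SimpleFIFOQueue:
    """Hypothetical fixed-capacity queue backed by a ring buffer."""

    def __init__(self, maxlen):
        self._maxlen = maxlen
        self._items = [None] * maxlen
        self._cursor = 0  # index of the next slot to write
        self._size = 0

    def append(self, item):
        # Once the queue is full, the write overwrites the oldest item.
        self._items[self._cursor] = item
        self._cursor = (self._cursor + 1) % self._maxlen
        self._size = min(self._size + 1, self._maxlen)

    def __len__(self):
        return self._size

    def __iter__(self):
        # Iterate from the oldest item to the newest one.
        start = self._cursor if self._size == self._maxlen else 0
        for offset in range(self._size):
            yield self._items[(start + offset) % self._maxlen]


queue = SimpleFIFOQueue(maxlen=3)
for name in ["t0", "t1", "t2", "t3", "t4"]:
    queue.append(name)

# Only the three newest items remain, oldest first.
remaining = list(queue)
```

A ring buffer keeps appends at constant cost because nothing is shifted when the oldest item is dropped.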