d3rlpy.online.buffers.ReplayBuffer
-
class d3rlpy.online.buffers.ReplayBuffer(maxlen, env=None, episodes=None)
Standard Replay Buffer.
- Parameters
maxlen (int) – the maximum number of elements the buffer can hold.
env (gym.Env) – gym-like environment to extract shape information.
episodes (list(d3rlpy.dataset.Episode)) – list of episodes used to initialize the buffer.
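The effect of maxlen can be pictured with a bounded queue. The sketch below is not d3rlpy code: it uses collections.deque from the Python standard library as a stand-in and assumes, as the FIFO queue exposed by the transitions attribute suggests, that the oldest entries are dropped first once the limit is reached.

```python
from collections import deque

# Stand-in for the buffer's internal storage (an assumption for illustration,
# not d3rlpy's implementation): a queue bounded to `maxlen` items.
maxlen = 3
storage = deque(maxlen=maxlen)

for step in range(5):
    # Each item stands in for one transition.
    storage.append({"step": step})

# Only the most recent `maxlen` items remain; steps 0 and 1 were evicted.
remaining_steps = [item["step"] for item in storage]
print(remaining_steps)
```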
Methods
-
append(observation, action, reward, terminal, clip_episode=None)
Append observation, action, reward and terminal flag to buffer.
If the terminal flag is True, Monte-Carlo returns are computed over the entire episode and all of its transitions are appended.
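As a rough illustration of what computing Monte-Carlo returns over a whole episode involves, the sketch below accumulates rewards backwards from the terminal step. The function name monte_carlo_returns and its gamma discount argument are hypothetical; this page does not say whether or how append discounts rewards.

```python
from typing import List


def monte_carlo_returns(rewards: List[float], gamma: float = 1.0) -> List[float]:
    """Return G_t = r_t + gamma * G_{t+1} for every step of a finished episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the return of its successor.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


episode_rewards = [1.0, 0.0, 2.0]
print(monte_carlo_returns(episode_rewards))             # undiscounted
print(monte_carlo_returns(episode_rewards, gamma=0.5))  # discounted
```

Every step's return depends on rewards that arrive later, which is why the returns can only be filled in once the episode has ended.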
- Parameters
observation (numpy.ndarray) – observation.
action (numpy.ndarray) – action.
reward (float) – reward.
terminal (float) – terminal flag.
clip_episode (Optional[bool]) – flag to clip the current episode. If None, the episode is clipped based on terminal.
- Return type
-
append_episode(episode)
Append Episode object to buffer.
- Parameters
episode (d3rlpy.dataset.Episode) – episode.
- Return type
-
sample(batch_size, n_frames=1, n_steps=1, gamma=0.99)
Returns sampled mini-batch of transitions.
If the observation is an image, you can stack an arbitrary number of frames via n_frames.

    buffer.observation_shape == (3, 84, 84)

    # stack 4 frames
    batch = buffer.sample(batch_size=32, n_frames=4)

    batch.observations.shape == (32, 12, 84, 84)
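The shape arithmetic in this example can be checked without the library. The helper below is hypothetical (it is not part of d3rlpy) and assumes channel-first image observations whose frames are concatenated along the channel axis, which is what the shapes above imply.

```python
from typing import Tuple


def stacked_batch_shape(
    observation_shape: Tuple[int, int, int], batch_size: int, n_frames: int
) -> Tuple[int, int, int, int]:
    """Shape of a mini-batch when n_frames observations are stacked per sample."""
    channels, height, width = observation_shape
    # Stacking multiplies the channel count; height and width are unchanged.
    return (batch_size, channels * n_frames, height, width)


print(stacked_batch_shape((3, 84, 84), batch_size=32, n_frames=4))
```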
- Parameters
- Returns
mini-batch.
- Return type
-
size()
Returns the number of appended elements in buffer.
- Returns
the number of elements in buffer.
- Return type
-
to_mdp_dataset()
Convert replay data into static dataset.
The length of the dataset can be longer than the length of the replay buffer because this conversion is done by tracing Transition objects.
- Returns
MDPDataset object.
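One way to picture why tracing can yield more data than the buffer holds: if each transition keeps a link to its predecessor, following those links from a stored transition can reach transitions that a bounded queue has already dropped. The Node class and its prev field below are hypothetical stand-ins, not d3rlpy's Transition API.

```python
from collections import deque
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for a transition linked to its predecessor."""
    step: int
    prev: Optional["Node"] = None


def trace_episode(node: Node) -> List[int]:
    """Follow `prev` links back to the episode start; return steps in order."""
    steps = []
    current: Optional[Node] = node
    while current is not None:
        steps.append(current.step)
        current = current.prev
    return steps[::-1]


# A queue bounded to 2 items receives a 5-step episode.
queue = deque(maxlen=2)
previous = None
for step in range(5):
    previous = Node(step=step, prev=previous)
    queue.append(previous)

print(len(queue))                # items still held by the queue
print(trace_episode(queue[-1]))  # steps reachable by tracing links
```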
- Return type
Attributes
-
transitions
Returns a FIFO queue of transitions.
- Returns
FIFO queue of transitions.
- Return type
d3rlpy.online.buffers.FIFOQueue