d3rlpy.online.buffers.BatchReplayBuffer
class d3rlpy.online.buffers.BatchReplayBuffer(maxlen, env, episodes=None)
Standard replay buffer for batch training, in which experience is collected from several environments stepped in parallel.
Parameters
maxlen (int) – the maximum number of transitions the buffer can hold.
n_envs (int) – the number of environments.
env (gym.Env) – gym-like environment used to extract shape information.
episodes (list(d3rlpy.dataset.Episode)) – list of episodes to initialize the buffer with.
Methods
append(observations, actions, rewards, terminals, clip_episodes=None)
Append observations, actions, rewards and terminal flags, one entry per environment, to the buffer.
If the terminal flag is True, Monte-Carlo returns are computed over the entire episode and all of that episode's transitions are appended.
Parameters
observations (numpy.ndarray) – observations.
actions (numpy.ndarray) – actions.
rewards (numpy.ndarray) – rewards.
terminals (numpy.ndarray) – terminal flags.
clip_episodes (Optional[numpy.ndarray]) – flags to clip the current episodes. If None, the episodes are clipped based on terminals.
Return type
None
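To make the batched append semantics concrete, here is a minimal sketch in plain Python and NumPy. It is not d3rlpy's implementation: the class EpisodeAccumulator, its gamma argument, and the discounted form of the Monte-Carlo return are assumptions for illustration. Each call receives one entry per environment; an environment's open episode is closed when its clip flag (or, when clip_episodes is None, its terminal flag) is set, at which point returns are computed backwards over the whole episode and every transition of that episode is stored at once.

```python
import numpy as np


class EpisodeAccumulator:
    """Hypothetical sketch of batched append(); not d3rlpy's implementation."""

    def __init__(self, n_envs, gamma=1.0):
        self.gamma = gamma  # assumed discount for the Monte-Carlo return
        self.open = [[] for _ in range(n_envs)]  # unfinished episode per env
        self.transitions = []  # (observation, action, reward, mc_return)

    def append(self, observations, actions, rewards, terminals, clip_episodes=None):
        # Without explicit clip flags, episodes are clipped on terminal flags.
        if clip_episodes is None:
            clip_episodes = terminals
        for i in range(len(observations)):
            self.open[i].append((observations[i], actions[i], float(rewards[i])))
            if clip_episodes[i]:
                self._close_episode(i)

    def _close_episode(self, i):
        episode = self.open[i]
        returns = []
        ret = 0.0
        # Walk the episode backwards: G_t = r_t + gamma * G_{t+1}.
        for _, _, reward in reversed(episode):
            ret = reward + self.gamma * ret
            returns.append(ret)
        returns.reverse()
        # Append the whole finished episode at once.
        for (obs, act, reward), mc_return in zip(episode, returns):
            self.transitions.append((obs, act, reward, mc_return))
        self.open[i] = []


# Two environments; env 0 terminates on the second step, env 1 keeps running.
acc = EpisodeAccumulator(n_envs=2, gamma=0.5)
obs = np.zeros((2, 3))
acc.append(obs, np.array([0, 1]), np.array([1.0, 2.0]), np.array([0.0, 0.0]))
acc.append(obs, np.array([1, 0]), np.array([1.0, 2.0]), np.array([1.0, 0.0]))
```

Computing returns only when an episode closes is what makes "append the whole episode" natural: the return of the first step depends on every later reward, so nothing can be finalized earlier.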
append_episode(episode)
Append an Episode object to the buffer.
Parameters
episode (d3rlpy.dataset.Episode) – episode to append.
Return type
None
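A hypothetical helper in the same spirit shows how one recorded episode might be split into transitions before being appended. The function episode_to_transitions and the SimpleTransition record are illustrative names, not d3rlpy APIs, and the reward alignment (rewards[t + 1] is the reward obtained by taking actions[t] from observations[t]) is an assumption; a deque with maxlen stands in for the buffer's storage.

```python
from collections import deque
from dataclasses import dataclass

import numpy as np


@dataclass
class SimpleTransition:
    """Hypothetical transition record (not d3rlpy's Transition class)."""
    observation: np.ndarray
    action: int
    reward: float
    next_observation: np.ndarray
    terminal: float


def episode_to_transitions(observations, actions, rewards, terminal):
    """Split one episode of T steps into T - 1 (s, a, r, s', done) records.

    Assumption: rewards[t + 1] is the reward received after taking
    actions[t] in observations[t]; only the final transition can be terminal.
    """
    transitions = []
    last = len(observations) - 2  # index of the final transition
    for t in range(len(observations) - 1):
        transitions.append(
            SimpleTransition(
                observation=observations[t],
                action=int(actions[t]),
                reward=float(rewards[t + 1]),
                next_observation=observations[t + 1],
                terminal=1.0 if (terminal and t == last) else 0.0,
            )
        )
    return transitions


# A 4-step episode that ends in a terminal state yields 3 transitions.
observations = np.arange(8, dtype=np.float32).reshape(4, 2)
actions = np.array([0, 1, 0, 1])
rewards = np.array([0.0, 0.0, 0.0, 1.0])
buffer = deque(maxlen=100)  # stand-in for the replay buffer's storage
buffer.extend(episode_to_transitions(observations, actions, rewards, terminal=True))
```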
sample(batch_size, n_frames=1, n_steps=1, gamma=0.99)
Returns a sampled mini-batch of transitions.
If observations are images, you can stack an arbitrary number of frames via n_frames:

    buffer.observation_shape == (3, 84, 84)

    # stack 4 frames
    batch = buffer.sample(batch_size=32, n_frames=4)

    batch.observations.shape == (32, 12, 84, 84)
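The frame-stacking shapes above follow from concatenating the last n_frames observations along the channel axis: four (3, 84, 84) frames become one (12, 84, 84) observation. The sketch below reproduces those shapes with plain NumPy. stack_frames is a hypothetical name, and repeating the first frame when fewer than n_frames observations are available is an assumed padding rule that the library may handle differently (for example, by zero-padding).

```python
import numpy as np


def stack_frames(frames, index, n_frames):
    """Stack the n_frames observations ending at `index` along the channel axis.

    frames has shape (T, C, H, W); the result has shape (n_frames * C, H, W).
    Hypothetical helper: near the episode start the first frame is repeated.
    """
    # Oldest frame first, newest (index) last.
    indices = [max(index - k, 0) for k in reversed(range(n_frames))]
    return np.concatenate([frames[i] for i in indices], axis=0)


rng = np.random.default_rng(0)
# 50 RGB frames of 84x84, i.e. observation_shape == (3, 84, 84).
frames = rng.integers(0, 256, size=(50, 3, 84, 84), dtype=np.uint8)
sampled = rng.integers(0, 50, size=32)
batch = np.stack([stack_frames(frames, int(i), 4) for i in sampled])
print(batch.shape)  # (32, 12, 84, 84)
```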
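Parameters
batch_size (int) – mini-batch size.
n_frames (int) – the number of frames to stack for image observations.
n_steps (int) – the number of steps used for the N-step return.
gamma (float) – discount factor used in the N-step return calculation.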
Returns
mini-batch of transitions.
Return type
d3rlpy.dataset.TransitionMiniBatch
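The n_steps and gamma arguments point to an N-step return target. A common formulation, sketched here as an assumption rather than the library's exact code, sums up to n discounted rewards, r_t + gamma r_{t+1} + ... + gamma^(n-1) r_{t+n-1}, and adds gamma^n times a bootstrap value from the observation n steps ahead unless the episode terminated first. n_step_parts is a hypothetical helper, and q_next is a placeholder for whatever value estimate a learner would use.

```python
def n_step_parts(rewards, terminals, t, n_steps, gamma):
    """Hypothetical helper for an N-step target starting at step t.

    Assumes rewards[t] is the reward for step t and terminals[t] marks that
    step as the last one of its episode. Returns (discounted reward sum,
    index of the bootstrap observation, discount for the bootstrap value,
    whether a terminal was reached). Stops early at a terminal step or at
    the end of the recorded data.
    """
    total = 0.0
    k = 0
    while k < n_steps and t + k < len(rewards):
        total += gamma ** k * rewards[t + k]
        k += 1
        if terminals[t + k - 1]:
            return total, t + k, gamma ** k, True
    return total, t + k, gamma ** k, False


rewards = [1.0, 1.0, 1.0, 1.0]
total, next_index, discount, done = n_step_parts(rewards, [0, 0, 0, 0], 0, 3, 0.5)
q_next = 2.0  # placeholder bootstrap value, e.g. a target Q estimate
target = total + (0.0 if done else discount * q_next)
```

With n_steps=1 this reduces to the ordinary one-step TD target r_t + gamma * q_next.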
size()
Returns the number of elements appended to the buffer.
Returns
the number of elements in the buffer.
Return type
int
to_mdp_dataset()
Convert the replay data into a static dataset.
The resulting dataset can be longer than the replay buffer because the conversion traces Transition objects, which can lead back to transitions that have already been dropped from the buffer.
Returns
MDPDataset object.
Return type
d3rlpy.dataset.MDPDataset
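Why can the dataset outgrow the buffer? If transitions keep links to their neighbours, a transition evicted from the FIFO queue stays reachable through the links of the transitions that remain. The toy sketch below models this with a hypothetical Node class and a deque of maxlen 3; it illustrates the tracing idea and is not d3rlpy's conversion code.

```python
from collections import deque


class Node:
    """Hypothetical doubly linked transition (not d3rlpy's Transition class)."""

    def __init__(self, step):
        self.step = step
        self.prev = None
        self.next = None


def to_episodes(buffer):
    """Trace every buffered node back to its episode head, then walk forward."""
    heads = {}
    for node in buffer:
        head = node
        while head.prev is not None:
            head = head.prev
        heads[id(head)] = head  # deduplicate episodes by their first node
    episodes = []
    for head in heads.values():
        steps, node = [], head
        while node is not None:
            steps.append(node.step)
            node = node.next
        episodes.append(steps)
    return episodes


buffer = deque(maxlen=3)  # FIFO that keeps only the 3 newest transitions
prev = None
for step in range(5):
    node = Node(step)
    if prev is not None:
        prev.next, node.prev = node, prev
    buffer.append(node)
    prev = node

print([n.step for n in buffer])  # [2, 3, 4]
print(to_episodes(buffer))       # [[0, 1, 2, 3, 4]]
```

Steps 0 and 1 have left the deque, yet the traced episode still contains them, so the converted data holds 5 transitions from a buffer of length 3.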
Attributes
transitions
Returns a FIFO queue of transitions.
Returns
FIFO queue of transitions.
Return type
d3rlpy.online.buffers.FIFOQueue
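For intuition about the FIFO queue returned here, the following sketch implements a fixed-capacity FIFO with constant-time indexing, which is what uniform random sampling needs. RingFIFOQueue is a hypothetical class; d3rlpy's own FIFOQueue may be implemented differently.

```python
import random
from typing import Generic, List, TypeVar

T = TypeVar("T")


class RingFIFOQueue(Generic[T]):
    """Hypothetical fixed-capacity FIFO queue (not d3rlpy's FIFOQueue)."""

    def __init__(self, maxlen: int) -> None:
        self.maxlen = maxlen
        self._data: List[T] = []
        self._start = 0  # position of the oldest item once the queue is full

    def append(self, item: T) -> None:
        if len(self._data) < self.maxlen:
            self._data.append(item)
        else:
            # Overwrite the oldest item and advance the start position.
            self._data[self._start] = item
            self._start = (self._start + 1) % self.maxlen

    def __len__(self) -> int:
        return len(self._data)

    def __getitem__(self, index: int) -> T:
        # Index 0 is always the oldest item still stored.
        if not 0 <= index < len(self._data):
            raise IndexError(index)
        return self._data[(self._start + index) % len(self._data)]


queue: RingFIFOQueue[int] = RingFIFOQueue(maxlen=3)
for value in range(5):
    queue.append(value)

print([queue[i] for i in range(len(queue))])  # [2, 3, 4]
rng = random.Random(0)
batch = [queue[rng.randrange(len(queue))] for _ in range(4)]
```

A ring buffer keeps eviction O(1) without shifting elements, and unlike collections.deque, whose indexing is O(n) towards the middle, it also gives O(1) random access for sampling.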