d3rlpy.online.buffers.BatchReplayBuffer
-
class d3rlpy.online.buffers.BatchReplayBuffer(maxlen, env, episodes=None, create_mask=False, mask_size=1)
Standard Replay Buffer for batch training.
- Parameters
maxlen (int) – the maximum number of transitions stored in the buffer.
n_envs (int) – the number of environments.
env (gym.Env) – gym-like environment to extract shape information.
episodes (list(d3rlpy.dataset.Episode)) – list of episodes to initialize the buffer.
create_mask (bool) – flag to create bootstrapping mask.
mask_size (int) – ensemble size for binary mask.
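As a rough illustration of what create_mask and mask_size control, the sketch below draws one binary mask per transition, with one entry per ensemble member. The helper name make_bootstrap_mask and the uniform 0/1 sampling are assumptions made for illustration; they are not taken from d3rlpy's implementation.

```python
import random


def make_bootstrap_mask(mask_size, rng=None):
    """Return a list of mask_size binary flags (hypothetical helper).

    Each flag says whether the matching ensemble member would train on
    the transition. Uniform 0/1 sampling is an assumption.
    """
    rng = rng or random.Random()
    return [rng.randint(0, 1) for _ in range(mask_size)]


# One mask for a single transition with an ensemble of 5 members.
mask = make_bootstrap_mask(mask_size=5, rng=random.Random(0))
```

With mask_size=1 the mask reduces to a single flag per transition.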
Methods
-
append(observations, actions, rewards, terminals, clip_episodes=None)
Append observation, action, reward and terminal flag to buffer.
If the terminal flag is True, Monte-Carlo returns are computed over the entire episode and all of its transitions are appended.
- Parameters
observations (numpy.ndarray) – observation.
actions (numpy.ndarray) – action.
rewards (numpy.ndarray) – reward.
terminals (numpy.ndarray) – terminal flag.
clip_episodes (Optional[numpy.ndarray]) – flag to clip the current episode. If None, the episode is clipped based on terminal.
- Return type
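The Monte-Carlo return computation mentioned for append can be sketched as a backward pass over one finished episode. The function name monte_carlo_returns and the gamma argument are hypothetical; whether d3rlpy discounts these returns is not stated here, so gamma defaults to 1.0 (no discounting).

```python
def monte_carlo_returns(rewards, gamma=1.0):
    """Return the return-to-go for every step of one finished episode.

    Hypothetical helper: walks the reward list backwards and accumulates
    the (optionally discounted) sum of all later rewards.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


episode_returns = monte_carlo_returns([1.0, 2.0, 3.0])
```

Such a pass needs the full episode, which is consistent with the transitions being appended only once the terminal flag is seen.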
-
append_episode(episode)
Append Episode object to buffer.
- Parameters
episode (d3rlpy.dataset.Episode) – episode.
- Return type
-
sample(batch_size, n_frames=1, n_steps=1, gamma=0.99)
Returns sampled mini-batch of transitions.
If the observation is an image, you can stack an arbitrary number of frames via n_frames.
    buffer.observation_shape == (3, 84, 84)

    # stack 4 frames
    batch = buffer.sample(batch_size=32, n_frames=4)

    batch.observations.shape == (32, 12, 84, 84)
- Parameters
- Returns
mini-batch.
- Return type
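To make the roles of n_steps and gamma in sample more concrete, here is a minimal sketch of an n-step discounted reward sum. The function n_step_return is hypothetical, and truncating the sum at the end of the episode is an assumption, not a description of d3rlpy's behavior.

```python
def n_step_return(rewards, start, n_steps=1, gamma=0.99):
    """Sum gamma**k * rewards[start + k] for up to n_steps steps.

    Hypothetical helper: the sum is cut short if the episode ends
    before n_steps rewards are available (an assumption).
    """
    end = min(start + n_steps, len(rewards))
    return sum(gamma ** k * rewards[start + k] for k in range(end - start))


# Three-step return from the first step of a four-step episode.
value = n_step_return([1.0, 1.0, 1.0, 1.0], start=0, n_steps=3, gamma=0.5)
```

With n_steps=1 the result is just the immediate reward.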
-
size()
Returns the number of appended elements in buffer.
- Returns
the number of elements in buffer.
- Return type
-
to_mdp_dataset()
Convert replay data into static dataset.
The length of the dataset can be longer than the length of the replay buffer because this conversion is done by tracing Transition objects.
- Returns
MDPDataset object.
- Return type
Attributes
-
transitions
Returns a FIFO queue of transitions.
- Returns
FIFO queue of transitions.
- Return type
d3rlpy.online.buffers.FIFOQueue
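The first-in, first-out behavior of a bounded buffer can be illustrated with the standard library. Here collections.deque stands in for d3rlpy.online.buffers.FIFOQueue purely as an analogy; the real class may expose a different interface.

```python
from collections import deque

# A bounded queue: once maxlen items are stored, appending a new item
# drops the oldest one.
queue = deque(maxlen=3)
for step in range(5):
    queue.append(step)

remaining = list(queue)
```

After five appends to a queue of length three, only the three most recent items remain.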