d3rlpy.online.buffers.ReplayBuffer
class d3rlpy.online.buffers.ReplayBuffer(maxlen, env, episodes=None)

Standard Replay Buffer.

Parameters:
- maxlen (int) – the maximum number of transitions to store.
- env (gym.Env) – gym-like environment used to extract shape information.
- episodes (list(d3rlpy.dataset.Episode)) – list of episodes to initialize the buffer with.
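A minimal sketch of the FIFO behavior implied by maxlen: once the buffer is full, the oldest transitions are evicted as new ones arrive. The class and method names below are illustrative, not d3rlpy's implementation; only the use of collections.deque matches the transitions attribute documented here.

```python
from collections import deque


class TinyReplayBuffer:
    """Illustrative FIFO buffer; not the d3rlpy implementation."""

    def __init__(self, maxlen):
        # like d3rlpy's ReplayBuffer, transitions live in a bounded deque
        self.transitions = deque(maxlen=maxlen)

    def append_transition(self, transition):
        # when the deque is full, the oldest entry is dropped automatically
        self.transitions.append(transition)


buf = TinyReplayBuffer(maxlen=3)
for t in range(5):
    buf.append_transition(t)

print(list(buf.transitions))  # → [2, 3, 4]: the two oldest transitions were evicted
```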
Attributes:
- prev_observation (numpy.ndarray) – previously appended observation.
- prev_action (numpy.ndarray or int) – previously appended action.
- prev_transition (d3rlpy.dataset.Transition) – previously appended transition.
- transitions (collections.deque) – list of transitions.
Methods
append(observation, action, reward, terminal)

Append an observation, action, reward and terminal flag to the buffer.

If the terminal flag is True, Monte-Carlo returns are computed over the entire episode and all of its transitions are appended at once.

Parameters:
- observation (numpy.ndarray) – observation.
- action (numpy.ndarray or int) – action.
- reward (float) – reward.
- terminal (bool or float) – terminal flag.
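The episode-boundary behavior described above can be sketched as follows: Monte-Carlo returns are the backward-accumulated discounted sums R_t = r_t + gamma * R_{t+1} over the finished episode. This is a hedged stdlib-only sketch; the function name and the gamma value are illustrative assumptions, not d3rlpy's API.

```python
def monte_carlo_returns(rewards, gamma=0.99):
    """Compute R_t = r_t + gamma * R_{t+1} by iterating backwards over an episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# three-step episode with rewards 1, 0, 2 and gamma = 0.5 for easy arithmetic:
# R_2 = 2.0, R_1 = 0 + 0.5 * 2.0 = 1.0, R_0 = 1 + 0.5 * 1.0 = 1.5
print(monte_carlo_returns([1.0, 0.0, 2.0], gamma=0.5))  # → [1.5, 1.0, 2.0]
```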
append_episode(episode)

Append an Episode object to the buffer.

Parameters:
- episode (d3rlpy.dataset.Episode) – episode.
sample(batch_size, n_frames=1)

Returns a sampled mini-batch of transitions.

If observations are images, you can stack an arbitrary number of frames via n_frames:

    buffer.observation_shape == (3, 84, 84)

    # stack 4 frames
    batch = buffer.sample(batch_size=32, n_frames=4)

    batch.observations.shape == (32, 12, 84, 84)

Parameters:
- batch_size (int) – mini-batch size.
- n_frames (int) – the number of frames to stack for image observations.

Returns: mini-batch.
Return type: d3rlpy.dataset.TransitionMiniBatch
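The shape arithmetic in the example above follows from concatenating n_frames consecutive (C, H, W) observations along the channel axis, giving (n_frames * C, H, W) — hence (3, 84, 84) stacked 4 times becomes (12, 84, 84). A pure-Python sketch of that concatenation (d3rlpy performs the equivalent with numpy arrays internally; the helper name here is illustrative):

```python
def stack_frames(frames):
    """Concatenate a list of (C, H, W) observations along the channel axis."""
    stacked = []
    for frame in frames:
        stacked.extend(frame)  # append each frame's channels in order
    return stacked


C, H, W = 3, 84, 84
frame = [[[0] * W for _ in range(H)] for _ in range(C)]  # one (3, 84, 84) observation

stacked = stack_frames([frame] * 4)  # n_frames=4
print(len(stacked), len(stacked[0]), len(stacked[0][0]))  # → 12 84 84
```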