d3rlpy.dataset.ReplayBuffer
- class d3rlpy.dataset.ReplayBuffer(buffer, transition_picker=None, trajectory_slicer=None, writer_preprocessor=None, episodes=None, env=None, observation_signature=None, action_signature=None, reward_signature=None, action_space=None, action_size=None, cache_size=10000)
Replay buffer for experience replay.
This replay buffer implementation is used for both online and offline training in d3rlpy. To determine the shapes of observations, actions and rewards, one of episodes, env and signatures must be provided.

    from d3rlpy.dataset import FIFOBuffer, ReplayBuffer, Signature

    buffer = FIFOBuffer(limit=1000000)

    # initialize with pre-collected episodes
    replay_buffer = ReplayBuffer(buffer=buffer, episodes=<episodes>)

    # initialize with Gym
    replay_buffer = ReplayBuffer(buffer=buffer, env=<env>)

    # initialize with manually specified signatures
    replay_buffer = ReplayBuffer(
        buffer=buffer,
        observation_signature=Signature(dtype=[<dtype>], shape=[<shape>]),
        action_signature=Signature(dtype=[<dtype>], shape=[<shape>]),
        reward_signature=Signature(dtype=[<dtype>], shape=[<shape>]),
    )

- Parameters
buffer (d3rlpy.dataset.BufferProtocol) – Buffer implementation.
transition_picker (Optional[d3rlpy.dataset.TransitionPickerProtocol]) – Transition picker implementation for Q-learning-based algorithms. If None is given, BasicTransitionPicker is used by default.
trajectory_slicer (Optional[d3rlpy.dataset.TrajectorySlicerProtocol]) – Trajectory slicer implementation for Transformer-based algorithms. If None is given, BasicTrajectorySlicer is used by default.
writer_preprocessor (Optional[d3rlpy.dataset.WriterPreprocessProtocol]) – Writer preprocessor implementation. If None is given, BasicWriterPreprocess is used by default.
episodes (Optional[Sequence[d3rlpy.dataset.EpisodeBase]]) – List of episodes to initialize replay buffer.
env (Optional[GymEnv]) – Gym environment to extract shapes of observations and action.
observation_signature (Optional[d3rlpy.dataset.Signature]) – Signature of observation.
action_signature (Optional[d3rlpy.dataset.Signature]) – Signature of action.
reward_signature (Optional[d3rlpy.dataset.Signature]) – Signature of reward.
action_space (Optional[d3rlpy.constants.ActionSpace]) – Action-space type.
action_size (Optional[int]) – Size of action-space. For continuous action-space, this represents dimension of action vectors. For discrete action-space, this represents the number of discrete actions.
cache_size (int) – Size of cache to record active episode history used for online training. cache_size needs to be greater than the maximum possible episode length.
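The two meanings of action_size described above can be made concrete with a small, library-independent sketch. The helper `infer_action_size` is hypothetical and not part of d3rlpy; it also assumes that discrete actions are integer indices numbered from 0, which the text above does not state.

```python
from typing import Sequence, Union

Action = Union[int, Sequence[float]]


def infer_action_size(actions: Sequence[Action], discrete: bool) -> int:
    """Hypothetical helper (not a d3rlpy function) mirroring the two
    meanings of action_size."""
    if discrete:
        # Assumption: discrete actions are integer indices starting at 0,
        # so the number of actions is the largest observed index plus one.
        return max(int(a) for a in actions) + 1
    # Continuous case: the dimension of a single action vector.
    return len(actions[0])


continuous_actions = [[0.1, -0.3, 0.7], [0.0, 0.2, -0.5]]
discrete_actions = [0, 2, 1, 3, 3]

continuous_size = infer_action_size(continuous_actions, discrete=False)
discrete_size = infer_action_size(discrete_actions, discrete=True)
```

Note that inferring the discrete size from observed data undercounts if the largest action index never appears in the data, which is one reason an explicit action_size argument can be useful.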
Methods
- append(observation, action, reward)
Appends observation, action and reward to buffer.
- Parameters
observation (Union[numpy.ndarray[Any, numpy.dtype[Any]], Sequence[numpy.ndarray[Any, numpy.dtype[Any]]]]) – Observation.
action (Union[int, numpy.ndarray[Any, numpy.dtype[Any]]]) – Action.
reward (Union[float, numpy.ndarray[Any, numpy.dtype[Any]]]) – Reward.
- Return type
None
- append_episode(episode)
Appends episode to buffer.
- Parameters
episode (d3rlpy.dataset.components.EpisodeBase) – Episode.
- Return type
None
- dump(f)
Dumps buffer data.

    with open('dataset.h5', 'w+b') as f:
        replay_buffer.dump(f)

- Parameters
f (BinaryIO) – IO object to write to.
- Return type
None
- classmethod from_episode_generator(episode_generator, buffer, transition_picker=None, trajectory_slicer=None, writer_preprocessor=None)
Builds ReplayBuffer from episode generator.
- Parameters
episode_generator (d3rlpy.dataset.episode_generator.EpisodeGeneratorProtocol) – Episode generator implementation.
buffer (d3rlpy.dataset.buffers.BufferProtocol) – Buffer implementation.
transition_picker (Optional[d3rlpy.dataset.transition_pickers.TransitionPickerProtocol]) – Transition picker implementation for Q-learning-based algorithms.
trajectory_slicer (Optional[d3rlpy.dataset.trajectory_slicers.TrajectorySlicerProtocol]) – Trajectory slicer implementation for Transformer-based algorithms.
writer_preprocessor (Optional[d3rlpy.dataset.writers.WriterPreprocessProtocol]) – Writer preprocessor implementation.
- Returns
Replay buffer.
- Return type
d3rlpy.dataset.ReplayBuffer
- classmethod load(f, buffer, episode_cls=<class 'd3rlpy.dataset.components.Episode'>, transition_picker=None, trajectory_slicer=None, writer_preprocessor=None)
Builds ReplayBuffer from dumped data.
This method reconstructs a replay buffer dumped by the dump method.

    with open('dataset.h5', 'rb') as f:
        replay_buffer = ReplayBuffer.load(f, buffer)

- Parameters
f (BinaryIO) – IO object to read from.
buffer (d3rlpy.dataset.buffers.BufferProtocol) – Buffer implementation.
episode_cls (Type[d3rlpy.dataset.components.EpisodeBase]) – Episode class used to reconstruct data.
transition_picker (Optional[d3rlpy.dataset.transition_pickers.TransitionPickerProtocol]) – Transition picker implementation for Q-learning-based algorithms.
trajectory_slicer (Optional[d3rlpy.dataset.trajectory_slicers.TrajectorySlicerProtocol]) – Trajectory slicer implementation for Transformer-based algorithms.
writer_preprocessor (Optional[d3rlpy.dataset.writers.WriterPreprocessProtocol]) – Writer preprocessor implementation.
- Returns
Replay buffer.
- Return type
d3rlpy.dataset.ReplayBuffer
- sample_trajectory(length)
Samples a partial trajectory.
- Parameters
length (int) – Length of partial trajectory.
- Returns
Partial trajectory.
- Return type
d3rlpy.dataset.components.PartialTrajectory
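To illustrate what "a partial trajectory of a given length" means, here is a minimal, library-independent sketch. It is not d3rlpy's trajectory slicer: the `Step` tuple, both function names, and the uniform choice of episode and end step are assumptions made for illustration. It also does not pad windows that are shorter than the requested length.

```python
import random
from typing import List, Sequence, Tuple

# Toy step: (observation, action, reward) with scalar stand-ins for arrays.
Step = Tuple[float, int, float]


def slice_partial_trajectory(
    episode: Sequence[Step], end_index: int, length: int
) -> List[Step]:
    """Return up to `length` consecutive steps ending at `end_index` (inclusive)."""
    start = max(end_index - length + 1, 0)
    return list(episode[start : end_index + 1])


def sample_partial_trajectory(
    episodes: Sequence[Sequence[Step]], length: int, rng: random.Random
) -> List[Step]:
    """Pick an episode, then an end step within it, uniformly at random,
    and cut the window that ends at that step."""
    episode = rng.choice(episodes)
    end_index = rng.randrange(len(episode))
    return slice_partial_trajectory(episode, end_index, length)


episode_a = [(float(t), t % 2, 1.0) for t in range(5)]
episode_b = [(float(t), 0, 0.5) for t in range(3)]
window = sample_partial_trajectory(
    [episode_a, episode_b], length=3, rng=random.Random(0)
)
```

Windows that end near the start of an episode come out shorter than `length` in this sketch; a real slicer has to decide how to handle that case, for example by padding.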
- sample_transition()
Samples a transition.
- Returns
Transition.
- Return type
d3rlpy.dataset.components.Transition
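As a rough picture of what a single transition carries, the sketch below builds one from per-episode sequences. `ToyTransition` and `pick_transition` are hypothetical names, not d3rlpy's Transition class or transition picker, and the field layout and the handling of the last step are assumptions.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ToyTransition:
    """Toy stand-in for a transition; scalars replace arrays."""
    observation: float
    action: int
    reward: float
    next_observation: float
    terminal: bool


def pick_transition(
    observations: Sequence[float],
    actions: Sequence[int],
    rewards: Sequence[float],
    index: int,
    terminated: bool,
) -> ToyTransition:
    """Build the transition that starts at step `index` of one episode."""
    is_last = index == len(observations) - 1
    # Assumption: the last step has no successor, so the current
    # observation is reused as a placeholder for next_observation.
    next_observation = observations[index] if is_last else observations[index + 1]
    return ToyTransition(
        observation=observations[index],
        action=actions[index],
        reward=rewards[index],
        next_observation=next_observation,
        # Only the final step of a terminated episode is marked terminal.
        terminal=is_last and terminated,
    )


obs = [0.0, 1.0, 2.0]
acts = [1, 0, 1]
rews = [0.5, 0.0, 1.0]
first = pick_transition(obs, acts, rews, index=0, terminated=True)
last = pick_transition(obs, acts, rews, index=2, terminated=True)
```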
- sample_transition_batch(batch_size)
Samples a mini-batch of transitions.
- Parameters
batch_size (int) – Mini-batch size.
- Returns
Mini-batch.
- Return type
d3rlpy.dataset.mini_batch.TransitionMiniBatch
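Conceptually, a mini-batch groups several sampled transitions field by field. The sketch below shows that idea with plain Python containers; it is not TransitionMiniBatch. The `Transition` tuple layout, the `sample_batch` helper, and sampling with replacement are all assumptions made for illustration.

```python
import random
from typing import Dict, List, Sequence, Tuple

# Toy transition: (observation, action, reward, next_observation, terminal).
Transition = Tuple[float, int, float, float, bool]


def sample_batch(
    transitions: Sequence[Transition], batch_size: int, rng: random.Random
) -> Dict[str, List]:
    """Draw `batch_size` transitions and regroup them into per-field columns."""
    # Assumption: sampling is uniform with replacement, so batch_size may
    # exceed the number of stored transitions.
    picked = [rng.choice(transitions) for _ in range(batch_size)]
    observations, actions, rewards, next_observations, terminals = zip(*picked)
    return {
        "observations": list(observations),
        "actions": list(actions),
        "rewards": list(rewards),
        "next_observations": list(next_observations),
        "terminals": list(terminals),
    }


transitions = [
    (0.0, 1, 0.5, 1.0, False),
    (1.0, 0, 0.0, 2.0, False),
    (2.0, 1, 1.0, 2.0, True),
]
batch = sample_batch(transitions, batch_size=4, rng=random.Random(0))
```

Grouping by field rather than by transition is what lets a training step process all observations, actions, and rewards of a batch at once.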
Attributes
- buffer
Returns buffer.
- Returns
Buffer.
- dataset_info
Returns dataset information.
- Returns
Dataset information.
- episodes
Returns sequence of episodes.
- Returns
Sequence of episodes.
- trajectory_slicer
Returns trajectory slicer.
- Returns
Trajectory slicer.
- transition_count
Returns number of transitions.
- Returns
Number of transitions.
- transition_picker
Returns transition picker.
- Returns
Transition picker.