d3rlpy.dataset.MDPDataset¶
- class d3rlpy.dataset.MDPDataset(observations, actions, rewards, terminals, timeouts=None, transition_picker=None, trajectory_slicer=None, action_space=None, action_size=None)[source]¶
Backward-compatibility class of MDPDataset.
This is a wrapper class that has a backward-compatible constructor interface.
- Parameters:
observations (ObservationSequence) – Observations.
actions (np.ndarray) – Actions.
rewards (np.ndarray) – Rewards.
terminals (np.ndarray) – Environmental terminal flags.
timeouts (np.ndarray) – Timeouts.
transition_picker (Optional[TransitionPickerProtocol]) – Transition picker implementation for Q-learning-based algorithms. If None is given, BasicTransitionPicker is used by default.
trajectory_slicer (Optional[TrajectorySlicerProtocol]) – Trajectory slicer implementation for Transformer-based algorithms. If None is given, BasicTrajectorySlicer is used by default.
action_space (Optional[d3rlpy.constants.ActionSpace]) – Action-space type.
action_size (Optional[int]) – Size of action-space. For continuous action-space, this represents the dimension of action vectors. For discrete action-space, this represents the number of discrete actions.
Methods
- append(observation, action, reward)¶
Appends observation, action and reward to buffer.
- append_episode(episode)¶
Appends episode to buffer.
- Parameters:
episode (EpisodeBase) – Episode.
- Return type:
None
- clip_episode(terminated)¶
Clips current episode.
- Parameters:
terminated (bool) – Flag to represent environmental termination. This flag should be False if the episode is terminated by timeout.
- Return type:
None
- dump(f)¶
Dumps buffer data.
```python
with open('dataset.h5', 'w+b') as f:
    replay_buffer.dump(f)
```
- Parameters:
f (BinaryIO) – IO object to write to.
- Return type:
None
- classmethod from_episode_generator(episode_generator, buffer, transition_picker=None, trajectory_slicer=None, writer_preprocessor=None)¶
Builds ReplayBuffer from episode generator.
- Parameters:
episode_generator (EpisodeGeneratorProtocol) – Episode generator implementation.
buffer (BufferProtocol) – Buffer implementation.
transition_picker (Optional[TransitionPickerProtocol]) – Transition picker implementation for Q-learning-based algorithms.
trajectory_slicer (Optional[TrajectorySlicerProtocol]) – Trajectory slicer implementation for Transformer-based algorithms.
writer_preprocessor (Optional[WriterPreprocessProtocol]) – Writer preprocessor implementation.
- Returns:
Replay buffer.
- Return type:
ReplayBuffer
- classmethod load(f, buffer, episode_cls=<class 'd3rlpy.dataset.components.Episode'>, transition_picker=None, trajectory_slicer=None, writer_preprocessor=None)¶
Builds ReplayBuffer from dumped data.
This method reconstructs a replay buffer dumped by the dump method.

```python
with open('dataset.h5', 'rb') as f:
    replay_buffer = ReplayBuffer.load(f, buffer)
```
- Parameters:
f (BinaryIO) – IO object to read from.
buffer (BufferProtocol) – Buffer implementation.
episode_cls (Type[EpisodeBase]) – Episode class used to reconstruct data.
transition_picker (Optional[TransitionPickerProtocol]) – Transition picker implementation for Q-learning-based algorithms.
trajectory_slicer (Optional[TrajectorySlicerProtocol]) – Trajectory slicer implementation for Transformer-based algorithms.
writer_preprocessor (Optional[WriterPreprocessProtocol]) – Writer preprocessor implementation.
- Returns:
Replay buffer.
- Return type:
ReplayBuffer
- sample_trajectory(length)¶
Samples a partial trajectory.
- Parameters:
length (int) – Length of partial trajectory.
- Returns:
Partial trajectory.
- Return type:
PartialTrajectory
- sample_trajectory_batch(batch_size, length)¶
Samples a mini-batch of partial trajectories.
- Parameters:
batch_size (int) – Mini-batch size.
length (int) – Length of partial trajectory.
- Returns:
Mini-batch of partial trajectories.
- Return type:
TrajectoryMiniBatch
- sample_transition()¶
Samples a transition.
- Returns:
Transition.
- Return type:
Transition
- sample_transition_batch(batch_size)¶
Samples a mini-batch of transitions.
- Parameters:
batch_size (int) – Mini-batch size.
- Returns:
Mini-batch.
- Return type:
TransitionMiniBatch
Attributes
- buffer¶
- dataset_info¶
- episodes¶
- trajectory_slicer¶
- transition_count¶
- transition_picker¶