d3rlpy.dataset.MDPDataset

class d3rlpy.dataset.MDPDataset(observations, actions, rewards, terminals, timeouts=None, transition_picker=None, trajectory_slicer=None, action_space=None, action_size=None)

Backward-compatibility class of MDPDataset.

This is a wrapper class that has a backward-compatible constructor interface.

Parameters
  • observations (ObservationSequence) – Observations.

  • actions (np.ndarray) – Actions.

  • rewards (np.ndarray) – Rewards.

  • terminals (np.ndarray) – Environmental terminal flags.

  • timeouts (Optional[np.ndarray]) – Timeout flags. Defaults to None.

  • transition_picker (Optional[TransitionPickerProtocol]) – Transition picker implementation for Q-learning-based algorithms. If None is given, BasicTransitionPicker is used by default.

  • trajectory_slicer (Optional[TrajectorySlicerProtocol]) – Trajectory slicer implementation for Transformer-based algorithms. If None is given, BasicTrajectorySlicer is used by default.

  • action_space (Optional[d3rlpy.constants.ActionSpace]) – Action-space type.

  • action_size (Optional[int]) – Size of the action space. For a continuous action space, this is the dimension of the action vectors. For a discrete action space, this is the number of discrete actions.
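The constructor takes flat arrays, with terminals and timeouts marking where episodes end. The following is a minimal sketch of that convention in plain Python, not d3rlpy's implementation. It assumes an episode closes at any step where either flag is set, and that only a terminal flag counts as environmental termination. The helper name split_episodes is hypothetical.

```python
from typing import List, Optional, Tuple


def split_episodes(
    terminals: List[int],
    timeouts: Optional[List[int]] = None,
) -> List[Tuple[int, int, bool]]:
    """Return (start, end_exclusive, terminated) for each episode.

    Assumption: a step with terminal=1 or timeout=1 is the last step of
    its episode. Trailing steps with no closing flag are left out.
    """
    if timeouts is None:
        timeouts = [0] * len(terminals)
    if len(timeouts) != len(terminals):
        raise ValueError("terminals and timeouts must have the same length")

    episodes = []
    start = 0
    for i, (terminal, timeout) in enumerate(zip(terminals, timeouts)):
        if terminal or timeout:
            # terminated is True only for environmental termination.
            episodes.append((start, i + 1, bool(terminal)))
            start = i + 1
    return episodes


terminals = [0, 0, 1, 0, 0, 0]
timeouts = [0, 0, 0, 0, 0, 1]
episodes = split_episodes(terminals, timeouts)
print(episodes)
```

Here the first episode covers steps 0 to 2 and ends by termination, while the second covers steps 3 to 5 and ends by timeout.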

Methods

append(observation, action, reward)

Appends observation, action and reward to buffer.

Parameters
  • observation – Observation.

  • action – Action.

  • reward – Reward.

Return type

None

append_episode(episode)

Appends episode to buffer.

Parameters

episode (d3rlpy.dataset.components.EpisodeBase) – Episode.

Return type

None

clip_episode(terminated)

Clips current episode.

Parameters

terminated (bool) – Flag to represent environmental termination. This flag should be False if the episode is terminated by timeout.

Return type

None
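The call pattern for append and clip_episode during online collection can be sketched as follows. ToyEpisodeWriter is a hypothetical stand-in written for illustration; it does not use d3rlpy and only mirrors the method names and the meaning of the terminated flag described above. The step limit and rollouts are made-up values.

```python
from typing import Any, List, Tuple

Step = Tuple[Any, Any, float]


class ToyEpisodeWriter:
    """Hypothetical stand-in mimicking the append / clip_episode pattern."""

    def __init__(self) -> None:
        self.episodes: List[Tuple[List[Step], bool]] = []
        self._active: List[Step] = []

    def append(self, observation: Any, action: Any, reward: float) -> None:
        # Add one step to the episode currently being recorded.
        self._active.append((observation, action, reward))

    def clip_episode(self, terminated: bool) -> None:
        # Close the current episode and remember how it ended.
        if self._active:
            self.episodes.append((self._active, terminated))
            self._active = []


max_steps = 3  # assumed step limit, i.e. the timeout
rollouts = [[1.0, 1.0], [0.0, 0.0, 0.0, 0.0]]  # rewards per rollout

writer = ToyEpisodeWriter()
for rewards in rollouts:
    for t, reward in enumerate(rewards[:max_steps]):
        writer.append(observation=[float(t)], action=0, reward=reward)
    # A rollout cut off by the step limit is a timeout, so terminated=False.
    truncated = len(rewards) > max_steps
    writer.clip_episode(terminated=not truncated)

lengths = [len(steps) for steps, _ in writer.episodes]
flags = [terminated for _, terminated in writer.episodes]
print(lengths, flags)
```

The first rollout ends on its own and is clipped with terminated=True; the second hits the step limit and is clipped with terminated=False.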

dump(f)

Dumps buffer data.

with open('dataset.h5', 'w+b') as f:
    replay_buffer.dump(f)

Parameters

f (BinaryIO) – IO object to write to.

Return type

None

classmethod from_episode_generator(episode_generator, buffer, transition_picker=None, trajectory_slicer=None, writer_preprocessor=None)

Builds ReplayBuffer from episode generator.

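Parameters
  • episode_generator – Episode generator implementation.

  • buffer – Buffer implementation.

  • transition_picker – Transition picker implementation. Defaults to None.

  • trajectory_slicer – Trajectory slicer implementation. Defaults to None.

  • writer_preprocessor – Writer preprocessor implementation. Defaults to None.

Returns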

Replay buffer.

Return type

d3rlpy.dataset.replay_buffer.ReplayBuffer

classmethod load(f, buffer, episode_cls=<class 'd3rlpy.dataset.components.Episode'>, transition_picker=None, trajectory_slicer=None, writer_preprocessor=None)

Builds ReplayBuffer from dumped data.

This method reconstructs a replay buffer written by the dump method.

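with open('dataset.h5', 'rb') as f:
    replay_buffer = ReplayBuffer.load(f, buffer)

Parameters
  • f – IO object to read from.

  • buffer – Buffer implementation.

  • episode_cls – Episode class used to reconstruct episodes. Defaults to d3rlpy.dataset.components.Episode.

  • transition_picker – Transition picker implementation. Defaults to None.

  • trajectory_slicer – Trajectory slicer implementation. Defaults to None.

  • writer_preprocessor – Writer preprocessor implementation. Defaults to None.

Returns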

Replay buffer.

Return type

d3rlpy.dataset.replay_buffer.ReplayBuffer

sample_trajectory(length)

Samples a partial trajectory.

Parameters

length (int) – Length of partial trajectory.

Returns

Partial trajectory.

Return type

d3rlpy.dataset.components.PartialTrajectory

sample_trajectory_batch(batch_size, length)

Samples a mini-batch of partial trajectories.

Parameters
  • batch_size (int) – Mini-batch size.

  • length (int) – Length of partial trajectories.

Returns

Mini-batch.

Return type

d3rlpy.dataset.mini_batch.TrajectoryMiniBatch
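To make "partial trajectory of a given length" concrete, here is a minimal sketch in plain Python. It assumes one plausible scheme, which this page does not confirm: take up to length steps ending at a chosen index, then left-pad with zeros and return a mask marking the real steps. The names slice_trajectory and sample_batch are hypothetical, and only rewards are sliced to keep the example short.

```python
import random
from typing import List, Tuple


def slice_trajectory(
    rewards: List[float], end_index: int, length: int
) -> Tuple[List[float], List[int]]:
    """Return (padded_rewards, mask), each exactly `length` long."""
    if not 0 <= end_index < len(rewards):
        raise IndexError("end_index out of range")
    end = end_index + 1
    start = max(end - length, 0)
    window = rewards[start:end]
    pad = length - len(window)
    # Left-pad so every slice has the same length; mask is 0 on padding.
    return [0.0] * pad + window, [0] * pad + [1] * len(window)


def sample_batch(
    episodes: List[List[float]], batch_size: int, length: int, rng: random.Random
) -> List[Tuple[List[float], List[int]]]:
    """Draw `batch_size` slices; episode and end index are picked at random."""
    batch = []
    for _ in range(batch_size):
        episode = rng.choice(episodes)
        end_index = rng.randrange(len(episode))
        batch.append(slice_trajectory(episode, end_index, length))
    return batch


episode_rewards = [0.1, 0.2, 0.3, 0.4, 0.5]
padded, mask = slice_trajectory(episode_rewards, end_index=1, length=4)
print(padded, mask)

batch = sample_batch([episode_rewards, [1.0, 2.0]], batch_size=4, length=3,
                     rng=random.Random(0))
```

Because every slice is padded to the same length, the slices in a mini-batch can be stacked into fixed-size arrays.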

sample_transition()

Samples a transition.

Returns

Transition.

Return type

d3rlpy.dataset.components.Transition

sample_transition_batch(batch_size)

Samples a mini-batch of transitions.

Parameters

batch_size (int) – Mini-batch size.

Returns

Mini-batch.

Return type

d3rlpy.dataset.mini_batch.TransitionMiniBatch
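As an illustration of drawing a mini-batch of transitions from a set of episodes, the sketch below samples (episode, step) pairs uniformly with replacement. Uniform sampling over all transitions is an assumption here, not something this page states, and build_index and sample_index_batch are hypothetical helpers that do not use d3rlpy.

```python
import random
from typing import List, Tuple


def build_index(episode_lengths: List[int]) -> List[Tuple[int, int]]:
    """List every (episode_index, step_index) pair across all episodes."""
    return [(e, t) for e, n in enumerate(episode_lengths) for t in range(n)]


def sample_index_batch(
    index: List[Tuple[int, int]], batch_size: int, rng: random.Random
) -> List[Tuple[int, int]]:
    """Draw `batch_size` pairs uniformly at random, with replacement."""
    return [index[rng.randrange(len(index))] for _ in range(batch_size)]


index = build_index([2, 3])  # two episodes with 2 and 3 steps
print(index)

batch_indices = sample_index_batch(index, batch_size=8, rng=random.Random(0))
```

Sampling over the flattened index means longer episodes contribute proportionally more transitions to each mini-batch.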

size()

Returns number of episodes.

Returns

Number of episodes.

Return type

int

Attributes

buffer

Returns buffer.

Returns

Buffer.

dataset_info

Returns dataset information.

Returns

Dataset information.

episodes

Returns sequence of episodes.

Returns

Sequence of episodes.

trajectory_slicer

Returns trajectory slicer.

Returns

Trajectory slicer.

transition_count

Returns number of transitions.

Returns

Number of transitions.

transition_picker

Returns transition picker.

Returns

Transition picker.