d3rlpy.dataset.MDPDataset

class d3rlpy.dataset.MDPDataset(observations, actions, rewards, terminals, discrete_action=False, as_tensor=False, device=None)[source]

Markov-Decision Process Dataset class.

MDPDataset is designed so that reinforcement learning datasets can be used like supervised learning datasets.

import numpy as np
from d3rlpy.dataset import MDPDataset

# 1000 steps of observations with shape of (100,)
observations = np.random.random((1000, 100))
# 1000 steps of actions with shape of (4,)
actions = np.random.random((1000, 4))
# 1000 steps of rewards
rewards = np.random.random(1000)
# 1000 steps of terminal flags
terminals = np.random.randint(2, size=1000)

dataset = MDPDataset(observations, actions, rewards, terminals)

The MDPDataset object automatically splits the given data into a list of d3rlpy.dataset.Episode objects. Furthermore, the MDPDataset object behaves like a list so that it can be used with scikit-learn utilities.

# returns the number of episodes
len(dataset)

# access to the first episode
episode = dataset[0]

# iterate through all episodes
for episode in dataset:
    pass
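
The splitting into episodes described above can be pictured with a small pure-Python sketch. It is not d3rlpy's implementation. It assumes that the step carrying a terminal flag is the last step of its episode, and it keeps any trailing steps without a terminal flag as a final partial episode; the library may treat that tail differently. The function name split_episodes is hypothetical.

```python
def split_episodes(terminals):
    """Return (start, end) index pairs, end exclusive, one pair per episode.

    Illustrative helper only; not part of d3rlpy.
    """
    episodes = []
    start = 0
    for i, flag in enumerate(terminals):
        if flag:
            # The terminal step closes the current episode.
            episodes.append((start, i + 1))
            start = i + 1
    # Assumption: keep unterminated trailing steps as a partial episode.
    if start < len(terminals):
        episodes.append((start, len(terminals)))
    return episodes


terminals = [0, 0, 1, 0, 1, 0]
boundaries = split_episodes(terminals)
print(boundaries)  # [(0, 3), (3, 5), (5, 6)]
```

Under these assumptions, len(dataset) would correspond to len(boundaries), and dataset[0] to the steps in the first index range.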
Parameters:
  • observations (numpy.ndarray or list(numpy.ndarray)) – N-D array. If the observation is a vector, the shape should be (N, dim_observation). If the observation is an image, the shape should be (N, C, H, W).
  • actions (numpy.ndarray) – N-D array. If the action-space is continuous, the shape should be (N, dim_action). If the action-space is discrete, the shape should be (N,).
  • rewards (numpy.ndarray) – array of scalar rewards.
  • terminals (numpy.ndarray) – array of binary terminal flags.
  • discrete_action (bool) – flag to use the given actions as discrete action-space actions.
  • as_tensor (bool) – flag to hold observations as torch.Tensor.
  • device (d3rlpy.gpu.Device or int) – gpu device or device id for tensors.
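
The shape conventions in the parameter list can be checked before constructing a dataset. The helper below is a hypothetical sketch, not a d3rlpy function. It works on plain shape tuples (for example the .shape attribute of an array) and assumes that all four inputs must share the same number of steps N.

```python
def check_shapes(observation_shape, action_shape, n_rewards, n_terminals,
                 discrete_action=False):
    """Validate shapes against the documented conventions.

    Hypothetical helper for illustration; it is not part of d3rlpy.
    Returns the number of steps N when everything is consistent.
    """
    # Vector observations are (N, dim_observation); images are (N, C, H, W).
    if len(observation_shape) < 2:
        raise ValueError("observations need a step axis plus data axes")
    n_steps = observation_shape[0]

    # Discrete actions are a flat (N,) array of indices;
    # continuous actions are (N, dim_action).
    expected_ndim = 1 if discrete_action else 2
    if len(action_shape) != expected_ndim:
        raise ValueError("action array has the wrong number of axes")

    # Assumption: every array holds exactly one entry per step.
    if not (action_shape[0] == n_rewards == n_terminals == n_steps):
        raise ValueError("all arrays must share the same number of steps")
    return n_steps


# Continuous control with vector observations.
check_shapes((1000, 100), (1000, 4), 1000, 1000)
# Discrete control with image observations.
check_shapes((1000, 3, 84, 84), (1000,), 1000, 1000, discrete_action=True)
```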

Methods

__getitem__(index)[source]
__len__()[source]
__iter__()[source]
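
The three methods above are what make an object usable wherever a list-like container is expected. As a rough illustration, the hypothetical class below (a stand-in, not d3rlpy's code) implements the same protocol over placeholder strings and shows that an index-based train/test split needs nothing more than len() and integer indexing.

```python
class EpisodeContainer:
    """Hypothetical list-like container; not d3rlpy's MDPDataset."""

    def __init__(self, episodes):
        self._episodes = list(episodes)

    def __len__(self):
        return len(self._episodes)

    def __getitem__(self, index):
        return self._episodes[index]

    def __iter__(self):
        return iter(self._episodes)


container = EpisodeContainer(["ep0", "ep1", "ep2", "ep3"])

# An index-based split relies only on __len__ and __getitem__.
n_train = int(len(container) * 0.75)
train = [container[i] for i in range(n_train)]
test = [container[i] for i in range(n_train, len(container))]
print(train, test)  # ['ep0', 'ep1', 'ep2'] ['ep3']
```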
append(observations, actions, rewards, terminals)[source]

Appends new data.

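Parameters:
  • observations (numpy.ndarray or list(numpy.ndarray)) – observations to append.
  • actions (numpy.ndarray) – actions to append.
  • rewards (numpy.ndarray) – rewards to append.
  • terminals (numpy.ndarray) – terminal flags to append.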
build_episodes()[source]

Builds episode objects.

This method is called internally when the episodes property is accessed for the first time.

clip_reward(low=None, high=None)[source]

Clips rewards in the given range.

Parameters:
  • low (float) – minimum value. If None, clipping is not performed on lower edge.
  • high (float) – maximum value. If None, clipping is not performed on upper edge.
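
The effect of the two optional bounds can be sketched in pure Python. This illustrates the documented semantics only (a None bound leaves that side unclipped). It is not the library's implementation, and the function name clip_rewards is hypothetical.

```python
def clip_rewards(rewards, low=None, high=None):
    """Return a clipped copy of rewards; a None bound disables that side.

    Illustrative helper only; not part of d3rlpy.
    """
    clipped = []
    for r in rewards:
        if low is not None and r < low:
            r = low
        if high is not None and r > high:
            r = high
        clipped.append(r)
    return clipped


rewards = [-2.0, 0.5, 3.0]
print(clip_rewards(rewards, low=-1.0, high=1.0))  # [-1.0, 0.5, 1.0]
print(clip_rewards(rewards, high=1.0))            # [-2.0, 0.5, 1.0]
```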
compute_stats()[source]

Computes statistics of the dataset.

stats = dataset.compute_stats()

# return statistics
stats['return']['mean']
stats['return']['std']
stats['return']['min']
stats['return']['max']

# reward statistics
stats['reward']['mean']
stats['reward']['std']
stats['reward']['min']
stats['reward']['max']

# action (only with continuous control actions)
stats['action']['mean']
stats['action']['std']
stats['action']['min']
stats['action']['max']

# observation (only with numpy.ndarray observations)
stats['observation']['mean']
stats['observation']['std']
stats['observation']['min']
stats['observation']['max']
Returns:statistics of the dataset.
Return type:dict
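
To make the distinction between return and reward statistics concrete, here is a minimal pure-Python sketch. It assumes that a return is the undiscounted sum of rewards over one episode and that the standard deviation is the population standard deviation (as NumPy's default would give). Both are assumptions about the library, and return_stats is a hypothetical name.

```python
from statistics import mean, pstdev


def return_stats(episode_rewards):
    """Compute per-episode return statistics from lists of rewards.

    Illustrative helper only; not part of d3rlpy.
    """
    # Assumption: return = undiscounted sum of rewards in an episode.
    returns = [sum(rewards) for rewards in episode_rewards]
    return {
        "mean": mean(returns),
        "std": pstdev(returns),  # population standard deviation
        "min": min(returns),
        "max": max(returns),
    }


# Three episodes with returns 2.0, 1.0 and 3.0.
stats = return_stats([[1.0, 0.0, 1.0], [0.5, 0.5], [3.0]])
print(stats["mean"], stats["min"], stats["max"])  # 2.0 1.0 3.0
```

Reward statistics, by contrast, would be computed over every individual step regardless of episode boundaries.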
dump(fname)[source]

Saves dataset as HDF5.

Parameters:fname (str) – file path.
extend(dataset)[source]

Extends the dataset with another dataset.

Parameters:dataset (d3rlpy.dataset.MDPDataset) – dataset.
get_action_size()[source]

Returns dimension of action-space.

If discrete_action=True, the return value will be the maximum index + 1 in the given actions.

Returns:dimension of action-space.
Return type:int
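
The rule for the discrete case can be illustrated with a short pure-Python sketch. The function action_size is hypothetical and only mirrors the description above. For the continuous case it assumes the size is the length of one action vector.

```python
def action_size(actions, discrete_action):
    """Infer the action-space size from recorded actions.

    Illustrative helper only; not part of d3rlpy.
    """
    if discrete_action:
        # Discrete: indices start at 0, so size = largest index + 1.
        return int(max(actions)) + 1
    # Assumption: continuous actions are rows of length dim_action.
    return len(actions[0])


print(action_size([0, 2, 1, 2], discrete_action=True))             # 3
print(action_size([[0.1, 0.2], [0.3, 0.4]], discrete_action=False))  # 2
```

One consequence of this rule is that if the highest-indexed action never occurs in the recorded data, the inferred size will be smaller than the true size of the action-space.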
get_observation_shape()[source]

Returns observation shape.

Returns:observation shape.
Return type:tuple
is_action_discrete()[source]

Returns discrete_action flag.

Returns:discrete_action flag.
Return type:bool
classmethod load(fname, as_tensor=False, device=None)[source]

Loads dataset from HDF5.

import numpy as np
from d3rlpy.dataset import MDPDataset

# np.random.random takes the shape as a single tuple
dataset = MDPDataset(np.random.random((10, 4)),
                     np.random.random((10, 2)),
                     np.random.random(10),
                     np.random.randint(2, size=10))

# save as HDF5
dataset.dump('dataset.h5')

# load from HDF5
new_dataset = MDPDataset.load('dataset.h5')
Parameters:
  • fname (str) – file path.
  • as_tensor (bool) – flag to hold observations as torch.Tensor.
  • device (d3rlpy.gpu.Device or int) – gpu device or device id for tensors.
size()[source]

Returns the number of episodes in the dataset.

Returns:the number of episodes.
Return type:int

Attributes

actions

Returns the actions.

Returns:array of actions.
Return type:numpy.ndarray
as_tensor

Returns the flag to hold observations as torch.Tensor.

Returns:flag to hold observations as torch.Tensor.
Return type:bool
device

Returns the gpu device for tensors.

Returns:gpu device.
Return type:d3rlpy.gpu.Device
episodes

Returns the episodes.

Returns:list of d3rlpy.dataset.Episode objects.
Return type:list(d3rlpy.dataset.Episode)
observations

Returns the observations.

Returns:array of observations.
Return type:numpy.ndarray, list(numpy.ndarray) or torch.Tensor
rewards

Returns the rewards.

Returns:array of rewards.
Return type:numpy.ndarray
terminals

Returns the terminal flags.

Returns:array of terminal flags.
Return type:numpy.ndarray