d3rlpy.dataset.MDPDataset

class d3rlpy.dataset.MDPDataset(observations, actions, rewards, terminals, episode_terminals=None, discrete_action=None, create_mask=False, mask_size=1)

Markov Decision Process Dataset class.

MDPDataset is designed so that reinforcement learning datasets can be used like supervised learning datasets.

import numpy as np
from d3rlpy.dataset import MDPDataset

# 1000 steps of observations with shape of (100,)
observations = np.random.random((1000, 100))
# 1000 steps of actions with shape of (4,)
actions = np.random.random((1000, 4))
# 1000 steps of rewards
rewards = np.random.random(1000)
# 1000 steps of terminal flags
terminals = np.random.randint(2, size=1000)

dataset = MDPDataset(observations, actions, rewards, terminals)

The MDPDataset object automatically splits the given data into a list of d3rlpy.dataset.Episode objects. Furthermore, the MDPDataset object behaves like a list, so it can be used with scikit-learn utilities.
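The splitting into episodes can be pictured with a small NumPy sketch. The helper `split_into_episodes` is hypothetical and is not d3rlpy's internal code: it assumes each episode ends at a step whose episode terminal flag is 1, and it simply ignores trailing steps that have no such flag. Here the terminal flags are reused as episode terminal flags, which matches the default behavior when episode_terminals is None.

```python
import numpy as np


def split_into_episodes(observations, actions, rewards, episode_terminals):
    """Hypothetical helper: cut step-aligned arrays at every episode terminal flag."""
    episodes = []
    start = 0
    # np.flatnonzero gives the index of the last step of each episode
    for end in np.flatnonzero(episode_terminals):
        stop = end + 1
        episodes.append({
            "observations": observations[start:stop],
            "actions": actions[start:stop],
            "rewards": rewards[start:stop],
        })
        start = stop
    # steps after the last flag form an unfinished episode; this sketch drops them
    return episodes


observations = np.arange(12, dtype=np.float32).reshape(6, 2)
actions = np.zeros((6, 1))
rewards = np.ones(6)
terminals = np.array([0, 0, 1, 0, 0, 1])

episodes = split_into_episodes(observations, actions, rewards, terminals)
print(len(episodes))                          # 2
print([len(e["rewards"]) for e in episodes])  # [3, 3]
```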

# returns the number of episodes
len(dataset)

# access to the first episode
episode = dataset[0]

# iterate through all episodes
for episode in dataset:
    pass
Parameters
  • observations (numpy.ndarray) – N-D array. If the observation is a vector, the shape should be (N, dim_observation). If the observation is an image, the shape should be (N, C, H, W).

  • actions (numpy.ndarray) – N-D array. If the action-space is continuous, the shape should be (N, dim_action). If the action-space is discrete, the shape should be (N,).

  • rewards (numpy.ndarray) – array of scalar rewards.

  • terminals (numpy.ndarray) – array of binary terminal flags.

  • episode_terminals (numpy.ndarray) – array of binary episode terminal flags. The given data will be split based on these flags. This is useful if you want to specify non-environment terminations (e.g. timeouts). If None, the episode terminations match the environment terminations.

  • discrete_action (bool) – flag to use the given actions as discrete action-space actions. If None, the action type is automatically determined.

  • create_mask (bool) – flag to create binary masks for bootstrapping.

  • mask_size (int) – ensemble size of the masks. Ignored if create_mask is False.
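To illustrate the difference between terminals and episode_terminals, the sketch below builds both arrays for three hypothetical rollouts concatenated step by step. Only the first rollout reaches a genuine terminal state; the other two are cut off by a time limit, so they end an episode without setting the environment terminal flag. The rollout lengths and outcomes are made up for illustration.

```python
import numpy as np

# Hypothetical rollouts: the first terminates on its last step,
# the other two are truncated by a time limit of 4 steps.
rollout_lengths = [3, 4, 4]
terminated = [True, False, False]

terminals = []
episode_terminals = []
for length, done in zip(rollout_lengths, terminated):
    t = np.zeros(length, dtype=np.float32)
    e = np.zeros(length, dtype=np.float32)
    e[-1] = 1.0      # every rollout ends an episode
    if done:
        t[-1] = 1.0  # only genuine terminations set the environment flag
    terminals.append(t)
    episode_terminals.append(e)

terminals = np.concatenate(terminals)
episode_terminals = np.concatenate(episode_terminals)
print(terminals)          # [0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
print(episode_terminals)  # [0. 0. 1. 0. 0. 0. 1. 0. 0. 0. 1.]
```

Arrays built this way match what the terminals and episode_terminals parameters describe: the timeouts split the data into episodes without being treated as environment terminations.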

Methods

__getitem__(index)
__len__()
__iter__()
append(observations, actions, rewards, terminals, episode_terminals=None)

Appends new data.

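Parameters
  • observations (numpy.ndarray) – N-D array of new observations, in the same format as the constructor's observations.

  • actions (numpy.ndarray) – array of new actions.

  • rewards (numpy.ndarray) – array of new scalar rewards.

  • terminals (numpy.ndarray) – array of new binary terminal flags.

  • episode_terminals (numpy.ndarray) – array of new binary episode terminal flags.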
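Conceptually, appending places the new steps after the existing ones. The sketch below shows this for flat, step-aligned arrays using plain NumPy; it is an illustration of the idea, not d3rlpy's implementation, and the dictionary layout is an assumption made for brevity. In this flat view, the append lines up cleanly with episode boundaries when the existing data ends on an episode terminal flag.

```python
import numpy as np

# Existing data ends on a terminal flag, so the appended steps start a new episode.
old = {"rewards": np.array([1.0, 1.0, 0.0]), "terminals": np.array([0, 0, 1])}
new = {"rewards": np.array([0.5, 0.5]), "terminals": np.array([0, 1])}

# concatenate each array so that the new steps follow the old ones
merged = {key: np.concatenate([old[key], new[key]]) for key in old}
print(merged["rewards"].tolist())    # [1.0, 1.0, 0.0, 0.5, 0.5]
print(merged["terminals"].tolist())  # [0, 0, 1, 0, 1]
```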
build_episodes()

Builds episode objects.

This method is called internally the first time the episodes property is accessed.
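Building on first access is a standard lazy-initialization pattern. The hypothetical `LazyEpisodes` class below shows that pattern with `functools.cached_property`; it is not d3rlpy's code and only demonstrates that the episode list is built once, on first access, and then reused.

```python
from functools import cached_property


class LazyEpisodes:
    """Hypothetical class illustrating build-on-first-access (not d3rlpy's code)."""

    def __init__(self, terminals):
        self.terminals = list(terminals)
        self.build_count = 0

    @cached_property
    def episodes(self):
        # runs only on the first access; the result is then cached on the instance
        self.build_count += 1
        episodes, current = [], []
        for step, done in enumerate(self.terminals):
            current.append(step)
            if done:
                episodes.append(current)
                current = []
        return episodes


data = LazyEpisodes([0, 1, 0, 0, 1])
print(data.build_count)  # 0 (nothing built yet)
print(data.episodes)     # [[0, 1], [2, 3, 4]]
print(data.episodes)     # cached, not rebuilt
print(data.build_count)  # 1
```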

compute_stats()

Computes statistics of the dataset.

stats = dataset.compute_stats()

# episode return statistics
stats['return']['mean']
stats['return']['std']
stats['return']['min']
stats['return']['max']

# reward statistics
stats['reward']['mean']
stats['reward']['std']
stats['reward']['min']
stats['reward']['max']

# action (only with continuous control actions)
stats['action']['mean']
stats['action']['std']
stats['action']['min']
stats['action']['max']

# observation (only with numpy.ndarray observations)
stats['observation']['mean']
stats['observation']['std']
stats['observation']['min']
stats['observation']['max']
Returns

statistics of the dataset.

Return type

dict
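The return and reward statistics can be reproduced by hand with NumPy. The sketch below assumes that an episode's return is the undiscounted sum of its rewards and that std is the population standard deviation (NumPy's default); d3rlpy's exact definitions may differ. The helper `episode_returns` is hypothetical.

```python
import numpy as np


def episode_returns(rewards, episode_terminals):
    """Hypothetical helper: undiscounted return of each completed episode."""
    returns = []
    total = 0.0
    for reward, done in zip(rewards, episode_terminals):
        total += reward
        if done:
            returns.append(total)
            total = 0.0
    return np.array(returns)


rewards = np.array([1.0, 2.0, 3.0, 0.5, 0.5])
episode_terminals = np.array([0, 0, 1, 0, 1])

returns = episode_returns(rewards, episode_terminals)
stats = {
    "return": {"mean": returns.mean(), "std": returns.std(),
               "min": returns.min(), "max": returns.max()},
    "reward": {"mean": rewards.mean(), "std": rewards.std(),
               "min": rewards.min(), "max": rewards.max()},
}
print(returns.tolist())          # [6.0, 1.0]
print(stats["return"]["mean"])   # 3.5
```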

dump(fname)

Saves dataset as HDF5.

Parameters

fname (str) – file path.

extend(dataset)

Extends the dataset with another dataset.

Parameters

dataset (d3rlpy.dataset.MDPDataset) – dataset.

get_action_size()

Returns dimension of action-space.

If discrete_action=True, the return value is the maximum action index in the given actions plus one.

Returns

dimension of action-space.

Return type

int
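The rule above can be written as a small helper. `action_size` is hypothetical; it assumes continuous actions have shape (N, dim_action) and discrete actions are integer indices of shape (N,), as described in the constructor parameters. Note that under the "maximum index plus one" rule, an action index that never appears above the observed maximum is not counted.

```python
import numpy as np


def action_size(actions, discrete_action):
    """Hypothetical helper mirroring the rule described for get_action_size()."""
    if discrete_action:
        # discrete actions are integer indices: size = maximum index + 1
        return int(np.max(actions)) + 1
    # continuous actions have shape (N, dim_action)
    return actions.shape[1]


print(action_size(np.array([0, 3, 1, 2]), discrete_action=True))      # 4
print(action_size(np.random.random((10, 2)), discrete_action=False))  # 2
```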

get_observation_shape()

Returns observation shape.

Returns

observation shape.

Return type

tuple
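Assuming the observation shape is the per-step shape, i.e. the observations array shape without its leading N axis, it can be read directly from NumPy arrays laid out as described in the constructor parameters. The arrays below are made-up examples.

```python
import numpy as np

vector_obs = np.random.random((1000, 100))             # (N, dim_observation)
image_obs = np.zeros((8, 3, 84, 84), dtype=np.uint8)   # (N, C, H, W)

# dropping the leading step axis leaves the per-step observation shape
print(vector_obs.shape[1:])  # (100,)
print(image_obs.shape[1:])   # (3, 84, 84)
```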

is_action_discrete()

Returns discrete_action flag.

Returns

discrete_action flag.

Return type

bool
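When discrete_action is None, the action type is determined automatically. The exact rule d3rlpy uses is not described here; the hypothetical `looks_discrete` helper below is one plausible heuristic, based on the shapes given in the constructor parameters: a 1-D array whose values are all whole numbers is treated as discrete.

```python
import numpy as np


def looks_discrete(actions):
    """Hypothetical heuristic: a 1-D array of whole numbers is treated as
    discrete. This is an illustration; d3rlpy's actual rule may differ."""
    actions = np.asarray(actions)
    if actions.ndim != 1:
        return False  # (N, dim_action) arrays are continuous
    return bool(np.all(actions == np.round(actions)))


print(looks_discrete(np.array([0, 2, 1, 1])))    # True
print(looks_discrete(np.random.random((5, 4))))  # False
print(looks_discrete(np.array([0.25, 0.5])))     # False
```

A heuristic like this can misclassify continuous data that happens to contain only whole numbers, which is one reason to pass discrete_action explicitly when the action type is known.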

classmethod load(fname, create_mask=False, mask_size=1)

Loads dataset from HDF5.

import numpy as np
from d3rlpy.dataset import MDPDataset

# np.random.random takes the shape as a single tuple
dataset = MDPDataset(np.random.random((10, 4)),
                     np.random.random((10, 2)),
                     np.random.random(10),
                     np.random.randint(2, size=10))

# save as HDF5
dataset.dump('dataset.h5')

# load from HDF5
new_dataset = MDPDataset.load('dataset.h5')
Parameters
  • fname (str) – file path.

  • create_mask (bool) – flag to create bootstrapping masks.

  • mask_size (int) – size of bootstrapping masks.
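Bootstrapping masks are commonly used to train ensembles (as in Bootstrapped DQN): each ensemble member trains on a transition only when its mask bit for that transition is 1. The sketch below assumes Bernoulli(0.5) bits laid out as one row per ensemble member; the helper `make_bootstrap_masks` is hypothetical, and how d3rlpy generates and stores its masks may differ.

```python
import numpy as np


def make_bootstrap_masks(num_transitions, mask_size, seed=0):
    """Hypothetical helper: one Bernoulli(0.5) mask bit per
    (ensemble member, transition) pair."""
    rng = np.random.default_rng(seed)
    return rng.integers(0, 2, size=(mask_size, num_transitions))


masks = make_bootstrap_masks(num_transitions=6, mask_size=3)
print(masks.shape)  # (3, 6)
# ensemble member k would use only the transitions where masks[k] == 1
```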

size()

Returns the number of episodes in the dataset.

Returns

the number of episodes.

Return type

int

Attributes

actions

Returns the actions.

Returns

array of actions.

Return type

numpy.ndarray

episode_terminals

Returns the episode terminal flags.

Returns

array of episode terminal flags.

Return type

numpy.ndarray

episodes

Returns the episodes.

Returns

list of d3rlpy.dataset.Episode objects.

Return type

list(d3rlpy.dataset.Episode)

observations

Returns the observations.

Returns

array of observations.

Return type

numpy.ndarray

rewards

Returns the rewards.

Returns

array of rewards.

Return type

numpy.ndarray

terminals

Returns the terminal flags.

Returns

array of terminal flags.

Return type

numpy.ndarray