d3rlpy.dataset.MDPDataset

class d3rlpy.dataset.MDPDataset(observations, actions, rewards, terminals, discrete_action=False, as_tensor=False, device=None)

Markov-Decision Process Dataset class.

MDPDataset is designed so that reinforcement learning datasets can be used like supervised learning datasets.

import numpy as np
from d3rlpy.dataset import MDPDataset

# 1000 steps of observations with shape of (100,)
observations = np.random.random((1000, 100))
# 1000 steps of actions with shape of (4,)
actions = np.random.random((1000, 4))
# 1000 steps of rewards
rewards = np.random.random(1000)
# 1000 steps of terminal flags
terminals = np.random.randint(2, size=1000)

dataset = MDPDataset(observations, actions, rewards, terminals)

The MDPDataset object automatically splits the given data into a list of d3rlpy.dataset.Episode objects. Furthermore, the MDPDataset object behaves like a list, so it can be used with scikit-learn utilities.

# returns the number of episodes
len(dataset)

# access to the first episode
episode = dataset[0]

# iterate through all episodes
for episode in dataset:
    pass
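To make the splitting rule concrete, here is a minimal pure-Python sketch, assuming an episode ends at every step whose terminal flag is 1. The helper split_into_episodes is hypothetical and not part of d3rlpy; it groups only rewards for brevity, and it ignores steps after the last terminal flag, which is an assumption rather than documented library behavior.

```python
from typing import List, Sequence


def split_into_episodes(
    rewards: Sequence[float], terminals: Sequence[int]
) -> List[List[float]]:
    """Hypothetical helper (not part of d3rlpy): groups per-step rewards
    into episodes, closing an episode at every step whose terminal flag is 1."""
    episodes: List[List[float]] = []
    current: List[float] = []
    for reward, terminal in zip(rewards, terminals):
        current.append(reward)
        if terminal:
            episodes.append(current)
            current = []
    # Assumption: steps after the last terminal flag are not turned into an episode.
    return episodes


rewards = [1.0, 0.5, 0.0, 2.0, 1.0]
terminals = [0, 1, 0, 0, 1]
episodes = split_into_episodes(rewards, terminals)
print(len(episodes))  # 2
print(episodes[0])    # [1.0, 0.5]
```

Under this rule, len(dataset) in the example above equals the number of terminal flags set to 1.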

Parameters:

observations (numpy.ndarray or list(numpy.ndarray)) – N-D array. If the observations are vectors, the shape should be (N, dim_observation). If the observations are images, the shape should be (N, C, H, W).
actions (numpy.ndarray) – N-D array. If the action-space is continuous, the shape should be (N, dim_action). If the action-space is discrete, the shape should be (N,).
rewards (numpy.ndarray) – array of scalar rewards.
terminals (numpy.ndarray) – array of binary terminal flags.
discrete_action (bool) – flag to use the given actions as discrete action-space actions.
as_tensor (bool) – flag to hold observations as torch.Tensor.
device (d3rlpy.gpu.Device or int) – gpu device or device id for tensors.
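The shape conventions above can be checked before constructing a dataset. The following is a sketch only: check_shapes is a hypothetical validator, not a d3rlpy function. It takes shape tuples (with NumPy arrays you would pass observations.shape, actions.shape, and so on) and covers only the numpy.ndarray form of observations, not list(numpy.ndarray).

```python
from typing import Tuple


def check_shapes(
    observation_shape: Tuple[int, ...],
    action_shape: Tuple[int, ...],
    reward_shape: Tuple[int, ...],
    terminal_shape: Tuple[int, ...],
    discrete_action: bool = False,
) -> int:
    """Hypothetical validator (not part of d3rlpy). Returns the step count N."""
    n = observation_shape[0]
    # Vector observations: (N, dim_observation); image observations: (N, C, H, W).
    if len(observation_shape) not in (2, 4):
        raise ValueError(f"unexpected observation shape {observation_shape}")
    # Discrete actions: (N,); continuous actions: (N, dim_action).
    expected_ndim = 1 if discrete_action else 2
    if len(action_shape) != expected_ndim:
        raise ValueError(f"unexpected action shape {action_shape}")
    # Rewards and terminals hold one scalar per step: (N,).
    if len(reward_shape) != 1 or len(terminal_shape) != 1:
        raise ValueError("rewards and terminals must be 1-D")
    for name, shape in (("actions", action_shape),
                        ("rewards", reward_shape),
                        ("terminals", terminal_shape)):
        if shape[0] != n:
            raise ValueError(f"{name} has {shape[0]} steps, expected {n}")
    return n


# Shapes from the example at the top of this page.
print(check_shapes((1000, 100), (1000, 4), (1000,), (1000,)))  # 1000
# Image observations with discrete actions.
print(check_shapes((10, 3, 84, 84), (10,), (10,), (10,), discrete_action=True))  # 10
```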

Methods

__getitem__(index)

__len__()

__iter__()

append(observations, actions, rewards, terminals)

Appends new data.

Parameters:

observations (numpy.ndarray or list(numpy.ndarray)) – N-D array.
actions (numpy.ndarray) – array of actions.
rewards (numpy.ndarray) – array of scalar rewards.
terminals (numpy.ndarray) – array of binary terminal flags.
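A minimal sketch of what appending means for the flat per-step data, assuming that append concatenates the new steps after the existing ones and that episodes are then recounted from the combined terminal flags. FlatBuffer is a hypothetical stand-in, not d3rlpy code, and it tracks only rewards and terminals.

```python
from typing import List


class FlatBuffer:
    """Hypothetical stand-in for flat per-step storage; not d3rlpy code."""

    def __init__(self) -> None:
        self.rewards: List[float] = []
        self.terminals: List[int] = []

    def append(self, rewards: List[float], terminals: List[int]) -> None:
        if len(rewards) != len(terminals):
            raise ValueError("rewards and terminals must have the same length")
        # New steps are concatenated after the existing ones.
        self.rewards.extend(rewards)
        self.terminals.extend(terminals)

    def num_episodes(self) -> int:
        # Every terminal flag set to 1 closes one episode.
        return sum(1 for t in self.terminals if t)


buffer = FlatBuffer()
buffer.append([1.0, 0.0], [0, 1])
buffer.append([0.5, 0.5, 1.0], [0, 0, 1])
print(len(buffer.rewards))    # 5
print(buffer.num_episodes())  # 2
```

One consequence of this flat design: if the existing data does not end with a terminal flag, the first appended steps continue that unfinished episode.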

build_episodes()

Builds episode objects.

This method is called internally the first time the episodes property is accessed.
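The lazy construction described above can be illustrated with a cached property. This is a sketch under assumptions, not d3rlpy's implementation: LazyEpisodes is a hypothetical class whose build_count attribute exists only to show that episodes are built once.

```python
from typing import List, Optional


class LazyEpisodes:
    """Hypothetical sketch (not d3rlpy code) of building episodes on first access."""

    def __init__(self, rewards: List[float], terminals: List[int]) -> None:
        self.rewards = rewards
        self.terminals = terminals
        self._episodes: Optional[List[List[float]]] = None
        self.build_count = 0  # how many times episodes have been built

    def build_episodes(self) -> None:
        self.build_count += 1
        episodes: List[List[float]] = []
        current: List[float] = []
        for reward, terminal in zip(self.rewards, self.terminals):
            current.append(reward)
            if terminal:
                episodes.append(current)
                current = []
        self._episodes = episodes

    @property
    def episodes(self) -> List[List[float]]:
        # Build only on the first access; later accesses reuse the cached list.
        if self._episodes is None:
            self.build_episodes()
        return self._episodes


data = LazyEpisodes([1.0, 2.0, 3.0], [0, 1, 1])
print(data.build_count)  # 0
print(data.episodes)     # [[1.0, 2.0], [3.0]]
_ = data.episodes        # second access hits the cache
print(data.build_count)  # 1
```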

clip_reward(low=None, high=None)

Clips rewards to the given range.

Parameters:

low (float) – minimum value. If None, no clipping is performed on the lower edge.
high (float) – maximum value. If None, no clipping is performed on the upper edge.
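The role of the None bounds can be sketched in plain Python. clip_rewards is a hypothetical helper, not part of d3rlpy; unlike dataset.clip_reward, which acts on the dataset, it returns a new list and leaves its input unchanged.

```python
from typing import List, Optional


def clip_rewards(
    rewards: List[float],
    low: Optional[float] = None,
    high: Optional[float] = None,
) -> List[float]:
    """Hypothetical helper (not part of d3rlpy): clips each reward to [low, high],
    skipping any bound that is None."""
    clipped: List[float] = []
    for r in rewards:
        if low is not None and r < low:
            r = low
        if high is not None and r > high:
            r = high
        clipped.append(r)
    return clipped


print(clip_rewards([-2.0, 0.5, 3.0], low=-1.0, high=1.0))  # [-1.0, 0.5, 1.0]
print(clip_rewards([-2.0, 0.5, 3.0], high=1.0))            # [-2.0, 0.5, 1.0]
```

With both bounds left as None, every reward passes through unchanged.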

compute_stats()

Computes statistics of the dataset.

stats = dataset.compute_stats()

# return statistics
stats['return']['mean']
stats['return']['std']
stats['return']['min']
stats['return']['max']

# reward statistics
stats['reward']['mean']
stats['reward']['std']
stats['reward']['min']
stats['reward']['max']

# action (only with continuous control actions)
stats['action']['mean']
stats['action']['std']
stats['action']['min']
stats['action']['max']

# observation (only with numpy.ndarray observations)
stats['observation']['mean']
stats['observation']['std']
stats['observation']['min']
stats['observation']['max']

Returns:	statistics of the dataset.
Return type:	dict
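The 'return' entry summarizes per-episode returns rather than per-step rewards. The sketch below assumes a return is the undiscounted sum of an episode's rewards and that std is the population standard deviation (the numpy.std default); d3rlpy's exact definitions are not stated on this page. return_stats and summarize are hypothetical helpers.

```python
import statistics
from typing import Dict, List


def summarize(values: List[float]) -> Dict[str, float]:
    """Hypothetical helper: mean, population std, min, and max of values."""
    return {
        "mean": statistics.mean(values),
        "std": statistics.pstdev(values),
        "min": min(values),
        "max": max(values),
    }


def return_stats(episode_rewards: List[List[float]]) -> Dict[str, float]:
    """Hypothetical helper (not part of d3rlpy): statistics over episode returns."""
    # Assumption: the return of an episode is the plain sum of its rewards.
    returns = [sum(rewards) for rewards in episode_rewards]
    return summarize(returns)


# Two episodes with returns 2.0 and 4.0.
print(return_stats([[1.0, 1.0], [3.0, 1.0]]))
# {'mean': 3.0, 'std': 1.0, 'min': 2.0, 'max': 4.0}
```

Applying summarize to the flattened rewards instead of the returns would give the analogue of the 'reward' entry.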

dump(fname)

Saves the dataset as HDF5.

Parameters:	fname (str) – file path.

extend(dataset)

Extends the dataset with another dataset.

Parameters:	dataset (d3rlpy.dataset.MDPDataset) – dataset.

get_action_size()

Returns the dimension of the action-space.

If discrete_action=True, the return value is the maximum action index in the given actions plus one.

Returns:	dimension of action-space.
Return type:	int
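The rule for computing the action size can be sketched as follows. action_size is a hypothetical helper, not a d3rlpy function; it takes integer indices for discrete actions and equal-length vectors for continuous ones.

```python
from typing import Sequence, Union

Action = Union[int, Sequence[float]]


def action_size(actions: Sequence[Action], discrete_action: bool) -> int:
    """Hypothetical helper (not part of d3rlpy) mirroring the rule above."""
    if discrete_action:
        # Discrete actions are integer indices, so the size is max index + 1.
        return max(int(a) for a in actions) + 1
    # Continuous actions are vectors; the size is their length (dim_action).
    return len(actions[0])


print(action_size([0, 2, 1, 2], discrete_action=True))             # 3
print(action_size([[0.1, 0.2, 0.3, 0.4]], discrete_action=False))  # 4
# Suppose the environment also has action 2, but it never appears in the data:
print(action_size([0, 1, 1, 0], discrete_action=True))             # 2
```

The last call shows a consequence of the rule: if the highest action index never occurs in the data, the reported size is smaller than the environment's real action-space.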

get_observation_shape()

Returns observation shape.

Returns:	observation shape.
Return type:	tuple

is_action_discrete()

Returns discrete_action flag.

Returns:	discrete_action flag.
Return type:	bool

classmethod load(fname, as_tensor=False, device=None)

Loads dataset from HDF5.

import numpy as np
from d3rlpy.dataset import MDPDataset

dataset = MDPDataset(np.random.random((10, 4)),
                     np.random.random((10, 2)),
                     np.random.random(10),
                     np.random.randint(2, size=10))

# save as HDF5
dataset.dump('dataset.h5')

# load from HDF5
new_dataset = MDPDataset.load('dataset.h5')

Parameters:

fname (str) – file path.
as_tensor (bool) – flag to hold observations as torch.Tensor.
device (d3rlpy.gpu.Device or int) – gpu device or device id for tensors.

size()

Returns the number of episodes in the dataset.

Returns:	the number of episodes.
Return type:	int

Attributes

actions

Returns the actions.

Returns:	array of actions.
Return type:	numpy.ndarray

as_tensor

Returns the flag to hold observations as torch.Tensor.

Returns:	flag to hold observations as `torch.Tensor`.
Return type:	bool

device

Returns the gpu device for tensors.

Returns:	gpu device.
Return type:	d3rlpy.gpu.Device

episodes

Returns the episodes.

Returns:	list of `d3rlpy.dataset.Episode` objects.
Return type:	list(d3rlpy.dataset.Episode)

observations

Returns the observations.

Returns:	array of observations.
Return type:	numpy.ndarray, list(numpy.ndarray) or torch.Tensor

rewards

Returns the rewards.

Returns:	array of rewards.
Return type:	numpy.ndarray

terminals

Returns the terminal flags.

Returns:	array of terminal flags.
Return type:	numpy.ndarray