d3rlpy.dataset.MDPDataset¶

class d3rlpy.dataset.MDPDataset(observations, actions, rewards, terminals, episode_terminals=None, discrete_action=None)¶

Markov-Decision Process Dataset class.

MDPDataset is designed so that reinforcement learning datasets can be used like supervised learning datasets.

import numpy as np
from d3rlpy.dataset import MDPDataset

# 1000 steps of observations with shape of (100,)
observations = np.random.random((1000, 100))
# 1000 steps of actions with shape of (4,)
actions = np.random.random((1000, 4))
# 1000 steps of rewards
rewards = np.random.random(1000)
# 1000 steps of terminal flags
terminals = np.random.randint(2, size=1000)

dataset = MDPDataset(observations, actions, rewards, terminals)

The MDPDataset object automatically splits the given data into a list of d3rlpy.dataset.Episode objects. Furthermore, the MDPDataset object behaves like a list so that it can be used with scikit-learn utilities.

# returns the number of episodes
len(dataset)

# access to the first episode
episode = dataset[0]

# iterate through all episodes
for episode in dataset:
    pass
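
The splitting step described above can be pictured with plain NumPy. This is only a sketch of the idea, not the library's implementation: the helper name split_by_terminals is hypothetical, and keeping a trailing run of steps that never reaches a terminal flag is an assumption, since this page does not say how such a run is treated.

```python
import numpy as np


def split_by_terminals(observations, terminals):
    """Split flat step data into per-episode chunks.

    Hypothetical helper for illustration; not part of d3rlpy.
    """
    # An episode ends at every index whose flag is 1, so cut right after it.
    ends = np.flatnonzero(terminals) + 1
    chunks = np.split(observations, ends)
    # np.split always yields a final chunk; drop it when it is empty.
    return [chunk for chunk in chunks if len(chunk) > 0]


# 6 steps of 2-dimensional observations
observations = np.arange(12).reshape(6, 2)
# flags end an episode at steps 2 and 4; the last step is unterminated
terminals = np.array([0, 0, 1, 0, 1, 0])

episodes = split_by_terminals(observations, terminals)
episode_lengths = [len(episode) for episode in episodes]
```

Here the six steps fall into three chunks, and each chunk plays the role that an Episode object plays in the dataset.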

Parameters

observations (numpy.ndarray) – N-D array. If the observation is a vector, the shape should be (N, dim_observation). If the observation is an image, the shape should be (N, C, H, W).
actions (numpy.ndarray) – N-D array. If the action-space is continuous, the shape should be (N, dim_action). If the action-space is discrete, the shape should be (N,).
rewards (numpy.ndarray) – array of scalar rewards.
terminals (numpy.ndarray) – array of binary terminal flags.
episode_terminals (numpy.ndarray) – array of binary episode terminal flags. The given data will be split based on this flag. This is useful if you want to specify non-environment terminations (e.g. timeout). If None, the episode terminations match the environment terminations.
discrete_action (bool) – flag to use the given actions as discrete action-space actions. If None, the action type is automatically determined.
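
To make the difference between terminals and episode_terminals concrete, the following sketch builds both arrays for data collected under a time limit. The limit max_episode_steps and the bookkeeping loop are assumptions made for illustration; only the two flag arrays correspond to parameters documented above.

```python
import numpy as np

n_steps = 10
max_episode_steps = 4  # hypothetical time limit, not a d3rlpy parameter

# Environment terminations: only step 6 reaches a true terminal state.
terminals = np.zeros(n_steps, dtype=np.int64)
terminals[6] = 1

# Episode boundaries: environment terminations plus timeouts.
episode_terminals = terminals.copy()
steps_in_episode = 0
for i in range(n_steps):
    steps_in_episode += 1
    if terminals[i] == 1 or steps_in_episode == max_episode_steps:
        episode_terminals[i] = 1
        steps_in_episode = 0
```

Step 3 is flagged only in episode_terminals because it is a timeout, while step 6 is flagged in both arrays. The two arrays would then be passed as the terminals and episode_terminals arguments of the constructor.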

Methods

__getitem__(index)¶

__len__()¶

__iter__()¶

append(observations, actions, rewards, terminals, episode_terminals=None)¶

Appends new data.

Parameters

observations (numpy.ndarray) – N-D array.
actions (numpy.ndarray) – actions.
rewards (numpy.ndarray) – rewards.
terminals (numpy.ndarray) – terminals.
episode_terminals (numpy.ndarray) – episode terminals.

build_episodes()¶

Builds episode objects.

This method is called internally the first time the episodes property is accessed.

clip_reward(low=None, high=None)¶

Clips rewards in the given range.

Parameters

low (float) – minimum value. If None, clipping is not performed on lower edge.
high (float) – maximum value. If None, clipping is not performed on upper edge.
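
The effect of the two bounds can be sketched with NumPy. The function clip_rewards below is a hypothetical stand-in that mirrors the documented rule (a bound of None leaves that edge unclipped); it is not the library's code, and unlike the method it returns a new array rather than modifying a dataset.

```python
import numpy as np


def clip_rewards(rewards, low=None, high=None):
    """Illustrative stand-in for the clipping rule; not part of d3rlpy."""
    clipped = np.asarray(rewards, dtype=np.float64).copy()
    if low is not None:
        # raise everything below the lower edge up to `low`
        clipped = np.maximum(clipped, low)
    if high is not None:
        # lower everything above the upper edge down to `high`
        clipped = np.minimum(clipped, high)
    return clipped


rewards = np.array([-5.0, -0.5, 0.0, 0.5, 5.0])
both_edges = clip_rewards(rewards, low=-1.0, high=1.0)
upper_only = clip_rewards(rewards, high=1.0)
```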

compute_stats()¶

Computes statistics of the dataset.

stats = dataset.compute_stats()

# return statistics
stats['return']['mean']
stats['return']['std']
stats['return']['min']
stats['return']['max']

# reward statistics
stats['reward']['mean']
stats['reward']['std']
stats['reward']['min']
stats['reward']['max']

# action (only with continuous control actions)
stats['action']['mean']
stats['action']['std']
stats['action']['min']
stats['action']['max']

# observation (only with numpy.ndarray observations)
stats['observation']['mean']
stats['observation']['std']
stats['observation']['min']
stats['observation']['max']

Returns: statistics of the dataset.
Return type: dict
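
The distinction between the return and reward entries can be illustrated as follows. This sketch assumes that a return is the undiscounted sum of rewards over one episode and that std is the population standard deviation; neither detail is stated on this page, and the helper summarize is hypothetical.

```python
import numpy as np


def summarize(values):
    """Build a mean/std/min/max dict (hypothetical helper, not part of d3rlpy)."""
    values = np.asarray(values, dtype=np.float64)
    return {
        "mean": float(values.mean()),
        "std": float(values.std()),
        "min": float(values.min()),
        "max": float(values.max()),
    }


rewards = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
episode_terminals = np.array([0, 0, 1, 0, 0, 1])

# Episode i covers the half-open range [starts[i], ends[i]).
ends = np.flatnonzero(episode_terminals) + 1
starts = np.concatenate(([0], ends[:-1]))
# Assumption: return = undiscounted sum of rewards within one episode.
returns = [rewards[s:e].sum() for s, e in zip(starts, ends)]

stats = {"return": summarize(returns), "reward": summarize(rewards)}
```

Reward statistics are taken over individual steps, whereas return statistics are taken over whole episodes, so the two sets of values generally differ in scale.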

dump(fname)¶

Saves dataset as HDF5.

Parameters: fname (str) – file path.

extend(dataset)¶

Extends the dataset with another dataset.

Parameters: dataset (d3rlpy.dataset.MDPDataset) – dataset.

get_action_size()¶

Returns dimension of action-space.

If discrete_action=True, the return value is the maximum index + 1 in the given actions.

Returns: dimension of action-space.
Return type: int
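
The rule for discrete actions can be sketched with NumPy. The function action_size is a hypothetical illustration; in particular, returning the size of the second axis for continuous actions is an assumption based on the (N, dim_action) shape documented for the actions parameter.

```python
import numpy as np


def action_size(actions, discrete):
    """Illustrative sketch of the sizing rule; not part of d3rlpy."""
    actions = np.asarray(actions)
    if discrete:
        # discrete actions are indices, so the size is the largest index + 1
        return int(actions.max()) + 1
    # assumption: continuous actions have shape (N, dim_action)
    return actions.shape[1]


discrete_actions = np.array([0, 2, 1, 3, 2])
continuous_actions = np.zeros((5, 4))

n_discrete = action_size(discrete_actions, discrete=True)
n_continuous = action_size(continuous_actions, discrete=False)
```

One consequence of the maximum-index rule is that an action the environment supports but that never appears in the data is not counted, so the reported size can be smaller than the true action-space.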

get_observation_shape()¶

Returns observation shape.

Returns: observation shape.
Return type: tuple

is_action_discrete()¶

Returns discrete_action flag.

Returns: discrete_action flag.
Return type: bool

classmethod load(fname)¶

Loads dataset from HDF5.

import numpy as np
from d3rlpy.dataset import MDPDataset

dataset = MDPDataset(np.random.random((10, 4)),
                     np.random.random((10, 2)),
                     np.random.random(10),
                     np.random.randint(2, size=10))

# save as HDF5
dataset.dump('dataset.h5')

# load from HDF5
new_dataset = MDPDataset.load('dataset.h5')

Parameters: fname (str) – file path.

size()¶

Returns the number of episodes in the dataset.

Returns: the number of episodes.
Return type: int

Attributes

actions¶

Returns the actions.

Returns: array of actions.
Return type: numpy.ndarray

episode_terminals¶

Returns the episode terminal flags.

Returns: array of episode terminal flags.
Return type: numpy.ndarray

episodes¶

Returns the episodes.

Returns: list of d3rlpy.dataset.Episode objects.
Return type: list(d3rlpy.dataset.Episode)

observations¶

Returns the observations.

Returns: array of observations.
Return type: numpy.ndarray

rewards¶

Returns the rewards.

Returns: array of rewards.
Return type: numpy.ndarray

terminals¶

Returns the terminal flags.

Returns: array of terminal flags.
Return type: numpy.ndarray