d3rlpy.dataset.MDPDataset¶
-
class
d3rlpy.dataset.
MDPDataset
(observations, actions, rewards, terminals, episode_terminals=None, discrete_action=None)¶ Markov-Decision Process Dataset class.
MDPDataset is designed so that reinforcement learning datasets can be used like supervised learning datasets.
    import numpy as np
    from d3rlpy.dataset import MDPDataset

    # 1000 steps of observations with shape of (100,)
    observations = np.random.random((1000, 100))
    # 1000 steps of actions with shape of (4,)
    actions = np.random.random((1000, 4))
    # 1000 steps of rewards
    rewards = np.random.random(1000)
    # 1000 steps of terminal flags
    terminals = np.random.randint(2, size=1000)

    dataset = MDPDataset(observations, actions, rewards, terminals)
The MDPDataset object automatically splits the given data into a list of d3rlpy.dataset.Episode objects. Furthermore, the MDPDataset object behaves like a list so that it can be used with scikit-learn utilities.

    # returns the number of episodes
    len(dataset)

    # access to the first episode
    episode = dataset[0]

    # iterate through all episodes
    for episode in dataset:
        pass
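The splitting described above can be pictured with plain NumPy. The sketch below is not d3rlpy's implementation: `split_into_episodes` is a hypothetical helper, and it assumes that an episode ends at every step whose flag is 1 and that any steps after the last flag form one more episode.

```python
import numpy as np


def split_into_episodes(observations, actions, rewards, terminals):
    """Hypothetical helper: cut flat step arrays into per-episode dicts."""
    # indices just past each flagged step mark the episode boundaries
    ends = np.flatnonzero(terminals) + 1
    # assumption: steps after the last flag form one more episode
    if len(ends) == 0 or ends[-1] != len(terminals):
        ends = np.append(ends, len(terminals))
    episodes = []
    start = 0
    for end in ends:
        episodes.append({
            "observations": observations[start:end],
            "actions": actions[start:end],
            "rewards": rewards[start:end],
        })
        start = end
    return episodes


# six steps, with flags at steps 2 and 5
observations = np.arange(12, dtype=np.float32).reshape(6, 2)
actions = np.zeros((6, 1), dtype=np.float32)
rewards = np.ones(6, dtype=np.float32)
terminals = np.array([0, 0, 1, 0, 0, 1])

episodes = split_into_episodes(observations, actions, rewards, terminals)
episode_lengths = [len(ep["rewards"]) for ep in episodes]
```

Here the six steps are cut into two episodes of three steps each, which is the list-like view that `len(dataset)` and `dataset[0]` expose.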
- Parameters
observations (numpy.ndarray) – N-D array. If the observation is a vector, the shape should be (N, dim_observation). If the observation is an image, the shape should be (N, C, H, W).
actions (numpy.ndarray) – N-D array. If the action-space is continuous, the shape should be (N, dim_action). If the action-space is discrete, the shape should be (N,).
rewards (numpy.ndarray) – array of scalar rewards.
terminals (numpy.ndarray) – array of binary terminal flags.
episode_terminals (numpy.ndarray) – array of binary episode terminal flags. The given data will be split based on this flag. This is useful if you want to specify non-environment terminations (e.g. timeout). If None, the episode terminations match the environment terminations.
discrete_action (bool) – flag to use the given actions as discrete action-space actions. If None, the action type is automatically determined.
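To illustrate the difference between terminals and episode_terminals, the sketch below builds both flag arrays for a hypothetical environment with a 4-step time limit. The time limit and the single true termination are assumptions made up for illustration. Only the array construction is shown; the resulting arrays are what would be passed as the terminals and episode_terminals arguments.

```python
import numpy as np

n_steps = 10
time_limit = 4  # hypothetical timeout, in steps

# environment terminations: suppose the task truly ended only at step 6
terminals = np.zeros(n_steps, dtype=np.int64)
terminals[6] = 1

# episode boundaries: a true termination OR hitting the time limit
episode_terminals = np.zeros(n_steps, dtype=np.int64)
steps_in_episode = 0
for t in range(n_steps):
    steps_in_episode += 1
    if terminals[t] == 1 or steps_in_episode == time_limit:
        episode_terminals[t] = 1
        steps_in_episode = 0
```

Step 3 is flagged only in episode_terminals (a timeout), while step 6 is flagged in both arrays (a true termination). Keeping the two apart lets the data be split at a timeout without marking that step as an environment termination.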
Methods
-
__getitem__
(index)¶
-
__len__
()¶
-
__iter__
()¶
-
append
(observations, actions, rewards, terminals, episode_terminals=None)¶ Appends new data.
- Parameters
observations (numpy.ndarray) – N-D array.
actions (numpy.ndarray) – actions.
rewards (numpy.ndarray) – rewards.
terminals (numpy.ndarray) – terminals.
episode_terminals (numpy.ndarray) – episode terminals.
-
build_episodes
()¶ Builds episode objects.
This method is called internally when the episodes property is accessed for the first time.
-
clip_reward
(low=None, high=None)¶ Clips rewards in the given range.
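As a rough picture of what clipping does to a reward array, the sketch below uses `numpy.clip` directly. It is not taken from the library's source, and it assumes that a bound left as None leaves that side unclipped.

```python
import numpy as np

rewards = np.array([-5.0, -0.5, 0.0, 2.0, 10.0])

# clip into the range [-1, 1]
clipped = np.clip(rewards, -1.0, 1.0)

# assumption: a bound of None leaves that side unclipped
clipped_low_only = np.clip(rewards, -1.0, None)
```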
-
compute_stats
()¶ Computes statistics of the dataset.
    stats = dataset.compute_stats()

    # return statistics
    stats['return']['mean']
    stats['return']['std']
    stats['return']['min']
    stats['return']['max']

    # reward statistics
    stats['reward']['mean']
    stats['reward']['std']
    stats['reward']['min']
    stats['reward']['max']

    # action (only with continuous control actions)
    stats['action']['mean']
    stats['action']['std']
    stats['action']['min']
    stats['action']['max']

    # observation (only with numpy.ndarray observations)
    stats['observation']['mean']
    stats['observation']['std']
    stats['observation']['min']
    stats['observation']['max']
- Returns
statistics of the dataset.
- Return type
dict
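The nested layout of the statistics can be reproduced by hand with NumPy. The sketch below is illustrative only: `summarize` is a hypothetical helper, the reward values are made up, and it assumes that an episode's return is the undiscounted sum of its rewards.

```python
import numpy as np


def summarize(values):
    """Hypothetical helper: mean/std/min/max of a 1-D collection."""
    values = np.asarray(values, dtype=np.float64)
    return {
        "mean": float(values.mean()),
        "std": float(values.std()),
        "min": float(values.min()),
        "max": float(values.max()),
    }


# made-up per-episode reward arrays
episode_rewards = [
    np.array([1.0, 0.0, 2.0]),
    np.array([0.5, 0.5]),
    np.array([3.0]),
]

# assumption: return = undiscounted sum of rewards within an episode
returns = [ep.sum() for ep in episode_rewards]

stats = {
    "return": summarize(returns),
    "reward": summarize(np.concatenate(episode_rewards)),
}
```

Return statistics are taken over episodes (three values here), whereas reward statistics are taken over individual steps (six values here).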
-
extend
(dataset)¶ Extend dataset by another dataset.
- Parameters
dataset (d3rlpy.dataset.MDPDataset) – dataset.
-
get_action_size
()¶ Returns dimension of action-space.
If discrete_action=True, the return value will be the maximum index + 1 in the given actions.
- Returns
dimension of action-space.
- Return type
int
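The "maximum index + 1" rule for discrete actions can be checked with a few lines of NumPy. This is a sketch rather than the library's code; for the continuous case it assumes the size is the second dimension of an (N, dim_action) array, as the parameter description suggests.

```python
import numpy as np

# discrete case: actions are integer indices with shape (N,)
discrete_actions = np.array([0, 2, 1, 3, 2])
discrete_action_size = int(discrete_actions.max()) + 1

# continuous case (assumption): size is dim_action from shape (N, dim_action)
continuous_actions = np.random.random((5, 4))
continuous_action_size = continuous_actions.shape[1]
```

One consequence of this rule: if the highest action index of the environment never appears in the data, the size computed this way is smaller than the true number of actions.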
-
classmethod
load
(fname)¶ Loads dataset from HDF5.
    import numpy as np
    from d3rlpy.dataset import MDPDataset

    # np.random.random takes the shape as a single tuple
    dataset = MDPDataset(np.random.random((10, 4)),
                         np.random.random((10, 2)),
                         np.random.random(10),
                         np.random.randint(2, size=10))

    # save as HDF5
    dataset.dump('dataset.h5')

    # load from HDF5
    new_dataset = MDPDataset.load('dataset.h5')
- Parameters
fname (str) – file path.
-
size
()¶ Returns the number of episodes in the dataset.
- Returns
the number of episodes.
- Return type
int
Attributes
-
actions
¶ Returns the actions.
- Returns
array of actions.
- Return type
-
episode_terminals
¶ Returns the episode terminal flags.
- Returns
array of episode terminal flags.
- Return type
-
episodes
¶ Returns the episodes.
- Returns
list of d3rlpy.dataset.Episode objects.
- Return type
list
-
observations
¶ Returns the observations.
- Returns
array of observations.
- Return type
-
rewards
¶ Returns the rewards.
- Returns
array of rewards.
- Return type
-
terminals
¶ Returns the terminal flags.
- Returns
array of terminal flags.
- Return type