d3rlpy.dataset.MDPDataset
class d3rlpy.dataset.MDPDataset(observations, actions, rewards, terminals, discrete_action=False)[source]

Markov-Decision Process Dataset class.

MDPDataset is designed so that reinforcement learning datasets can be used like supervised learning datasets.
```python
import numpy as np
from d3rlpy.dataset import MDPDataset

# 1000 steps of observations with shape of (100,)
observations = np.random.random((1000, 100))
# 1000 steps of actions with shape of (4,)
actions = np.random.random((1000, 4))
# 1000 steps of rewards
rewards = np.random.random(1000)
# 1000 steps of terminal flags
terminals = np.random.randint(2, size=1000)

dataset = MDPDataset(observations, actions, rewards, terminals)
```
The MDPDataset object automatically splits the given data into a list of d3rlpy.dataset.Episode objects. Furthermore, the MDPDataset object behaves like a list, so it can be used with scikit-learn utilities.

```python
# returns the number of episodes
len(dataset)

# access the first episode
episode = dataset[0]

# iterate through all episodes
for episode in dataset:
    pass
```
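The splitting behavior can be illustrated with plain NumPy: an episode boundary occurs wherever the terminal flag is 1. This is an illustrative sketch of the idea, not d3rlpy's internal implementation.

```python
import numpy as np

# toy trajectory of 6 steps; terminal=1 marks the end of an episode
terminals = np.array([0, 0, 1, 0, 0, 1])

# indices at which episodes end (inclusive)
ends = np.flatnonzero(terminals)

# split the step indices into per-episode index arrays
episodes = np.split(np.arange(len(terminals)), ends + 1)
episodes = [ep for ep in episodes if len(ep) > 0]

print(episodes)  # [array([0, 1, 2]), array([3, 4, 5])]
```

Each resulting index array corresponds to one episode's transitions; d3rlpy performs an equivalent partitioning when building Episode objects.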
Parameters:
- observations (numpy.ndarray or list(numpy.ndarray)) – N-D array. If the observation is a vector, the shape should be (N, dim_observation). If the observations are images, the shape should be (N, C, H, W).
- actions (numpy.ndarray) – N-D array. If the action-space is continuous, the shape should be (N, dim_action). If the action-space is discrete, the shape should be (N,).
- rewards (numpy.ndarray) – array of scalar rewards.
- terminals (numpy.ndarray) – array of binary terminal flags.
- discrete_action (bool) – flag to use the given actions as discrete action-space actions.
Methods
append(observations, actions, rewards, terminals)[source]

Appends new data.
Parameters: - observations (numpy.ndarray or list(numpy.ndarray)) – N-D array.
- actions (numpy.ndarray) – actions.
- rewards (numpy.ndarray) – rewards.
- terminals (numpy.ndarray) – terminals.
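Conceptually, append extends the stored transition arrays with the new data. A rough NumPy sketch of that idea (not d3rlpy's internals):

```python
import numpy as np

# existing transitions: 10 steps of 4-dimensional observations
observations = np.random.random((10, 4))

# 5 new steps to append
new_observations = np.random.random((5, 4))

# appending concatenates the new transitions onto the stored arrays
combined = np.concatenate([observations, new_observations], axis=0)
print(combined.shape)  # (15, 4)
```

The same concatenation applies to actions, rewards, and terminals, and any terminal flags in the appended data introduce new episode boundaries.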
compute_stats()[source]

Computes statistics of the dataset.
```python
stats = dataset.compute_stats()

# return statistics
stats['return']['mean']
stats['return']['std']
stats['return']['min']
stats['return']['max']

# reward statistics
stats['reward']['mean']
stats['reward']['std']
stats['reward']['min']
stats['reward']['max']

# action (only with continuous control actions)
stats['action']['mean']
stats['action']['std']
stats['action']['min']
stats['action']['max']

# observation (only with numpy.ndarray observations)
stats['observation']['mean']
stats['observation']['std']
stats['observation']['min']
stats['observation']['max']
```
Returns: statistics of the dataset. Return type: dict
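The 'return' entries are statistics over per-episode returns, i.e. the sum of rewards within each episode. A hedged NumPy sketch of how such statistics could be computed (illustrative values, not d3rlpy's implementation):

```python
import numpy as np

# two episodes' rewards (hypothetical values)
episode_rewards = [np.array([1.0, 0.0, 2.0]), np.array([0.5, 0.5])]

# an episode's return is the sum of its rewards
returns = np.array([r.sum() for r in episode_rewards])

stats = {
    'return': {
        'mean': returns.mean(),  # 2.0
        'std': returns.std(),    # 1.0
        'min': returns.min(),    # 1.0
        'max': returns.max(),    # 3.0
    }
}
```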
get_action_size()[source]

Returns the dimension of the action-space.

If discrete_action=True, the return value will be the maximum index + 1 in the given actions.
Returns: dimension of action-space. Return type: int
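For the discrete case, the "maximum index + 1" rule can be sketched directly in NumPy:

```python
import numpy as np

# discrete actions drawn from the index set {0, 1, 2, 3}
actions = np.array([0, 2, 1, 3, 3, 0])

# the action-space size is the maximum index + 1
action_size = int(actions.max()) + 1
print(action_size)  # 4
```

Note this assumes every action index up to the maximum is meaningful; an action that never appears in the data still counts toward the size only if a larger index occurs.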
get_observation_shape()[source]

Returns observation shape.
Returns: observation shape. Return type: tuple
is_action_discrete()[source]

Returns discrete_action flag.
Returns: discrete_action flag. Return type: bool
classmethod load(fname)[source]

Loads dataset from HDF5.
```python
import numpy as np
from d3rlpy.dataset import MDPDataset

dataset = MDPDataset(np.random.random((10, 4)),
                     np.random.random((10, 2)),
                     np.random.random(10),
                     np.random.randint(2, size=10))

# save as HDF5
dataset.dump('dataset.h5')

# load from HDF5
new_dataset = MDPDataset.load('dataset.h5')
```
Parameters: fname (str) – file path.
size()[source]

Returns the number of episodes in the dataset.
Returns: the number of episodes. Return type: int
Attributes
actions

Returns the actions.
Returns: array of actions. Return type: numpy.ndarray
episodes

Returns the episodes.

Returns: list of d3rlpy.dataset.Episode objects. Return type: list(d3rlpy.dataset.Episode)
observations

Returns the observations.
Returns: array of observations. Return type: (numpy.ndarray or list(numpy.ndarray))
rewards

Returns the rewards.

Returns: array of rewards. Return type: numpy.ndarray
terminals

Returns the terminal flags.
Returns: array of terminal flags. Return type: numpy.ndarray