d3rlpy.dataset.Transition
- class d3rlpy.dataset.Transition
Transition class.
This class holds the data that connects two consecutive time steps, t and t+1. Such transitions are typically used as inputs to loss calculations in reinforcement learning.
- Parameters
observation_shape (tuple) – observation shape.
action_size (int) – dimension of action-space.
observation (numpy.ndarray) – observation at t.
action (numpy.ndarray or int) – action at t.
reward (float) – reward at t.
next_observation (numpy.ndarray) – observation at t+1.
next_action (numpy.ndarray or int) – action at t+1.
next_reward (float) – reward at t+1.
terminal (int) – terminal flag at t+1.
mask (numpy.ndarray) – binary mask for bootstrapping.
prev_transition (d3rlpy.dataset.Transition) – pointer to the previous transition.
next_transition (d3rlpy.dataset.Transition) – pointer to the next transition.
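To illustrate how these fields can feed a loss calculation, here is a minimal sketch of a one-step TD target. `SketchTransition`, `td_target`, and `toy_value` are hypothetical names introduced for illustration; they are not d3rlpy's implementation, and plain lists stand in for `numpy.ndarray` observations. The sketch assumes that `reward` is the reward paired with `next_observation`; the library's actual reward alignment (`reward` versus `next_reward`) is not specified in this section.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class SketchTransition:
    """Hypothetical stand-in mirroring a subset of the documented fields."""

    observation: Sequence[float]       # observation at t
    action: int                        # action at t (discrete in this sketch)
    reward: float                      # reward at t
    next_observation: Sequence[float]  # observation at t+1
    terminal: int                      # terminal flag at t+1 (0 or 1)


def td_target(
    tr: SketchTransition,
    value_fn: Callable[[Sequence[float]], float],
    gamma: float = 0.99,
) -> float:
    """One-step TD target: reward + gamma * (1 - terminal) * V(next_observation)."""
    return tr.reward + gamma * (1 - tr.terminal) * value_fn(tr.next_observation)


def toy_value(obs: Sequence[float]) -> float:
    """Toy value function (an assumption): the sum of the observation entries."""
    return sum(obs)


mid = SketchTransition([0.0, 1.0], 1, 0.5, [1.0, 1.0], terminal=0)
last = SketchTransition([1.0, 1.0], 0, 1.0, [0.0, 0.0], terminal=1)

mid_target = td_target(mid, toy_value, gamma=0.5)    # bootstraps from V(next_observation)
last_target = td_target(last, toy_value, gamma=0.5)  # terminal flag removes the bootstrap term
```

The terminal flag is what stops the target from bootstrapping past the end of an episode, which is why it is stored alongside the observations and reward.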
Methods
- clear_links()
Clears links to the next and previous transitions.
This method must be called so that the instance can be freed by the garbage collector.
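A plausible reading of this requirement is that neighbouring transitions reference each other, so each keeps the other alive until the links are dropped. The sketch below shows the idea with a hypothetical stand-in class, `SketchNode`; it is not d3rlpy's implementation and only mirrors the documented attribute and method names.

```python
from typing import Optional


class SketchNode:
    """Hypothetical doubly linked stand-in; not d3rlpy's implementation."""

    def __init__(self, reward: float) -> None:
        self.reward = reward
        self.prev_transition: Optional["SketchNode"] = None
        self.next_transition: Optional["SketchNode"] = None

    def clear_links(self) -> None:
        # Drop both references so this node no longer holds its neighbours.
        self.prev_transition = None
        self.next_transition = None


first, second = SketchNode(0.0), SketchNode(1.0)
first.next_transition = second
second.prev_transition = first  # first and second now reference each other

# Break the mutual references before discarding the nodes.
for node in (first, second):
    node.clear_links()

links_cleared = all(
    n.prev_transition is None and n.next_transition is None
    for n in (first, second)
)
```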
- get_action_size()
Returns dimension of action-space.
- Returns
dimension of action-space.
- Return type
int
Attributes
- action
Returns action at t.
- Returns
action at t.
- Return type
(numpy.ndarray or int)
- mask
Returns binary mask for bootstrapping.
- Returns
array of binary mask.
- Return type
numpy.ndarray
- next_action
Returns action at t+1.
- Returns
action at t+1.
- Return type
(numpy.ndarray or int)
- next_observation
Returns observation at t+1.
- Returns
observation at t+1.
- Return type
numpy.ndarray or torch.Tensor
- next_transition
Returns pointer to the next transition.
If this is the last transition, None is returned.
- Returns
next transition.
- Return type
d3rlpy.dataset.Transition
- observation
Returns observation at t.
- Returns
observation at t.
- Return type
numpy.ndarray or torch.Tensor
- prev_transition
Returns pointer to the previous transition.
If this is the first transition, None is returned.
- Returns
previous transition.
- Return type
d3rlpy.dataset.Transition
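
Because `next_transition` and `prev_transition` are None at the ends of an episode, a chain of transitions can be walked in either direction with a simple loop. The sketch below assumes a hypothetical stand-in class, `SketchStep`, and hypothetical helpers `link_episode`, `rewards_from`, and `first_of`; none of these are part of d3rlpy.

```python
from typing import List, Optional


class SketchStep:
    """Hypothetical stand-in exposing a reward and the two link attributes."""

    def __init__(self, reward: float) -> None:
        self.reward = reward
        self.prev_transition: Optional["SketchStep"] = None
        self.next_transition: Optional["SketchStep"] = None


def link_episode(steps: List[SketchStep]) -> None:
    """Wire consecutive steps together in both directions."""
    for earlier, later in zip(steps, steps[1:]):
        earlier.next_transition = later
        later.prev_transition = earlier


def rewards_from(start: SketchStep) -> List[float]:
    """Follow next_transition until it is None, collecting rewards."""
    rewards: List[float] = []
    node: Optional[SketchStep] = start
    while node is not None:
        rewards.append(node.reward)
        node = node.next_transition
    return rewards


def first_of(step: SketchStep) -> SketchStep:
    """Follow prev_transition back to the first step of the chain."""
    node = step
    while node.prev_transition is not None:
        node = node.prev_transition
    return node


episode = [SketchStep(r) for r in (1.0, 2.0, 3.0)]
link_episode(episode)

tail_rewards = rewards_from(episode[1])  # rewards from the second step onward
head = first_of(episode[2])              # walk back from the last step
```

A forward walk of this kind is one way multi-step quantities could be gathered from a single starting transition without access to the whole episode.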