d3rlpy.dataset.Transition

class d3rlpy.dataset.Transition

Transition class.

This class holds the data between two consecutive time steps, which is typically used as input to the loss calculation in reinforcement learning.

Parameters

observation_shape (tuple) – observation shape.
action_size (int) – dimension of action-space.
observation (numpy.ndarray) – observation at t.
action (numpy.ndarray or int) – action at t.
reward (float) – reward at t.
next_observation (numpy.ndarray) – observation at t+1.
next_action (numpy.ndarray or int) – action at t+1.
next_reward (float) – reward at t+1.
terminal (int) – terminal flag at t+1.
mask (numpy.ndarray) – binary mask for bootstrapping.
prev_transition (d3rlpy.dataset.Transition) – pointer to the previous transition.
next_transition (d3rlpy.dataset.Transition) – pointer to the next transition.
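
How these fields typically enter a loss calculation can be sketched as follows. This is an illustration only and does not import d3rlpy: `SketchTransition` is a hypothetical stand-in carrying a subset of the fields listed above (with plain lists in place of numpy arrays), and `td_target`, the discount `gamma`, and the toy `q_next` values are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SketchTransition:
    """Hypothetical stand-in, NOT d3rlpy.dataset.Transition."""

    observation: List[float]       # observation at t
    action: int                    # action at t
    reward: float                  # reward at t
    next_observation: List[float]  # observation at t+1
    terminal: int                  # terminal flag at t+1


def td_target(t: SketchTransition, q_next: List[float], gamma: float = 0.99) -> float:
    # One-step TD target: r + gamma * (1 - terminal) * max_a Q(s', a).
    # The terminal flag switches off bootstrapping at the end of an episode.
    return t.reward + gamma * (1.0 - t.terminal) * max(q_next)


t = SketchTransition(
    observation=[0.0, 0.0],
    action=1,
    reward=1.0,
    next_observation=[1.0, 1.0],
    terminal=0,
)
q_next = [0.5, 2.0]  # assumed Q-values for next_observation
target = td_target(t, q_next)
```

With `terminal=1` the bootstrapped term drops out and the target reduces to the reward alone.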

Methods

clear_links()

Clears links to the next and previous transitions.

This method must be called before the instance can be freed by the garbage collector.
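
The effect of clearing links can be sketched with a plain Python stand-in. `LinkedSketch` below is hypothetical and only mimics the `prev_transition` / `next_transition` pointers described on this page; it makes no claim about how the library implements the method internally.

```python
from typing import Optional


class LinkedSketch:
    """Hypothetical stand-in for a linked transition, NOT the d3rlpy class."""

    def __init__(self, reward: float) -> None:
        self.reward = reward
        self.prev_transition: Optional["LinkedSketch"] = None
        self.next_transition: Optional["LinkedSketch"] = None

    def clear_links(self) -> None:
        # Drop both pointers so this node no longer references its neighbours.
        self.prev_transition = None
        self.next_transition = None


first, second = LinkedSketch(0.0), LinkedSketch(1.0)
first.next_transition = second
second.prev_transition = first

# Neighbouring nodes reference each other, so clear every node
# before dropping the last outside reference to the chain.
for node in (first, second):
    node.clear_links()
```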

get_action_size()

Returns dimension of action-space.

get_observation_shape()

Returns observation shape.

Attributes

action

Returns action at t.

mask

Returns binary mask for bootstrapping.

next_action

Returns action at t+1.

next_observation

Returns observation at t+1.

next_reward

Returns reward at t+1.

next_transition

Returns pointer to the next transition.

If this is the last transition, this attribute is None.
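
Because the chain ends with None, it can be walked with a simple loop. The sketch below accumulates a discounted return along such a chain; `Node` is a hypothetical stand-in that exposes only `reward` and `next_transition`, and `discounted_return` and `gamma` are names assumed for the example.

```python
from typing import Optional


class Node:
    """Hypothetical stand-in exposing only reward and next_transition."""

    def __init__(self, reward: float) -> None:
        self.reward = reward
        self.next_transition: Optional["Node"] = None


def discounted_return(start: Optional[Node], gamma: float) -> float:
    # Follow next_transition pointers until the None that marks the end.
    total, discount, node = 0.0, 1.0, start
    while node is not None:
        total += discount * node.reward
        discount *= gamma
        node = node.next_transition
    return total


# Build a three-step chain with rewards 1, 2, 3.
a, b, c = Node(1.0), Node(2.0), Node(3.0)
a.next_transition = b
b.next_transition = c

ret = discounted_return(a, gamma=0.5)  # 1 + 0.5 * 2 + 0.25 * 3
```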

observation

Returns observation at t.

prev_transition

Returns pointer to the previous transition.

If this is the first transition, this attribute is None.

reward

Returns reward at t.

terminal

Returns terminal flag at t+1.