d3rlpy.dataset.Transition

class d3rlpy.dataset.Transition(observation_shape, action_size, observation, action, reward, next_observation, next_action, next_reward, terminal)

Transition class.

This class holds the data for a single step between two consecutive time steps, t and t+1. Such transitions are typically used as inputs to loss calculation in reinforcement learning.

Parameters:
  • observation_shape (tuple) – observation shape.
  • action_size (int) – dimension of action-space.
  • observation (numpy.ndarray) – observation at t.
  • action (numpy.ndarray or int) – action at t.
  • reward (float) – reward at t.
  • next_observation (numpy.ndarray) – observation at t+1.
  • next_action (numpy.ndarray or int) – action at t+1.
  • next_reward (float) – reward at t+1.
  • terminal (int) – terminal flag at t+1.
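
As a minimal sketch of how these parameters fit together, the following builds one transition for a hypothetical task with 4-dimensional observations and 2 discrete actions. `TransitionSketch` is a stand-in dataclass written for this illustration, not d3rlpy's implementation. It only mirrors the documented constructor arguments, and all values are made up.

```python
from dataclasses import dataclass
from typing import Tuple, Union

import numpy as np


@dataclass(frozen=True)
class TransitionSketch:
    """Illustrative stand-in mirroring the documented constructor arguments."""

    observation_shape: Tuple[int, ...]
    action_size: int
    observation: np.ndarray
    action: Union[np.ndarray, int]
    reward: float
    next_observation: np.ndarray
    next_action: Union[np.ndarray, int]
    next_reward: float
    terminal: int


# Hypothetical values: 4-dimensional observations, 2 discrete actions.
transition = TransitionSketch(
    observation_shape=(4,),
    action_size=2,
    observation=np.zeros(4, dtype=np.float32),       # observation at t
    action=1,                                        # action at t
    reward=0.0,                                      # reward at t
    next_observation=np.ones(4, dtype=np.float32),   # observation at t+1
    next_action=0,                                   # action at t+1
    next_reward=1.0,                                 # reward at t+1
    terminal=0,                                      # episode continues after t+1
)
```

Note that `observation.shape` is expected to match `observation_shape`; the sketch does not enforce this.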

Methods

get_action_size()

Returns dimension of action-space.

Returns: dimension of action-space.
Return type: int

get_observation_shape()

Returns observation shape.

Returns: observation shape.
Return type: tuple
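
One possible use of these two getters is sizing minibatch buffers before filling them. The sketch below assumes continuous actions stored as vectors of length `get_action_size()`. `ShapeInfoSketch` and `allocate_batch` are hypothetical names introduced for illustration and are not part of d3rlpy.

```python
import numpy as np


class ShapeInfoSketch:
    """Illustrative stand-in exposing only the two documented getters."""

    def __init__(self, observation_shape, action_size):
        self._observation_shape = observation_shape
        self._action_size = action_size

    def get_observation_shape(self):
        return self._observation_shape

    def get_action_size(self):
        return self._action_size


def allocate_batch(transition, batch_size):
    """Preallocate zero-filled observation and action buffers for a minibatch."""
    obs_shape = transition.get_observation_shape()
    observations = np.zeros((batch_size, *obs_shape), dtype=np.float32)
    # Assumption: continuous actions, one vector of length action_size per row.
    actions = np.zeros((batch_size, transition.get_action_size()), dtype=np.float32)
    return observations, actions


observations, actions = allocate_batch(ShapeInfoSketch((4,), 2), batch_size=32)
```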

Attributes

action

Returns action at t.

Returns: action at t.
Return type: numpy.ndarray or int

next_action

Returns action at t+1.

Returns: action at t+1.
Return type: numpy.ndarray or int

next_observation

Returns observation at t+1.

Returns: observation at t+1.
Return type: numpy.ndarray

next_reward

Returns reward at t+1.

Returns: reward at t+1.
Return type: float

observation

Returns observation at t.

Returns: observation at t.
Return type: numpy.ndarray

reward

Returns reward at t.

Returns: reward at t.
Return type: float

terminal

Returns terminal flag at t+1.

Returns: terminal flag at t+1.
Return type: int
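
To illustrate how these attributes might feed a loss calculation, here is a sketch of a one-step Q-learning target for a single transition. It assumes that `next_reward` is the reward received on arriving at t+1 and that `terminal == 1` means no value is bootstrapped from the next state. Whether `reward` or `next_reward` belongs in the target depends on how the dataset aligns rewards with time steps. `td_target` and `next_q_values` are hypothetical names, and `SimpleNamespace` stands in for a transition object.

```python
from types import SimpleNamespace

import numpy as np


def td_target(transition, next_q_values, gamma=0.99):
    """One-step Q-learning target: r + gamma * (1 - terminal) * max_a Q(s', a).

    next_q_values holds Q estimates for every action at next_observation
    and is assumed to come from some value function supplied by the caller.
    """
    bootstrap = (1.0 - float(transition.terminal)) * float(np.max(next_q_values))
    return float(transition.next_reward) + gamma * bootstrap


# Stand-in transitions; only the attributes read by td_target are set.
ongoing = SimpleNamespace(next_reward=1.0, terminal=0)
final = SimpleNamespace(next_reward=1.0, terminal=1)

q_next = np.array([0.5, 2.0])
target_ongoing = td_target(ongoing, q_next, gamma=0.9)  # bootstraps from max Q
target_final = td_target(final, q_next, gamma=0.9)      # reward only
```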