d3rlpy.online.buffers.ReplayBuffer

class d3rlpy.online.buffers.ReplayBuffer(maxlen, env)

Standard Replay Buffer.

Parameters:
    maxlen (int) – the maximum number of transitions the buffer can hold.
    env (gym.Env) – gym-like environment from which shape information is extracted.
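As a rough illustration of what extracting shape information from a gym-like environment can look like, the sketch below reads `observation_space.shape` and the action space of a stand-in object. The helper `extract_shapes` and the stand-in environments are hypothetical, and the rule used for the action size (`n` for discrete spaces, first dimension for continuous ones) is an assumption, not necessarily what `ReplayBuffer` does internally.

```python
from types import SimpleNamespace


def extract_shapes(env):
    """Hypothetical helper: read shape information from a gym-like env."""
    observation_shape = tuple(env.observation_space.shape)
    action_space = env.action_space
    if hasattr(action_space, "n"):
        # Discrete action space: the size is the number of actions.
        action_size = int(action_space.n)
    else:
        # Continuous (Box-like) action space: the size is the vector length.
        action_size = int(action_space.shape[0])
    return observation_shape, action_size


# Stand-in environments so the sketch runs without gym installed.
discrete_env = SimpleNamespace(
    observation_space=SimpleNamespace(shape=(4,)),
    action_space=SimpleNamespace(n=2),
)
continuous_env = SimpleNamespace(
    observation_space=SimpleNamespace(shape=(3,)),
    action_space=SimpleNamespace(shape=(1,)),
)

discrete_shapes = extract_shapes(discrete_env)
continuous_shapes = extract_shapes(continuous_env)
```

The two results correspond to the `observation_shape` and `action_size` values one would expect for a 4-dimensional observation with 2 discrete actions and a 3-dimensional observation with a 1-dimensional continuous action.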

maxlen

the maximum number of transitions the buffer can hold.

Type:	int

observations

list of observations.

Type:	list(numpy.ndarray)

actions

list of actions.

Type:	list(numpy.ndarray) or list(int)

rewards

list of rewards.

Type:	list(float)

terminals

list of terminal flags.

Type:	list(float)

cursor

index of the list position where the next element will be inserted.

Type:	int

observation_shape

observation shape.

Type:	tuple

action_size

action size.

Type:	int
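The `maxlen` and `cursor` attributes suggest a fixed-capacity buffer with a moving insert position. The sketch below shows one common way such a pair can work together: a ring buffer that overwrites its oldest entry once it is full. The class `RingBufferSketch` is hypothetical, and the overwrite-on-full behaviour is an assumption rather than a description of d3rlpy's implementation.

```python
class RingBufferSketch:
    """Hypothetical fixed-capacity buffer; not d3rlpy's implementation."""

    def __init__(self, maxlen):
        self.maxlen = maxlen
        self.items = []
        self.cursor = 0  # index where the next item will be written

    def append(self, item):
        if len(self.items) < self.maxlen:
            # Still filling up: grow the list.
            self.items.append(item)
        else:
            # Full: overwrite the slot the cursor points to (the oldest item).
            self.items[self.cursor] = item
        # Advance the cursor, wrapping around at maxlen.
        self.cursor = (self.cursor + 1) % self.maxlen

    def __len__(self):
        return len(self.items)


buf = RingBufferSketch(maxlen=3)
for i in range(5):
    buf.append(i)
```

After five appends into a buffer of capacity three, the two oldest items have been overwritten, the length stays at three, and the cursor points at the slot holding the oldest surviving item.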

Methods

append(observation, action, reward, terminal)

Appends an observation, action, reward, and terminal flag to the buffer.

Parameters:
    observation (numpy.ndarray) – observation.
    action (numpy.ndarray or int) – action.
    reward (float) – reward.
    terminal (bool or float) – terminal flag.
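To make the calling pattern concrete, here is a minimal sketch of a data-collection loop that calls `append` once per environment step. `ListBuffer` and `run_episode` are hypothetical stand-ins written for this example. They mirror the `append(observation, action, reward, terminal)` signature and store the terminal flag as a float, which matches the documented `list(float)` type but is otherwise an assumption about the internals.

```python
import random


class ListBuffer:
    """Hypothetical stand-in with the same append() signature as above."""

    def __init__(self):
        self.observations, self.actions = [], []
        self.rewards, self.terminals = [], []

    def append(self, observation, action, reward, terminal):
        self.observations.append(observation)
        self.actions.append(action)
        self.rewards.append(float(reward))
        # Accept bool or float, store as float (True -> 1.0, False -> 0.0).
        self.terminals.append(float(terminal))


def run_episode(buffer, n_steps=5, seed=0):
    """Toy rollout: random observations and actions, ending at the last step."""
    rng = random.Random(seed)
    for t in range(n_steps):
        observation = [rng.random(), rng.random()]
        action = rng.randrange(2)  # discrete action, stored as int
        reward = 1.0
        terminal = t == n_steps - 1  # bool flag, True only on the final step
        buffer.append(observation, action, reward, terminal)


buffer = ListBuffer()
run_episode(buffer)
```

With a real environment, the observation, reward, and terminal flag would come from the environment's step function instead of the toy values used here.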

sample(batch_size)

Returns a sampled mini-batch of transitions.

Parameters:	batch_size (int) – mini-batch size.
Returns:	mini-batch.
Return type:	d3rlpy.dataset.TransitionMiniBatch
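The sampling step can be pictured as drawing random indices and gathering the matching entries from each stored list. The sketch below assumes uniform sampling without replacement and returns a plain dict. Both are assumptions made for illustration: the real method returns a `d3rlpy.dataset.TransitionMiniBatch`, and the helpers `sample_indices` and `gather` are hypothetical.

```python
import random


def sample_indices(buffer_size, batch_size, rng=random):
    """Hypothetical helper: draw distinct indices uniformly at random."""
    # Raises ValueError if batch_size exceeds buffer_size.
    return rng.sample(range(buffer_size), batch_size)


def gather(values, indices):
    """Pick the entries of `values` at the sampled positions."""
    return [values[i] for i in indices]


# Toy buffer contents: ten transitions stored as parallel lists.
observations = [[float(i)] for i in range(10)]
actions = list(range(10))
rewards = [0.1 * i for i in range(10)]
terminals = [0.0] * 9 + [1.0]

rng = random.Random(0)  # seeded generator for reproducibility
indices = sample_indices(len(observations), batch_size=4, rng=rng)
batch = {
    "observations": gather(observations, indices),
    "actions": gather(actions, indices),
    "rewards": gather(rewards, indices),
    "terminals": gather(terminals, indices),
}
```

Using the same indices for every list keeps each sampled observation aligned with its own action, reward, and terminal flag.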

Returns the number of appended elements in buffer.

Returns:	the number of elements in buffer.
Return type:	int