d3rlpy.dynamics.ProbabilisticEnsembleDynamics

class d3rlpy.dynamics.ProbabilisticEnsembleDynamics(*, learning_rate=0.001, optim_factory=d3rlpy.models.optimizers.AdamFactory(optim_cls='Adam', betas=(0.9, 0.999), eps=1e-08, weight_decay=0.0001, amsgrad=False), encoder_factory='default', batch_size=100, n_frames=1, n_ensembles=5, variance_type='max', discrete_action=False, scaler=None, action_scaler=None, reward_scaler=None, use_gpu=False, impl=None, **kwargs)[source]

Probabilistic ensemble dynamics.

The ensemble dynamics model consists of \(N\) probabilistic models \(\{T_{\theta_i}\}_{i=1}^N\). At each epoch, new transitions are generated via a randomly picked dynamics model \(T_\theta\).

\[s_{t+1}, r_{t+1} \sim T_\theta(s_t, a_t)\]

where \(s_t \sim D\) for the first step, otherwise \(s_t\) is the previously generated observation, and \(a_t \sim \pi(\cdot|s_t)\).

Note

Currently, ProbabilisticEnsembleDynamics only supports vector observations.

Parameters
  • learning_rate (float) – learning rate for dynamics model.

  • optim_factory (d3rlpy.models.optimizers.OptimizerFactory) – optimizer factory.

  • encoder_factory (d3rlpy.models.encoders.EncoderFactory or str) – encoder factory.

  • batch_size (int) – mini-batch size.

  • n_frames (int) – the number of frames to stack for image observation.

  • n_ensembles (int) – the number of dynamics models in the ensemble.

  • variance_type (str) – variance calculation type. The available options are ['max', 'data'].

  • discrete_action (bool) – flag to take discrete actions.

  • scaler (d3rlpy.preprocessing.scalers.Scaler or str) – preprocessor. The available options are ['pixel', 'min_max', 'standard'].

  • action_scaler (d3rlpy.preprocessing.ActionScaler or str) – action preprocessor. The available options are ['min_max'].

  • reward_scaler (d3rlpy.preprocessing.RewardScaler or str) – reward preprocessor. The available options are ['clip', 'min_max', 'standard'].

  • use_gpu (bool or d3rlpy.gpu.Device) – flag to use GPU or device.

  • impl (d3rlpy.dynamics.torch.ProbabilisticEnsembleDynamicsImpl) – dynamics implementation.

  • kwargs (Any) – arbitrary keyword arguments.

Methods

build_with_dataset(dataset)

Instantiate implementation object with MDPDataset object.

Parameters

dataset (d3rlpy.dataset.MDPDataset) – dataset.

Return type

None

build_with_env(env)

Instantiate implementation object with OpenAI Gym object.

Parameters

env (gym.core.Env) – gym-like environment.

Return type

None

create_impl(observation_shape, action_size)

Instantiate implementation objects with the dataset shapes.

This method is used internally when the fit method is called.

Parameters
  • observation_shape (Sequence[int]) – observation shape.

  • action_size (int) – dimension of action-space.

Return type

None

fit(dataset, n_epochs=None, n_steps=None, n_steps_per_epoch=10000, save_metrics=True, experiment_name=None, with_timestamp=True, logdir='d3rlpy_logs', verbose=True, show_progress=True, tensorboard_dir=None, eval_episodes=None, save_interval=1, scorers=None, shuffle=True, callback=None)

Trains with the given dataset.

algo.fit(episodes, n_steps=1000000)
Parameters
  • dataset (Union[List[d3rlpy.dataset.Episode], d3rlpy.dataset.MDPDataset]) – offline dataset or list of episodes to train with.

  • n_epochs (Optional[int]) – the number of epochs to train.

  • n_steps (Optional[int]) – the number of steps to train.

  • n_steps_per_epoch (int) – the number of steps per epoch. This value will be ignored when n_steps is None.

  • save_metrics (bool) – flag to record metrics in files. If False, the log directory is not created and the model parameters are not saved during training.

  • experiment_name (Optional[str]) – experiment name for logging. If not passed, the directory name will be {class name}_{timestamp}.

  • with_timestamp (bool) – flag to append a timestamp to the directory name.

  • logdir (str) – root directory name to save logs.

  • verbose (bool) – flag to show logged information on stdout.

  • show_progress (bool) – flag to show progress bar for iterations.

  • tensorboard_dir (Optional[str]) – directory to save logged information for TensorBoard (in addition to the CSV data). If None, the directory will not be created.

  • eval_episodes (Optional[List[d3rlpy.dataset.Episode]]) – list of episodes to test.

  • save_interval (int) – interval to save parameters.

  • scorers (Optional[Dict[str, Callable[[Any, List[d3rlpy.dataset.Episode]], float]]]) – dictionary of scorer functions used with eval_episodes.

  • shuffle (bool) – flag to shuffle transitions on each epoch.

  • callback (Optional[Callable[[d3rlpy.base.LearnableBase, int, int], None]]) – callable that takes (algo, epoch, total_step), called at every step.

Returns

list of result tuples (epoch, metrics) per epoch.

Return type

List[Tuple[int, Dict[str, float]]]

fitter(dataset, n_epochs=None, n_steps=None, n_steps_per_epoch=10000, save_metrics=True, experiment_name=None, with_timestamp=True, logdir='d3rlpy_logs', verbose=True, show_progress=True, tensorboard_dir=None, eval_episodes=None, save_interval=1, scorers=None, shuffle=True, callback=None)
Iterate over epochs to train with the given dataset. At each iteration, algo methods and properties can be changed or queried.

for epoch, metrics in algo.fitter(episodes):
    my_plot(metrics)
    algo.save_model(my_path)
Parameters
  • dataset (Union[List[d3rlpy.dataset.Episode], d3rlpy.dataset.MDPDataset]) – offline dataset or list of episodes to train with.

  • n_epochs (Optional[int]) – the number of epochs to train.

  • n_steps (Optional[int]) – the number of steps to train.

  • n_steps_per_epoch (int) – the number of steps per epoch. This value will be ignored when n_steps is None.

  • save_metrics (bool) – flag to record metrics in files. If False, the log directory is not created and the model parameters are not saved during training.

  • experiment_name (Optional[str]) – experiment name for logging. If not passed, the directory name will be {class name}_{timestamp}.

  • with_timestamp (bool) – flag to append a timestamp to the directory name.

  • logdir (str) – root directory name to save logs.

  • verbose (bool) – flag to show logged information on stdout.

  • show_progress (bool) – flag to show progress bar for iterations.

  • tensorboard_dir (Optional[str]) – directory to save logged information for TensorBoard (in addition to the CSV data). If None, the directory will not be created.

  • eval_episodes (Optional[List[d3rlpy.dataset.Episode]]) – list of episodes to test.

  • save_interval (int) – interval to save parameters.

  • scorers (Optional[Dict[str, Callable[[Any, List[d3rlpy.dataset.Episode]], float]]]) – dictionary of scorer functions used with eval_episodes.

  • shuffle (bool) – flag to shuffle transitions on each epoch.

  • callback (Optional[Callable[[d3rlpy.base.LearnableBase, int, int], None]]) – callable that takes (algo, epoch, total_step), called at every step.

Returns

iterator yielding current epoch and metrics dict.

Return type

Generator[Tuple[int, Dict[str, float]], None, None]

classmethod from_json(fname, use_gpu=False)

Returns an algorithm configured with a JSON file.

The JSON file should be the one saved during fitting.

from d3rlpy.algos import Algo

# create algorithm with saved configuration
algo = Algo.from_json('d3rlpy_logs/<path-to-json>/params.json')

# ready to load
algo.load_model('d3rlpy_logs/<path-to-model>/model_100.pt')

# ready to predict
algo.predict(...)
Parameters
  • fname (str) – file path to params.json.

  • use_gpu (Optional[Union[bool, int, d3rlpy.gpu.Device]]) – flag to use GPU, device ID or device.

Returns

algorithm.

Return type

d3rlpy.base.LearnableBase

generate_new_data(transitions)

Returns generated transitions for data augmentation.

This method is for model-based RL algorithms.

Parameters

transitions (List[d3rlpy.dataset.Transition]) – list of transitions.

Returns

list of new transitions.

Return type

Optional[List[d3rlpy.dataset.Transition]]

get_action_type()[source]

Returns action type (continuous or discrete).

Returns

action type.

Return type

d3rlpy.constants.ActionSpace

get_params(deep=True)

Returns all attributes.

This method returns all attributes, including ones in subclasses. Some scikit-learn utilities use this method.

params = algo.get_params(deep=True)

# the returned values can be used to instantiate the new object.
algo2 = AlgoBase(**params)
Parameters

deep (bool) – flag to deeply copy objects such as impl.

Returns

attribute values in dictionary.

Return type

Dict[str, Any]

load_model(fname)

Loads neural network parameters.

algo.load_model('model.pt')
Parameters

fname (str) – source file path.

Return type

None

predict(x, action, with_variance=False, indices=None)

Returns predicted observation and reward.

Parameters
  • x (Union[numpy.ndarray, List[Any]]) – observation

  • action (Union[numpy.ndarray, List[Any]]) – action

  • with_variance (bool) – flag to return prediction variance.

  • indices (Optional[numpy.ndarray]) – indices of the ensemble models to use for prediction.

Returns

tuple of predicted observation and reward. If with_variance is True, the prediction variance will be added as the 3rd element.

Return type

Union[Tuple[numpy.ndarray, numpy.ndarray], Tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray]]

save_model(fname)

Saves neural network parameters.

algo.save_model('model.pt')
Parameters

fname (str) – destination file path.

Return type

None

save_params(logger)

Saves configurations as params.json.

Parameters

logger (d3rlpy.logger.D3RLPyLogger) – logger object.

Return type

None

set_active_logger(logger)

Sets the active D3RLPyLogger object.

Parameters

logger (d3rlpy.logger.D3RLPyLogger) – logger object.

Return type

None

set_grad_step(grad_step)

Set total gradient step counter.

This method can be used to restart training from an arbitrary gradient step counter, which affects periodic functions such as the target update.

Parameters

grad_step (int) – total gradient step counter.

Return type

None

set_params(**params)

Sets the given arguments to the attributes if they exist.

This method sets the given values to attributes, including ones in subclasses. Values that do not exist as attributes are ignored. Some scikit-learn utilities use this method.

algo.set_params(batch_size=100)
Parameters

params (Any) – arbitrary inputs to set as attributes.

Returns

itself.

Return type

d3rlpy.base.LearnableBase

update(batch)

Updates parameters with a mini-batch of data.

Parameters

batch (d3rlpy.dataset.TransitionMiniBatch) – mini-batch data.

Returns

dictionary of metrics.

Return type

Dict[str, float]

Attributes

action_scaler

Preprocessing action scaler.

Returns

preprocessing action scaler.

Return type

Optional[ActionScaler]

action_size

Action size.

Returns

action size.

Return type

Optional[int]

active_logger

Active D3RLPyLogger object.

This is only available during training.

Returns

logger object.

batch_size

Batch size to train.

Returns

batch size.

Return type

int

gamma

Discount factor.

Returns

discount factor.

Return type

float

grad_step

Total gradient step counter.

This value keeps counting after the fit and fit_online methods finish.

Returns

total gradient step counter.

impl

Implementation object.

Returns

implementation object.

Return type

Optional[ImplBase]

n_frames

Number of frames to stack.

This is only used for image observations.

Returns

number of frames to stack.

Return type

int

n_steps

N-step TD backup.

Returns

N-step TD backup.

Return type

int

observation_shape

Observation shape.

Returns

observation shape.

Return type

Optional[Sequence[int]]

reward_scaler

Preprocessing reward scaler.

Returns

preprocessing reward scaler.

Return type

Optional[RewardScaler]

scaler

Preprocessing scaler.

Returns

preprocessing scaler.

Return type

Optional[Scaler]