d3rlpy.algos.BC¶

class d3rlpy.algos.BC(learning_rate=0.001, batch_size=100, eps=1e-08, use_batch_norm=False, n_epochs=1000, use_gpu=False, scaler=None, augmentation=[], n_augmentations=1, encoder_params={}, dynamics=None, impl=None, **kwargs)[source]¶

Behavior Cloning algorithm.

Behavior Cloning (BC) imitates the actions in the dataset via supervised learning. Because BC only imitates the action distribution of the dataset, its performance will be close to the average performance of the dataset, even though BC works better than online RL algorithms in most cases.

\[L(\theta) = \mathbb{E}_{a_t, s_t \sim D} [(a_t - \pi_\theta(s_t))^2]\]
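The loss above can be illustrated with a minimal pure-Python sketch. It is not the library's implementation: the names `bc_loss` and `make_linear_policy` are hypothetical, the policy is a toy linear map, and averaging over both the batch and the action dimensions is an assumption.

```python
def bc_loss(policy, states, actions):
    """Mean squared error between dataset actions and policy outputs.

    Assumption: the expectation is approximated by averaging over every
    element of the batch (samples and action dimensions alike).
    """
    total = 0.0
    count = 0
    for state, action in zip(states, actions):
        predicted = policy(state)
        for a_i, p_i in zip(action, predicted):
            total += (a_i - p_i) ** 2
            count += 1
    return total / count


def make_linear_policy(theta):
    """Hypothetical toy policy: a single action computed as theta . state."""
    def policy(state):
        return [sum(w * x for w, x in zip(theta, state))]
    return policy


# A tiny hand-made dataset of (state, action) pairs.
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
actions = [[2.0], [-1.0], [1.0]]

# A policy that reproduces the dataset actions exactly, and one that does not.
loss_perfect = bc_loss(make_linear_policy([2.0, -1.0]), states, actions)
loss_poor = bc_loss(make_linear_policy([0.0, 0.0]), states, actions)
```

Training drives the loss toward the first case: the closer the policy output is to the dataset actions, the smaller the value.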

Parameters:

learning_rate (float) – learning rate.
batch_size (int) – mini-batch size.
eps (float) – \(\epsilon\) for Adam optimizer.
use_batch_norm (bool) – flag to insert batch normalization layers.
n_epochs (int) – the number of epochs to train.
use_gpu (bool, int or d3rlpy.gpu.Device) – flag to use GPU, device ID or device.
scaler (d3rlpy.preprocessing.Scaler or str) – preprocessor. The available options are [‘pixel’, ‘min_max’, ‘standard’]
augmentation (d3rlpy.augmentation.AugmentationPipeline or list(str)) – augmentation pipeline.
n_augmentations (int) – the number of data augmentations to update.
encoder_params (dict) – optional arguments for encoder setup. If the observation is pixel data, you can pass filters as a list of tuples of the form (filter_size, kernel_size, stride), and feature_size as an integer scalar giving the size of the last linear layer. If the observation is a vector, you can pass hidden_units as a list of hidden unit sizes.
dynamics (d3rlpy.dynamics.base.DynamicsBase) – dynamics model for data augmentation.
impl (d3rlpy.algos.torch.bc_impl.BCImpl) – implementation of the algorithm.
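As a sketch of what encoder_params might look like for the two observation types, the dictionaries below use the keys named in the parameter description. The concrete numbers are illustrative only, and `check_encoder_params` is a hypothetical helper written for this example, not part of d3rlpy.

```python
# Pixel observations: convolution filters plus the final linear layer size.
pixel_encoder_params = {
    # each tuple is (filter_size, kernel_size, stride); values are illustrative
    "filters": [(32, 8, 4), (64, 4, 2), (64, 3, 1)],
    "feature_size": 512,
}

# Vector observations: sizes of the hidden layers; values are illustrative.
vector_encoder_params = {"hidden_units": [256, 256]}


def check_encoder_params(params):
    """Hypothetical sanity check for the dictionary layout described above."""
    allowed = {"filters", "feature_size", "hidden_units"}
    unknown = set(params) - allowed
    if unknown:
        raise ValueError("unexpected keys: %s" % sorted(unknown))
    for entry in params.get("filters", []):
        if len(entry) != 3:
            raise ValueError("each filter must be (filter_size, kernel_size, stride)")
    if "feature_size" in params and not isinstance(params["feature_size"], int):
        raise ValueError("feature_size must be an integer")
    for units in params.get("hidden_units", []):
        if not isinstance(units, int):
            raise ValueError("hidden_units must be a list of integers")
    return True


pixel_ok = check_encoder_params(pixel_encoder_params)
vector_ok = check_encoder_params(vector_encoder_params)

# With d3rlpy installed, the dictionary would be passed through the constructor:
# from d3rlpy.algos import BC
# algo = BC(encoder_params=vector_encoder_params)
```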

n_epochs¶

the number of epochs to train.

Type:	int

batch_size¶

mini-batch size.

Type:	int

learning_rate¶

learning rate.

Type:	float

eps¶

\(\epsilon\) for Adam optimizer.

Type:	float

use_batch_norm¶

flag to insert batch normalization layers.

Type:	bool

use_gpu¶

GPU device.

Type:	d3rlpy.gpu.Device

scaler¶

preprocessor.

Type:	d3rlpy.preprocessing.Scaler

augmentation¶

augmentation pipeline.

Type:	d3rlpy.augmentation.AugmentationPipeline

n_augmentations¶

the number of data augmentations to update.

Type:	int

encoder_params¶

optional arguments for encoder setup.

Type:	dict

dynamics¶

dynamics model.

Type:	d3rlpy.dynamics.base.DynamicsBase

impl¶

implementation of the algorithm.

Type:	d3rlpy.algos.torch.bc_impl.BCImpl

Methods

create_impl(observation_shape, action_size)[source]¶

Instantiate implementation objects with the dataset shapes.

This method is used internally when the fit method is called.

Parameters:

observation_shape (tuple) – observation shape.
action_size (int) – dimension of the action space.

fit(episodes, experiment_name=None, with_timestamp=True, logdir='d3rlpy_logs', verbose=True, show_progress=True, tensorboard=True, eval_episodes=None, save_interval=1, scorers=None)¶

Trains with the given dataset.

algo.fit(episodes)

Parameters:

episodes (list(d3rlpy.dataset.Episode)) – list of episodes to train.
experiment_name (str) – experiment name for logging. If not passed, the directory name will be {class name}_{timestamp}.
with_timestamp (bool) – flag to append a timestamp string to the end of the directory name.
logdir (str) – root directory name to save logs.
verbose (bool) – flag to show logged information on stdout.
show_progress (bool) – flag to show progress bar for iterations.
tensorboard (bool) – flag to save logged information to TensorBoard (in addition to the CSV data).
eval_episodes (list(d3rlpy.dataset.Episode)) – list of episodes to test.
save_interval (int) – interval to save parameters.
scorers (list(callable)) – list of scorer functions used with eval_episodes.
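The interaction between experiment_name, with_timestamp and logdir can be sketched as follows. This is an illustration of the naming rule described above, not the library's code: the helper `build_log_path` is hypothetical, and both the timestamp format and the behavior when with_timestamp is False are assumptions.

```python
import os
import time


def build_log_path(class_name, experiment_name=None, with_timestamp=True,
                   logdir="d3rlpy_logs", timestamp=None):
    """Hypothetical helper that mirrors the documented directory naming."""
    if timestamp is None:
        # Assumed format; the actual timestamp format is not stated here.
        timestamp = time.strftime("%Y%m%d%H%M%S")
    # Without an explicit experiment name, fall back to the class name.
    name = class_name if experiment_name is None else experiment_name
    if with_timestamp:
        name = "%s_%s" % (name, timestamp)
    return os.path.join(logdir, name)


# Default call: {class name}_{timestamp} under the root log directory.
default_path = build_log_path("BC", timestamp="20200101000000")

# Explicit experiment name without a timestamp suffix.
named_path = build_log_path("BC", experiment_name="my_run", with_timestamp=False)
```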

classmethod from_json(fname, use_gpu=False)¶

Returns an algorithm configured with a JSON file.

The JSON file should be the one saved during fitting.

from d3rlpy.algos import Algo

# create algorithm with saved configuration
algo = Algo.from_json('d3rlpy_logs/<path-to-json>/params.json')

# ready to load
algo.load_model('d3rlpy_logs/<path-to-model>/model_100.pt')

# ready to predict
algo.predict(...)

Parameters:

fname (str) – file path to params.json.
use_gpu (bool, int or d3rlpy.gpu.Device) – flag to use GPU, device ID or device.

Returns:	algorithm.
Return type:	d3rlpy.base.LearnableBase

get_params(deep=True)¶

Returns all attributes.

This method returns all attributes, including ones in subclasses. Some scikit-learn utilities use this method.

params = algo.get_params(deep=True)

# the returned values can be used to instantiate the new object.
algo2 = AlgoBase(**params)

Parameters:	deep (bool) – flag to deeply copy objects such as impl.
Returns:	attribute values in dictionary.
Return type:	dict
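The round trip described above (read the attributes, then build a new object from them) can be sketched with a toy class. `ToyAlgo` is hypothetical and stands in for an algorithm class; the treatment of `deep` as a deep copy of the returned values is an assumption based on the parameter description.

```python
import copy


class ToyAlgo:
    """Hypothetical stand-in for an algorithm with scikit-learn style params."""

    _param_names = ("learning_rate", "batch_size", "n_epochs")

    def __init__(self, learning_rate=1e-3, batch_size=100, n_epochs=1000):
        self.learning_rate = learning_rate
        self.batch_size = batch_size
        self.n_epochs = n_epochs

    def get_params(self, deep=True):
        # Collect constructor arguments from the instance attributes.
        params = {name: getattr(self, name) for name in self._param_names}
        # Assumption: deep=True returns copies rather than shared references.
        return copy.deepcopy(params) if deep else params


algo = ToyAlgo(learning_rate=3e-4, batch_size=32)
params = algo.get_params(deep=True)

# The returned dictionary is enough to instantiate an equivalent object.
algo2 = ToyAlgo(**params)
```

This is the same contract that scikit-learn utilities rely on when they clone an estimator.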

load_model(fname)¶

Load neural network parameters.

algo.load_model('model.pt')

Parameters:	fname (str) – source file path.

predict(x)¶

Returns greedy actions.

import numpy as np

# 100 observations, each with shape (10,)
x = np.random.random((100, 10))

actions = algo.predict(x)
# actions.shape == (100, action size) for continuous control
# actions.shape == (100,) for discrete control

Parameters:	x (numpy.ndarray) – observations
Returns:	greedy actions
Return type:	numpy.ndarray

predict_value(x, action)[source]¶

Value prediction is not supported by BC algorithms.

sample_action(x)[source]¶

Action sampling is not supported by BC algorithms.

save_model(fname)¶

Saves neural network parameters.

algo.save_model('model.pt')

Parameters:	fname (str) – destination file path.

save_policy(fname, as_onnx=False)¶

Save the greedy-policy computational graph as TorchScript or ONNX.

# save as TorchScript
algo.save_policy('policy.pt')

# save as ONNX
algo.save_policy('policy.onnx', as_onnx=True)

The artifacts saved with this method work without d3rlpy. This method is especially useful for deploying the learned policy to production environments or embedded systems.

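Parameters:	fname (str) – destination file path. as_onnx (bool) – flag to save as ONNX format.

set_params(**params)¶

Sets the given arguments as attributes.

Parameters:	**params – arbitrary inputs to set as attributes.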
Returns:	itself.
Return type:	d3rlpy.algos.base.AlgoBase