Preprocess / Postprocess

In this tutorial, you will learn how to preprocess datasets and postprocess continuous action outputs. Please check Preprocessing for more information.

Preprocess Observations

If your dataset includes unnormalized observations, you can normalize or standardize them by specifying the scaler argument with a string alias. In this case, the statistics of the dataset are computed at the beginning of offline training.

import numpy as np

import d3rlpy

dataset, _ = d3rlpy.datasets.get_dataset("pendulum-random")

# specify by string alias
sac = d3rlpy.algos.SAC(scaler="standard")

Alternatively, you can manually instantiate the preprocessor with your own parameters.

# set up manually
mean = np.mean(dataset.observations, axis=0, keepdims=True)
std = np.std(dataset.observations, axis=0, keepdims=True)
scaler = d3rlpy.preprocessing.StandardScaler(mean=mean, std=std)

# specify by object
sac = d3rlpy.algos.SAC(scaler=scaler)
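
With a string alias, the statistics are computed from the dataset when offline training starts; with a manually instantiated scaler, the values you pass in are used as-is. As a minimal sketch (assuming the d3rlpy v1 fit() API), training then proceeds without any manual transformation of the dataset:

# observations are normalized internally by the configured scaler
sac.fit(dataset, n_steps=10000)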

Please check Preprocessing for the full list of available observation preprocessors.

Preprocess / Postprocess Actions

When training with a continuous action space, actions must lie within the range [-1.0, 1.0] because of the tanh activation in the policy function. In d3rlpy, you can easily normalize inputs and denormalize outputs instead of normalizing the dataset yourself.

# specify by string alias
sac = d3rlpy.algos.SAC(action_scaler="min_max")

# set up manually
minimum_action = np.min(dataset.actions, axis=0, keepdims=True)
maximum_action = np.max(dataset.actions, axis=0, keepdims=True)
action_scaler = d3rlpy.preprocessing.MinMaxActionScaler(
    minimum=minimum_action,
    maximum=maximum_action,
)

# specify by object
sac = d3rlpy.algos.SAC(action_scaler=action_scaler)
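
At inference time, the action scaler works in the opposite direction: the tanh-squashed policy outputs are denormalized back to the original action range. A minimal sketch, assuming a trained model and the v1 predict() API:

# predict() returns actions in the dataset's original action range,
# not in [-1.0, 1.0] (assumes the model has already been trained)
observations = dataset.observations[:1]
actions = sac.predict(observations)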

Please check Preprocessing for the full list of available action preprocessors.

Preprocess Rewards

The effect of scaling rewards has not been studied extensively in the RL community yet; however, it has been confirmed that the reward scale affects training performance.

# specify by string alias
sac = d3rlpy.algos.SAC(reward_scaler="standard")

# set up manually
mean = np.mean(dataset.rewards)
std = np.std(dataset.rewards)
reward_scaler = d3rlpy.preprocessing.StandardRewardScaler(mean=mean, std=std)

# specify by object
sac = d3rlpy.algos.SAC(reward_scaler=reward_scaler)
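
If standardization does not suit your reward distribution, other reward preprocessors can be plugged in the same way. A sketch using a constant multiplier instead (MultiplyRewardScaler is assumed to be available in your d3rlpy version):

# multiply every reward by a constant factor instead of standardizing
reward_scaler = d3rlpy.preprocessing.MultiplyRewardScaler(multiplier=10.0)
sac = d3rlpy.algos.SAC(reward_scaler=reward_scaler)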

Please check Preprocessing for the full list of available reward preprocessors.