d3rlpy.preprocessing.StandardRewardScaler

class d3rlpy.preprocessing.StandardRewardScaler(dataset=None, mean=None, std=None, eps=0.001)[source]

Reward standardization preprocessing.

\[r' = (r - \mu) / (\sigma + \epsilon)\]

from d3rlpy.algos import CQL

cql = CQL(reward_scaler="standard")

You can also initialize with a d3rlpy.dataset.MDPDataset object or manually.

from d3rlpy.preprocessing import StandardRewardScaler

# initialize with dataset
scaler = StandardRewardScaler(dataset)

# initialize manually
scaler = StandardRewardScaler(mean=0.0, std=1.0)

cql = CQL(reward_scaler=scaler)
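
As a more complete illustration, here is a hedged end-to-end sketch that builds a small synthetic MDPDataset and initializes the scaler from it; the array shapes and values are arbitrary and only serve as an example.

import numpy as np

from d3rlpy.dataset import MDPDataset
from d3rlpy.preprocessing import StandardRewardScaler

# synthetic dataset: 100 transitions with 4-dim observations and 2-dim actions
observations = np.random.random((100, 4)).astype(np.float32)
actions = np.random.random((100, 2)).astype(np.float32)
rewards = np.random.random(100).astype(np.float32)
terminals = np.zeros(100, dtype=np.float32)
terminals[-1] = 1.0

dataset = MDPDataset(observations, actions, rewards, terminals)

# mean and std are estimated from the rewards in the dataset
scaler = StandardRewardScaler(dataset)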

Parameters

dataset (Optional[d3rlpy.dataset.MDPDataset]) – dataset object.

mean (Optional[float]) – mean value.

std (Optional[float]) – standard deviation value.

eps (float) – constant value to avoid dividing by zero.

Methods

fit(episodes)[source]

Estimates scaling parameters from the dataset.

Parameters

episodes (List[d3rlpy.dataset.Episode]) – list of episodes.

Return type

None
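
For example, the following hedged sketch fits the scaler on the episodes of an existing MDPDataset (assumed to be in scope as dataset) and inspects the estimated parameters; the printed dictionary contents are illustrative.

from d3rlpy.preprocessing import StandardRewardScaler

scaler = StandardRewardScaler()

# dataset is assumed to be an existing d3rlpy.dataset.MDPDataset
scaler.fit(dataset.episodes)

# inspect the estimated parameters, e.g. mean and std
print(scaler.get_params())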

fit_with_env(env)

Gets scaling parameters from the environment.

Note

RewardScaler does not support fitting with environment.

Parameters

env (gym.core.Env) – gym environment.

Return type

None

get_params(deep=False)[source]

Returns scaling parameters.

Parameters

deep (bool) – flag to deeply copy objects.

Returns

scaler parameters.

Return type

Dict[str, Any]

get_type()

Returns a scaler type.

Returns

scaler type.

Return type

str

reverse_transform(reward)[source]

Returns rewards with the scaling reversed.

Parameters

reward (torch.Tensor) – reward.

Returns

reward with the scaling reversed.

Return type

torch.Tensor

transform(reward)[source]

Returns processed rewards.

Parameters

reward (torch.Tensor) – reward.

Returns

processed reward.

Return type

torch.Tensor
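
As a hedged round-trip sketch covering transform and reverse_transform with manually specified parameters; the numbers are arbitrary and the comment assumes the standardization formula shown above.

import torch

from d3rlpy.preprocessing import StandardRewardScaler

scaler = StandardRewardScaler(mean=2.0, std=4.0)

reward = torch.tensor([1.0, 2.0, 3.0])

# standardize, then undo the standardization
standardized = scaler.transform(reward)
restored = scaler.reverse_transform(standardized)

print(torch.allclose(restored, reward))  # True up to floating-point error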

transform_numpy(reward)[source]

Returns transformed rewards as a numpy array.

Parameters

reward (numpy.ndarray) – reward.

Returns

transformed reward.

Return type

numpy.ndarray
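
A minimal hedged sketch of transform_numpy with plain numpy inputs; the values are arbitrary.

import numpy as np

from d3rlpy.preprocessing import StandardRewardScaler

scaler = StandardRewardScaler(mean=0.5, std=0.25)

rewards = np.array([0.0, 0.5, 1.0], dtype=np.float32)

# numpy array in, numpy array out
scaled = scaler.transform_numpy(rewards)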

Attributes

TYPE: ClassVar[str] = 'standard'