d3rlpy.preprocessing.StandardRewardScaler¶
- class d3rlpy.preprocessing.StandardRewardScaler(dataset=None, mean=None, std=None, eps=0.001, multiplier=1.0)[source]¶
Reward standardization preprocessing.
\[r' = (r - \mu) / \sigma\]

from d3rlpy.algos import CQL

cql = CQL(reward_scaler="standard")
You can also initialize with a
d3rlpy.dataset.MDPDataset
object or manually.

from d3rlpy.preprocessing import StandardRewardScaler

# initialize with dataset
scaler = StandardRewardScaler(dataset)

# initialize manually
scaler = StandardRewardScaler(mean=0.0, std=1.0)

cql = CQL(reward_scaler=scaler)
- Parameters
dataset (d3rlpy.dataset.MDPDataset) – dataset object.
mean (float) – mean value.
std (float) – standard deviation value.
eps (float) – constant value to avoid zero-division.
multiplier (float) – constant multiplication value.
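The formula above can be sketched in plain NumPy. This is an illustrative stand-in, not the library implementation; in particular, the exact placement of `eps` in the denominator is an assumption based on its stated zero-division purpose, so consult the d3rlpy source for the precise expression:

```python
import numpy as np

# Illustrative sketch of r' = multiplier * (r - mean) / (std + eps).
# eps in the denominator is assumed from its documented purpose
# (avoiding zero-division); multiplier rescales the result.
rewards = np.array([1.0, 2.0, 3.0, 4.0])
mean, std = rewards.mean(), rewards.std()
eps, multiplier = 1e-3, 1.0

scaled = multiplier * (rewards - mean) / (std + eps)
# scaled now has (approximately) zero mean and unit standard deviation
```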
Methods
- fit(transitions)[source]¶
Estimates scaling parameters from dataset.
- Parameters
transitions (List[d3rlpy.dataset.Transition]) – list of transitions.
- Return type
None
- fit_with_env(env)¶
Gets scaling parameters from environment.
Note
RewardScaler does not support fitting with environment.
- Parameters
env (gym.core.Env) – gym environment.
- Return type
None
- reverse_transform(reward)[source]¶
Returns rewards restored to the original scale (inverse of transform).
- Parameters
reward (torch.Tensor) – reward.
- Returns
reward restored to the original scale.
- Return type
torch.Tensor
- transform(reward)[source]¶
Returns processed rewards.
- Parameters
reward (torch.Tensor) – reward.
- Returns
processed reward.
- Return type
torch.Tensor
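transform and reverse_transform undo each other. A minimal round-trip sketch, using NumPy as a stand-in for the torch tensors and assuming reverse_transform applies the algebraic inverse of the scaling formula:

```python
import numpy as np

# Hypothetical fixed statistics for illustration.
mean, std, eps, multiplier = 0.5, 2.0, 1e-3, 1.0

def transform(r):
    # forward scaling: r' = multiplier * (r - mean) / (std + eps)
    return multiplier * (r - mean) / (std + eps)

def reverse_transform(r):
    # inverse scaling: undo the division, then re-add the mean
    return r * (std + eps) / multiplier + mean

r = np.array([-1.0, 0.0, 2.5])
roundtrip = reverse_transform(transform(r))
# roundtrip recovers r up to floating-point error
```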
- transform_numpy(reward)[source]¶
Returns transformed rewards in numpy array.
- Parameters
reward (numpy.ndarray) – reward.
- Returns
transformed reward.
- Return type
numpy.ndarray
Attributes