d3rlpy.preprocessing.StandardRewardScaler
- class d3rlpy.preprocessing.StandardRewardScaler(mean=None, std=None, eps=0.001, multiplier=1.0)
Reward standardization preprocessing.
\[r' = (r - \mu) / \sigma\]

    from d3rlpy.preprocessing import StandardRewardScaler
    from d3rlpy.algos import CQLConfig

    # normalize based on datasets
    cql = CQLConfig(reward_scaler=StandardRewardScaler()).create()

    # initialize manually
    reward_scaler = StandardRewardScaler(mean=0.0, std=1.0)
    cql = CQLConfig(reward_scaler=reward_scaler).create()
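- Parameters:
mean (Optional[float]) – Mean value.
std (Optional[float]) – Standard deviation value.
eps (float) – Small constant to avoid division by zero.
multiplier (float) – Constant multiplication value.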
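The formula above can be worked through on a handful of rewards. The following is a minimal sketch in plain Python, not the library's implementation: the helper `standardize` is hypothetical, and the way `eps` and `multiplier` enter the formula (added to the standard deviation, and multiplied into the result) is an assumption based on the constructor signature. With `eps=0` and `multiplier=1` it reduces to the documented r' = (r - μ) / σ.

```python
from statistics import fmean, pstdev


def standardize(rewards, mean, std, eps=1e-3, multiplier=1.0):
    """Apply r' = multiplier * (r - mean) / (std + eps) to each reward.

    The placement of eps and multiplier is an assumption; with eps=0 and
    multiplier=1 this is exactly r' = (r - mean) / std.
    """
    return [multiplier * (r - mean) / (std + eps) for r in rewards]


rewards = [0.0, 1.0, 2.0, 3.0, 4.0]
mu = fmean(rewards)      # sample mean of the rewards
sigma = pstdev(rewards)  # population standard deviation (assumed convention)

# With eps=0 the scaled rewards have zero mean and unit standard deviation.
scaled = standardize(rewards, mu, sigma, eps=0.0)
```

A nonzero `eps` keeps the division well defined when every reward in the dataset is identical, at the cost of a scaled standard deviation slightly below 1.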
Methods
- classmethod deserialize(serialized_config)
- Parameters:
serialized_config (str) –
- Return type:
TConfig
- classmethod deserialize_from_dict(dict_config)
- fit_with_env(env)
Gets scaling parameters from environment.
- fit_with_trajectory_slicer(episodes, trajectory_slicer)
Estimates scaling parameters from dataset.
- Parameters:
episodes (Sequence[EpisodeBase]) – List of episodes.
trajectory_slicer (TrajectorySlicerProtocol) – Trajectory slicer to process mini-batch.
- Return type:
None
- fit_with_transition_picker(episodes, transition_picker)
Estimates scaling parameters from dataset.
- Parameters:
episodes (Sequence[EpisodeBase]) – List of episodes.
transition_picker (TransitionPickerProtocol) – Transition picker to process mini-batch.
- Return type:
None
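Both fit methods estimate the scaling parameters from a list of episodes. This page does not say how the transition picker or trajectory slicer is used, so the sketch below ignores them and simply pools every reward. The list-of-lists `episodes` and the helper `estimate_reward_stats` are hypothetical stand-ins, and the population standard deviation is an assumed convention.

```python
from math import sqrt

# Hypothetical stand-in for a dataset: each episode is a list of per-step rewards.
episodes = [
    [1.0, 0.0, 2.0],
    [3.0, 4.0],
]


def estimate_reward_stats(episodes):
    """Return (mean, std) pooled over every reward in every episode."""
    count = sum(len(ep) for ep in episodes)
    mean = sum(r for ep in episodes for r in ep) / count
    sq_dev = sum((r - mean) ** 2 for ep in episodes for r in ep)
    std = sqrt(sq_dev / count)  # population std (assumed, not stated on this page)
    return mean, std


mean, std = estimate_reward_stats(episodes)
```

Values estimated this way play the same role as the `mean` and `std` arguments passed when the scaler is initialized manually.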
- classmethod from_dict(kvs, *, infer_missing=False)
- classmethod from_json(s, *, parse_float=None, parse_int=None, parse_constant=None, infer_missing=False, **kw)
- reverse_transform(x)
Returns the inversely transformed output.
- Parameters:
x (Tensor) – Input.
- Returns:
Inversely transformed output.
- Return type:
Tensor
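As an illustration of what "inversely transformed" means, the sketch below pairs a forward scaling with its algebraic inverse and checks that a round trip recovers the original rewards. It uses plain floats rather than tensors, the two functions are hypothetical stand-ins for the methods of the same name, and the placement of `eps` and `multiplier` is an assumption based on the constructor signature.

```python
def transform(r, mean, std, eps=1e-3, multiplier=1.0):
    # Forward scaling; how eps and multiplier enter is an assumption.
    return multiplier * (r - mean) / (std + eps)


def reverse_transform(r_scaled, mean, std, eps=1e-3, multiplier=1.0):
    # Algebraic inverse of transform() above.
    return r_scaled * (std + eps) / multiplier + mean


mean, std = 2.0, 1.5
original = [-1.0, 0.5, 2.0, 7.25]

# Scaling and then un-scaling should give back the original rewards.
recovered = [
    reverse_transform(transform(r, mean, std), mean, std) for r in original
]
```

The inverse only holds when both directions use the same `mean`, `std`, `eps`, and `multiplier`.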
- classmethod schema(*, infer_missing=False, only=None, exclude=(), many=False, context=None, load_only=(), dump_only=(), partial=False, unknown=None)
- to_dict(encode_json=False)
- to_json(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, indent=None, separators=None, default=None, sort_keys=False, **kw)
- transform(x)
Returns processed output.
- Parameters:
x (Tensor) – Input.
- Returns:
Processed output.
- Return type:
Tensor
Attributes
- built