d3rlpy.preprocessing.MinMaxRewardScaler
- class d3rlpy.preprocessing.MinMaxRewardScaler(minimum=None, maximum=None, multiplier=1.0)
Min-Max reward normalization preprocessing.
Rewards will be normalized in range [0.0, 1.0].

\[r' = (r - \min(r)) / (\max(r) - \min(r))\]

```python
from d3rlpy.preprocessing import MinMaxRewardScaler
from d3rlpy.algos import CQLConfig

# normalize based on datasets
cql = CQLConfig(reward_scaler=MinMaxRewardScaler()).create()

# initialize manually
reward_scaler = MinMaxRewardScaler(minimum=0.0, maximum=10.0)
cql = CQLConfig(reward_scaler=reward_scaler).create()
```
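The normalization formula can be sketched in plain Python, independent of d3rlpy. This is a minimal illustration, not the library's implementation: the helper name `min_max_scale` is hypothetical, and applying `multiplier` to the normalized value is an assumption, since the formula above does not mention it.

```python
from typing import List, Optional, Sequence


def min_max_scale(
    rewards: Sequence[float],
    minimum: Optional[float] = None,
    maximum: Optional[float] = None,
    multiplier: float = 1.0,
) -> List[float]:
    """Apply r' = multiplier * (r - min) / (max - min) to each reward.

    Hypothetical helper; the placement of `multiplier` is an assumption.
    """
    # Fall back to the observed bounds when none are supplied.
    lo = min(rewards) if minimum is None else minimum
    hi = max(rewards) if maximum is None else maximum
    if hi == lo:
        raise ValueError("maximum and minimum must differ")
    return [multiplier * (r - lo) / (hi - lo) for r in rewards]


rewards = [2.0, 4.0, 10.0]

# Bounds taken from the data: min = 2.0, max = 10.0.
scaled = min_max_scale(rewards)

# Bounds given manually, mirroring minimum=0.0, maximum=10.0 above.
manual = min_max_scale(rewards, minimum=0.0, maximum=10.0)
```

With data-derived bounds the smallest reward maps to 0.0 and the largest to 1.0. With manually chosen bounds, rewards that never reach those bounds do not span the full range.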
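- Parameters:
minimum (Optional[float]) – Minimum reward value. Estimated from the dataset when None.
maximum (Optional[float]) – Maximum reward value. Estimated from the dataset when None.
multiplier (float) – Constant multiplication value.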
Methods
- classmethod deserialize(serialized_config)
- Parameters:
serialized_config (str) –
- Return type:
TConfig
- classmethod deserialize_from_dict(dict_config)
- fit_with_env(env)
Gets scaling parameters from environment.
- fit_with_trajectory_slicer(episodes, trajectory_slicer)
Estimates scaling parameters from dataset.
- Parameters:
episodes (Sequence[EpisodeBase]) – List of episodes.
trajectory_slicer (TrajectorySlicerProtocol) – Trajectory slicer to process mini-batch.
- Return type:
None
- fit_with_transition_picker(episodes, transition_picker)
Estimates scaling parameters from dataset.
- Parameters:
episodes (Sequence[EpisodeBase]) – List of episodes.
transition_picker (TransitionPickerProtocol) – Transition picker to process mini-batch.
- Return type:
None
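As a rough picture of what estimating scaling parameters from a dataset involves, the sketch below scans every reward in a list of episodes and records the smallest and largest values. It is an assumption-laden illustration in plain Python: `ToyEpisode` and `estimate_bounds` are hypothetical names, and the sketch does not model the transition picker or trajectory slicer.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class ToyEpisode:
    """Hypothetical stand-in for an episode: only per-step rewards."""

    rewards: List[float]


def estimate_bounds(episodes: Sequence[ToyEpisode]) -> Tuple[float, float]:
    """Return (minimum, maximum) over all rewards in all episodes."""
    # Flatten the per-episode reward lists into one list.
    all_rewards = [r for episode in episodes for r in episode.rewards]
    if not all_rewards:
        raise ValueError("cannot estimate bounds from an empty dataset")
    return min(all_rewards), max(all_rewards)


episodes = [
    ToyEpisode(rewards=[0.0, 1.0, -2.0]),
    ToyEpisode(rewards=[5.0, 3.0]),
]
minimum, maximum = estimate_bounds(episodes)
```

Here the bounds come out as the global extremes across both episodes rather than per-episode values, so a single pair of parameters applies to every reward in the dataset.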
- classmethod from_dict(kvs, *, infer_missing=False)
- classmethod from_json(s, *, parse_float=None, parse_int=None, parse_constant=None, infer_missing=False, **kw)
- reverse_transform(x)
Returns inversely transformed output.
- Parameters:
x (Tensor) – Input.
- Returns:
Inversely transformed output.
- Return type:
Tensor
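The relationship between the forward and inverse transforms can be sketched with scalars in plain Python. The function names below mirror the method names only for readability. The bodies are an assumption derived by inverting the min-max formula (with `multiplier` applied to the normalized value), not a copy of the library code, which operates on tensors.

```python
def transform(r: float, minimum: float, maximum: float, multiplier: float = 1.0) -> float:
    """Forward min-max scaling: multiplier * (r - min) / (max - min)."""
    return multiplier * (r - minimum) / (maximum - minimum)


def reverse_transform(x: float, minimum: float, maximum: float, multiplier: float = 1.0) -> float:
    """Algebraic inverse of transform: x * (max - min) / multiplier + min."""
    return x * (maximum - minimum) / multiplier + minimum


original = 7.5
scaled = transform(original, minimum=0.0, maximum=10.0)
restored = reverse_transform(scaled, minimum=0.0, maximum=10.0)
```

Applying the inverse to a scaled reward recovers the original value, which is useful when scaled quantities need to be reported in the environment's own reward units.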
- classmethod schema(*, infer_missing=False, only=None, exclude=(), many=False, context=None, load_only=(), dump_only=(), partial=False, unknown=None)
- to_dict(encode_json=False)
- to_json(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, indent=None, separators=None, default=None, sort_keys=False, **kw)
- transform(x)
Returns processed output.
- Parameters:
x (Tensor) – Input.
- Returns:
Processed output.
- Return type:
Tensor
Attributes
- built