d3rlpy.preprocessing.ReturnBasedRewardScaler

class d3rlpy.preprocessing.ReturnBasedRewardScaler(return_max=None, return_min=None, multiplier=1.0)

Reward normalization preprocessing based on return scale.

\[r' = r / (R_{max} - R_{min})\]

from d3rlpy.preprocessing import ReturnBasedRewardScaler
from d3rlpy.algos import CQLConfig

# estimate return_max and return_min from the dataset
cql = CQLConfig(reward_scaler=ReturnBasedRewardScaler()).create()

# initialize manually
reward_scaler = ReturnBasedRewardScaler(
    return_max=100.0,
    return_min=1.0,
)
cql = CQLConfig(reward_scaler=reward_scaler).create()
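
As a rough illustration of the formula above, the following standalone sketch computes the scaled reward in plain Python. It does not call d3rlpy. The helper name `scale_reward` is hypothetical, and applying `multiplier` as a factor on top of r / (R_max - R_min) is an assumption, since the formula itself does not show where the multiplier enters.

```python
def scale_reward(reward, return_max, return_min, multiplier=1.0):
    """Scale a reward by the return range (hypothetical helper, not d3rlpy API)."""
    return_range = return_max - return_min
    if return_range == 0:
        raise ValueError("return_max and return_min must differ")
    # Assumption: the multiplier is applied on top of r / (R_max - R_min).
    return multiplier * reward / return_range


# Same manual settings as the example above: return_max=100.0, return_min=1.0
scaled = scale_reward(9.9, return_max=100.0, return_min=1.0)
```

With these settings the return range is 99, so a raw reward of 9.9 maps to roughly 0.1.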

References

Kostrikov et al., Offline Reinforcement Learning with Implicit Q-Learning.

Parameters:

return_max (float) – Maximum return value.
return_min (float) – Minimum return value.
multiplier (float) – Constant multiplication value.

Methods

classmethod deserialize(serialized_config)

Parameters: serialized_config (str)
Return type: TConfig

classmethod deserialize_from_dict(dict_config)

Parameters: dict_config (Dict[str, Any])
Return type: TConfig

classmethod deserialize_from_file(path)

Parameters: path (str)
Return type: TConfig

fit_with_env(env)

Gets scaling parameters from environment.

Parameters: env (Union[Env[Any, Any], Env[Any, Any]]) – Gym environment.
Return type: None

fit_with_trajectory_slicer(episodes, trajectory_slicer)

Estimates scaling parameters from dataset.

Parameters:

episodes (Sequence[EpisodeBase]) – List of episodes.
trajectory_slicer (TrajectorySlicerProtocol) – Trajectory slicer to process mini-batch.

Return type:

None

fit_with_transition_picker(episodes, transition_picker)

Estimates scaling parameters from dataset.

Parameters:

episodes (Sequence[EpisodeBase]) – List of episodes.
transition_picker (TransitionPickerProtocol) – Transition picker to process mini-batch.

Return type:

None
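
A minimal sketch of how the return range could be estimated from a dataset, assuming each episode's return is the undiscounted sum of its rewards (an assumption; the descriptions above only say the parameters are estimated from the dataset). Episodes are represented here as plain lists of rewards rather than `EpisodeBase` objects, and `estimate_return_range` is a hypothetical helper, not part of d3rlpy.

```python
def estimate_return_range(episode_rewards):
    """Return (return_max, return_min) over per-episode reward sums."""
    if not episode_rewards:
        raise ValueError("at least one episode is required")
    # Assumption: an episode's return is the plain sum of its rewards.
    returns = [float(sum(rewards)) for rewards in episode_rewards]
    return max(returns), min(returns)


episodes = [
    [1.0, 0.0, 2.0],  # return 3.0
    [0.5, 0.5],       # return 1.0
    [4.0, 1.0, 1.0],  # return 6.0
]
return_max, return_min = estimate_return_range(episodes)
```

The resulting pair could then be passed as `return_max` and `return_min` when initializing the scaler manually.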

classmethod from_dict(kvs, *, infer_missing=False)

Parameters: kvs (Optional[Union[dict, list, str, int, float, bool]])
Return type: A

classmethod from_json(s, *, parse_float=None, parse_int=None, parse_constant=None, infer_missing=False, **kw)

Parameters: s (Union[str, bytes, bytearray])
Return type: A

static get_type()

Return type: str

reverse_transform(x)

Returns the inversely transformed output.

Parameters: x (Tensor) – Input.
Returns: Inversely transformed output.
Return type: Tensor

reverse_transform_numpy(x)

Returns the inversely transformed output as a NumPy array.

Parameters: x (ndarray[Any, dtype[Any]]) – Input.
Returns: Inversely transformed output.
Return type: ndarray[Any, dtype[Any]]

classmethod schema(*, infer_missing=False, only=None, exclude=(), many=False, context=None, load_only=(), dump_only=(), partial=False, unknown=None)

Parameters:

infer_missing (bool)
many (bool)
partial (bool)

Return type:

SchemaF[A]

serialize()

Return type: str

serialize_to_dict()

Return type: Dict[str, Any]

to_dict(encode_json=False)

Return type: Dict[str, Optional[Union[dict, list, str, int, float, bool]]]

to_json(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, indent=None, separators=None, default=None, sort_keys=False, **kw)

Parameters:

skipkeys (bool)
ensure_ascii (bool)
check_circular (bool)
allow_nan (bool)
indent (Optional[Union[int, str]])
separators (Optional[Tuple[str, str]])
default (Optional[Callable])
sort_keys (bool)

Return type:

str

transform(x)

Returns processed output.

Parameters: x (Tensor) – Input.
Returns: Processed output.
Return type: Tensor

transform_numpy(x)

Returns processed output as a NumPy array.

Parameters: x (ndarray[Any, dtype[Any]]) – Input.
Returns: Processed output.
Return type: ndarray[Any, dtype[Any]]
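
The transform and reverse_transform methods are documented as a forward and inverse pair. The sketch below illustrates that round trip in plain Python, without d3rlpy. `ReturnRangeScaler` is a hypothetical stand-in, and the way `multiplier` enters both directions is an assumption chosen so that the two methods invert each other.

```python
import math


class ReturnRangeScaler:
    """Hypothetical stand-in mirroring the documented transform pair."""

    def __init__(self, return_max, return_min, multiplier=1.0):
        self.return_max = return_max
        self.return_min = return_min
        self.multiplier = multiplier

    def transform(self, x):
        # Forward direction: scale by the return range.
        return self.multiplier * x / (self.return_max - self.return_min)

    def reverse_transform(self, x):
        # Inverse direction: undo the scaling applied by transform.
        return x * (self.return_max - self.return_min) / self.multiplier


scaler = ReturnRangeScaler(return_max=100.0, return_min=1.0, multiplier=10.0)
rewards = [0.0, 1.5, -3.0, 99.0]
recovered = [scaler.reverse_transform(scaler.transform(r)) for r in rewards]
round_trip_ok = all(
    math.isclose(a, b, abs_tol=1e-9) for a, b in zip(rewards, recovered)
)
```

Comparing with a tolerance rather than exact equality avoids spurious failures from floating-point rounding.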

Attributes

built

multiplier: float = 1.0

return_max: Optional[float] = None

return_min: Optional[float] = None