d3rlpy.algos.LinearDecayEpsilonGreedy

class d3rlpy.algos.LinearDecayEpsilonGreedy(start_epsilon=1.0, end_epsilon=0.1, duration=1000000)

An \(\epsilon\)-greedy explorer whose \(\epsilon\) decays linearly from start_epsilon to end_epsilon over duration steps.

Parameters

start_epsilon (float) – Initial \(\epsilon\).
end_epsilon (float) – Final \(\epsilon\).
duration (int) – Number of steps over which \(\epsilon\) decays from start_epsilon to end_epsilon.
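Assuming the schedule interpolates linearly between the two endpoints and then holds at the final value once duration is reached (the page states only that the decay is linear, so the clipping is an assumption), \(\epsilon\) at step \(t\), with \(T\) = duration, would be:

```latex
\epsilon(t) =
\begin{cases}
\epsilon_{\text{start}} + \left(\epsilon_{\text{end}} - \epsilon_{\text{start}}\right)\dfrac{t}{T}, & 0 \le t < T,\\[4pt]
\epsilon_{\text{end}}, & t \ge T.
\end{cases}
```

With the defaults (\(\epsilon_{\text{start}} = 1.0\), \(\epsilon_{\text{end}} = 0.1\), \(T = 10^6\)), this gives \(\epsilon = 0.55\) at \(t = 5 \times 10^5\).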

Methods

compute_epsilon(step)

Returns the decayed \(\epsilon\) for the given step.

Parameters

step (int) – Current step.

Returns

Decayed \(\epsilon\).

Return type

float
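A minimal pure-Python sketch of such a schedule, assuming clipped linear interpolation. The function below is a hypothetical standalone helper, not d3rlpy's own code; its defaults simply mirror the constructor defaults listed above.

```python
def compute_epsilon(step: int, start_epsilon: float = 1.0,
                    end_epsilon: float = 0.1, duration: int = 1000000) -> float:
    """Hypothetical sketch: linearly decay epsilon, then hold it at end_epsilon."""
    if step >= duration:
        # Assumed behavior: once the schedule is over, epsilon stays at its final value.
        return end_epsilon
    # Fraction of the schedule completed so far, in [0, 1).
    fraction = step / duration
    # Linear interpolation from start_epsilon (step 0) toward end_epsilon (step == duration).
    return start_epsilon + (end_epsilon - start_epsilon) * fraction


print(compute_epsilon(0), compute_epsilon(500000), compute_epsilon(2000000))
```

Clipping at duration keeps a small amount of exploration for the rest of training instead of letting \(\epsilon\) fall below end_epsilon.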

sample(algo, x, step)

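Returns an \(\epsilon\)-greedy action: with probability \(\epsilon\) (computed for the current step) a uniformly random action, otherwise the algorithm's greedy action.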

Parameters

algo (d3rlpy.interface.QLearningAlgoProtocol) – Algorithm.
x (Union[numpy.ndarray[Any, numpy.dtype[Any]], Sequence[numpy.ndarray[Any, numpy.dtype[Any]]]]) – Observation.
step (int) – Current environment step.

Returns

\(\epsilon\)-greedy action.

Return type

numpy.ndarray[Any, numpy.dtype[Any]]
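A minimal NumPy sketch of \(\epsilon\)-greedy selection over a batch, assuming a discrete action space. It assumes the greedy actions have already been computed by the algorithm and that the number of actions is known; the helper name epsilon_greedy_sample and its arguments are hypothetical and not part of d3rlpy.

```python
import numpy as np


def epsilon_greedy_sample(greedy_actions: np.ndarray, n_actions: int,
                          epsilon: float, rng: np.random.Generator) -> np.ndarray:
    """Hypothetical sketch: swap each greedy action for a random one with probability epsilon."""
    greedy_actions = np.asarray(greedy_actions)
    # One uniformly random candidate action per batch entry, drawn from [0, n_actions).
    random_actions = rng.integers(0, n_actions, size=greedy_actions.shape)
    # Independent coin flip per entry: True means "explore".
    explore = rng.random(greedy_actions.shape) < epsilon
    # Keep the greedy action unless this entry explores.
    return np.where(explore, random_actions, greedy_actions)


rng = np.random.default_rng(0)
print(epsilon_greedy_sample(np.array([2, 2, 2, 2]), n_actions=4, epsilon=0.5, rng=rng))
```

Feeding in the \(\epsilon\) from a decay schedule links the two pieces: early steps explore almost uniformly, while later steps mostly follow the greedy policy.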