Multi-step transition picker.
This class implements transition picking for the multi-step TD error. The reward of each picked transition is computed as a multi-step discounted return.
n_steps (int) – Number of timesteps between observation and next_observation.
gamma (float) – Discount factor to compute a multi-step return.
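As a rough illustration of how n_steps and gamma interact, the sketch below computes a multi-step discounted return over a plain list of rewards. It is not the library's implementation: the function name multi_step_return and the choice to truncate the sum at the end of the episode are assumptions made for this example.

```python
def multi_step_return(rewards, index, n_steps, gamma):
    """Discounted sum of up to n_steps rewards starting at index.

    Hypothetical helper for illustration only; not part of the library.
    """
    # Assumption: the look-ahead window is cut short at the end of the episode.
    end = min(index + n_steps, len(rewards))
    ret = 0.0
    for k, r in enumerate(rewards[index:end]):
        # The k-th reward in the window is discounted by gamma ** k.
        ret += (gamma ** k) * r
    # Also report how many steps were actually accumulated.
    return ret, end - index


rewards = [1.0, 2.0, 3.0, 4.0]
full_window = multi_step_return(rewards, index=1, n_steps=2, gamma=0.5)
clipped_window = multi_step_return(rewards, index=3, n_steps=3, gamma=0.5)
```

With n_steps set to 1 this reduces to the ordinary single-step reward, so the multi-step picker can be read as a generalization of one-step transition picking.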
Methods
Returns the transition specified by index.
episode (EpisodeBase) – Episode.
index (int) – Index of the target transition.
Transition.
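The call pattern described above (construct with n_steps and gamma, then call with an episode and an index to get a transition) might look like the following self-contained sketch. ToyEpisode, ToyTransition and ToyMultiStepPicker are hypothetical stand-ins written for this example. They are not the library's classes, and the real Transition and EpisodeBase types may carry different fields.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToyEpisode:
    # Hypothetical episode: observations has one more entry than rewards,
    # so every step has a well-defined successor observation.
    observations: List[float]
    rewards: List[float]


@dataclass
class ToyTransition:
    # Hypothetical transition record.
    observation: float
    reward: float            # multi-step discounted return
    next_observation: float
    interval: int            # number of steps actually spanned


class ToyMultiStepPicker:
    def __init__(self, n_steps: int, gamma: float):
        self.n_steps = n_steps
        self.gamma = gamma

    def __call__(self, episode: ToyEpisode, index: int) -> ToyTransition:
        # Assumption: clip the window so it never runs past the last reward.
        end = min(index + self.n_steps, len(episode.rewards))
        ret = sum(
            (self.gamma ** k) * r
            for k, r in enumerate(episode.rewards[index:end])
        )
        return ToyTransition(
            observation=episode.observations[index],
            reward=ret,
            next_observation=episode.observations[end],
            interval=end - index,
        )


episode = ToyEpisode(
    observations=[0.0, 1.0, 2.0, 3.0, 4.0],
    rewards=[1.0, 2.0, 3.0, 4.0],
)
picker = ToyMultiStepPicker(n_steps=2, gamma=0.5)
transition = picker(episode, 1)
last_transition = picker(episode, 3)
```

Recording the interval alongside the return is a design choice of this sketch: a learner bootstrapping from next_observation would need it to apply the matching power of gamma when the window is clipped near the end of an episode.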