d3rlpy.online.iterators.train

d3rlpy.online.iterators.train(env, algo, buffer, explorer=None, n_steps_per_epoch=4000, n_updates_per_epoch=100, eval_env=None, eval_epsilon=0.05, experiment_name=None, with_timestamp=True, logdir='d3rlpy_logs', verbose=True, show_progress=True, tensorboard=True, save_interval=1)[source]

Start the training loop for online deep reinforcement learning.

Parameters:
  • env (gym.Env) – gym-like environment.
  • algo (d3rlpy.algos.base.AlgoBase) – algorithm to train.
  • buffer (d3rlpy.online.buffers.Buffer) – replay buffer.
  • explorer (d3rlpy.online.explorers.Explorer) – action explorer.
  • n_steps_per_epoch (int) – the number of steps per epoch.
  • n_updates_per_epoch (int) – the number of updates per epoch.
  • eval_env (gym.Env) – gym-like environment. If None, evaluation is skipped.
  • eval_epsilon (float) – ε-greedy factor during evaluation.
  • experiment_name (str) – experiment name for logging. If not passed, the directory name will be {class name}_online_{timestamp}.
  • with_timestamp (bool) – flag to append a timestamp string to the end of the directory name.
  • logdir (str) – root directory name to save logs.
  • verbose (bool) – flag to show logged information on stdout.
  • show_progress (bool) – flag to show progress bar for iterations.
  • tensorboard (bool) – flag to save logged information to TensorBoard (in addition to the CSV data).
  • save_interval (int) – interval (in epochs) to save model parameters.
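
Example

A minimal sketch of online training with this function: a DQN agent on CartPole with an ε-greedy explorer. The ReplayBuffer and LinearDecayEpsilonGreedy constructor arguments shown here are assumptions about the installed d3rlpy version and may need adjusting.

    import gym

    from d3rlpy.algos import DQN
    from d3rlpy.online.buffers import ReplayBuffer
    from d3rlpy.online.explorers import LinearDecayEpsilonGreedy
    from d3rlpy.online.iterators import train

    # separate environments for training and evaluation
    env = gym.make('CartPole-v0')
    eval_env = gym.make('CartPole-v0')

    # algorithm to train online
    algo = DQN()

    # replay buffer bound to the training environment
    # (maxlen/env arguments are assumed; check your d3rlpy version)
    buffer = ReplayBuffer(maxlen=100000, env=env)

    # linearly decaying epsilon-greedy exploration
    # (argument names are assumed; check your d3rlpy version)
    explorer = LinearDecayEpsilonGreedy(start_epsilon=1.0,
                                        end_epsilon=0.1,
                                        duration=10000)

    # run the online training loop; logs are written under d3rlpy_logs/
    train(env,
          algo,
          buffer,
          explorer=explorer,
          eval_env=eval_env,
          n_steps_per_epoch=1000,
          n_updates_per_epoch=100)

If eval_env is omitted (the default None), evaluation is skipped; otherwise evaluation uses eval_epsilon (0.05 by default).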