Algorithms

d3rlpy provides state-of-the-art offline deep reinforcement learning algorithms, as well as the online algorithms that serve as their base implementations.

Continuous control algorithms

d3rlpy.algos.BC

Behavior Cloning algorithm.

d3rlpy.algos.DDPG

Deep Deterministic Policy Gradients algorithm.

d3rlpy.algos.TD3

Twin Delayed Deep Deterministic Policy Gradients algorithm.

d3rlpy.algos.SAC

Soft Actor-Critic algorithm.

d3rlpy.algos.BCQ

Batch-Constrained Q-learning algorithm.

d3rlpy.algos.BEAR

Bootstrapping Error Accumulation Reduction algorithm.

d3rlpy.algos.CRR

Critic Regularized Regression algorithm.

d3rlpy.algos.CQL

Conservative Q-Learning algorithm.

d3rlpy.algos.AWR

Advantage-Weighted Regression algorithm.

d3rlpy.algos.AWAC

Advantage Weighted Actor-Critic algorithm.

d3rlpy.algos.PLAS

Policy in Latent Action Space algorithm.

d3rlpy.algos.PLASWithPerturbation

Policy in Latent Action Space algorithm with perturbation layer.

d3rlpy.algos.TD3PlusBC

TD3+BC algorithm.

d3rlpy.algos.MOPO

Model-based Offline Policy Optimization.

d3rlpy.algos.COMBO

Conservative Offline Model-Based Optimization.

d3rlpy.algos.RandomPolicy

Random policy for continuous control.
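As a rough illustration of what a random policy means in a continuous action space, the sketch below draws each action dimension uniformly from [-1, 1]. It uses only the Python standard library and is not d3rlpy's implementation; the function name `sample_uniform_action` and the [-1, 1] range are assumptions made for this example.

```python
import random
from typing import List, Optional


def sample_uniform_action(
    action_size: int, rng: Optional[random.Random] = None
) -> List[float]:
    """Draw one continuous action, each dimension uniform in [-1, 1].

    Hypothetical helper for illustration only; the range is an assumption
    (a normalized action space), not something taken from d3rlpy.
    """
    rng = rng or random.Random()
    return [rng.uniform(-1.0, 1.0) for _ in range(action_size)]


# Usage: a 3-dimensional action from a seeded generator for reproducibility.
action = sample_uniform_action(3, random.Random(0))
print(action)
```

A policy like this is typically useful as a baseline or for collecting exploratory data, since it ignores the observation entirely.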

Discrete control algorithms

d3rlpy.algos.DiscreteBC

Behavior Cloning algorithm for discrete control.

d3rlpy.algos.DQN

Deep Q-Network algorithm.

d3rlpy.algos.DoubleDQN

Double Deep Q-Network algorithm.

d3rlpy.algos.DiscreteSAC

Soft Actor-Critic algorithm for discrete action spaces.

d3rlpy.algos.DiscreteBCQ

Discrete version of Batch-Constrained Q-learning algorithm.

d3rlpy.algos.DiscreteCQL

Discrete version of Conservative Q-Learning algorithm.

d3rlpy.algos.DiscreteAWR

Discrete version of Advantage-Weighted Regression algorithm.

d3rlpy.algos.DiscreteRandomPolicy

Random policy for discrete control.
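The DQN and DoubleDQN entries above differ mainly in how the bootstrapped target is formed. The sketch below shows the two textbook target computations for a single transition in plain Python. It is not taken from d3rlpy; the function names and argument layout are hypothetical, and Q-values are passed in as plain lists with one entry per discrete action.

```python
from typing import Sequence


def dqn_target(
    reward: float,
    gamma: float,
    target_q_next: Sequence[float],
    terminal: bool = False,
) -> float:
    """Standard DQN target: the target network both selects and evaluates."""
    if terminal:
        return reward
    return reward + gamma * max(target_q_next)


def double_dqn_target(
    reward: float,
    gamma: float,
    online_q_next: Sequence[float],
    target_q_next: Sequence[float],
    terminal: bool = False,
) -> float:
    """Double DQN target: the online network selects the next action,
    the target network evaluates it."""
    if terminal:
        return reward
    best_action = max(range(len(online_q_next)), key=lambda a: online_q_next[a])
    return reward + gamma * target_q_next[best_action]


# Usage with hypothetical Q-values for three actions in the next state.
online_q = [1.0, 2.0, 0.5]
target_q = [0.5, 1.0, 3.0]
print(dqn_target(1.0, 0.5, target_q))                   # 1.0 + 0.5 * 3.0 = 2.5
print(double_dqn_target(1.0, 0.5, online_q, target_q))  # 1.0 + 0.5 * 1.0 = 1.5
```

Decoupling action selection from evaluation is what lets Double DQN reduce the overestimation that a plain maximum over noisy Q-values tends to produce.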