all.presets.classic_control

a2c([device, discount_factor, lr, …])

A2C classic control preset.

c51([device, discount_factor, lr, …])

C51 classic control preset.

ddqn([device, discount_factor, lr, …])

Dueling Double DQN with Prioritized Experience Replay (PER).

dqn([device, discount_factor, lr, …])

DQN classic control preset.

ppo([device, discount_factor, lr, …])

PPO classic control preset.

rainbow([device, discount_factor, lr, …])

Rainbow classic control preset.

vac([device, discount_factor, lr_v, lr_pi, …])

Vanilla Actor-Critic classic control preset.

vpg([device, discount_factor, lr, …])

Vanilla Policy Gradient classic control preset.

vqn([device, discount_factor, lr, eps, …])

Vanilla Q-Network classic control preset.

vsarsa([device, discount_factor, lr, eps, …])

Vanilla SARSA classic control preset.

all.presets.classic_control.a2c(device='cpu', discount_factor=0.99, lr=0.003, clip_grad=0.1, entropy_loss_scaling=0.001, n_envs=4, n_steps=32, feature_model_constructor=<function fc_relu_features>, value_model_constructor=<function fc_value_head>, policy_model_constructor=<function fc_policy_head>)

A2C classic control preset.

Parameters
  • device (str) – The device to load parameters and buffers onto for this agent.

  • discount_factor (float) – Discount factor for future rewards.

  • lr (float) – Learning rate for the Adam optimizer.

  • clip_grad (float) – The maximum magnitude of the gradient for any given parameter. Set to 0 to disable.

  • entropy_loss_scaling (float) – Coefficient for the entropy term in the total loss.

  • n_envs (int) – Number of parallel environments.

  • n_steps (int) – Length of each rollout.

  • feature_model_constructor (function) – The function used to construct the neural feature model.

  • value_model_constructor (function) – The function used to construct the neural value model.

  • policy_model_constructor (function) – The function used to construct the neural policy model.
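
A minimal usage sketch follows. Only the a2c keyword arguments come from the signature documented above; the GymEnvironment and run_experiment helpers, the environment id, and the frame budget are assumptions about the surrounding library.

    from all.environments import GymEnvironment
    from all.experiments import run_experiment
    from all.presets.classic_control import a2c

    # Override a few documented defaults: more parallel environments and
    # longer rollouts than n_envs=4, n_steps=32.
    preset = a2c(device='cpu', n_envs=8, n_steps=64, lr=1e-3)

    # Assumed experiment helper: run the preset on a classic control task.
    run_experiment(preset, GymEnvironment('CartPole-v0', device='cpu'), 100000)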

all.presets.classic_control.c51(device='cpu', discount_factor=0.99, lr=0.0001, minibatch_size=128, update_frequency=1, replay_start_size=1000, replay_buffer_size=20000, initial_exploration=1.0, final_exploration=0.02, final_exploration_frame=10000, atoms=101, v_min=-100, v_max=100, model_constructor=<function fc_relu_dist_q>)

C51 classic control preset.

Parameters
  • device (str) – The device to load parameters and buffers onto for this agent.

  • discount_factor (float) – Discount factor for future rewards.

  • lr (float) – Learning rate for the Adam optimizer.

  • minibatch_size (int) – Number of experiences to sample in each training update.

  • update_frequency (int) – Number of timesteps per training update.

  • replay_start_size (int) – Number of experiences in replay buffer when training begins.

  • replay_buffer_size (int) – Maximum number of experiences to store in the replay buffer.

  • initial_exploration (float) – Initial probability of choosing a random action, decayed over the course of training.

  • final_exploration (float) – Final probability of choosing a random action.

  • final_exploration_frame (int) – The frame where the exploration decay stops.

  • atoms (int) – The number of atoms in the categorical distribution used to represent the distributional value function.

  • v_min (int) – The expected return corresponding to the smallest atom.

  • v_max (int) – The expected return corresponding to the largest atom.

  • model_constructor (function) – The function used to construct the neural model.
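
As a sketch, the categorical value-distribution support can be matched to a task's return range using only the parameters documented above; the specific numbers below are illustrative, not recommended settings.

    from all.presets.classic_control import c51

    # Fewer atoms over a narrower return range (e.g., CartPole-v0 returns
    # are bounded by 200), plus a larger replay buffer.
    preset = c51(
        device='cpu',
        atoms=51,
        v_min=0,
        v_max=200,
        replay_buffer_size=50000,
    )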

all.presets.classic_control.ddqn(device='cpu', discount_factor=0.99, lr=0.001, minibatch_size=64, update_frequency=1, target_update_frequency=100, replay_start_size=1000, replay_buffer_size=10000, initial_exploration=1.0, final_exploration=0.0, final_exploration_frame=10000, alpha=0.2, beta=0.6, model_constructor=<function dueling_fc_relu_q>)

Dueling Double DQN with Prioritized Experience Replay (PER).

Parameters
  • device (str) – The device to load parameters and buffers onto for this agent.

  • discount_factor (float) – Discount factor for future rewards.

  • lr (float) – Learning rate for the Adam optimizer.

  • minibatch_size (int) – Number of experiences to sample in each training update.

  • update_frequency (int) – Number of timesteps per training update.

  • target_update_frequency (int) – Number of timesteps between updates of the target network.

  • replay_start_size (int) – Number of experiences in replay buffer when training begins.

  • replay_buffer_size (int) – Maximum number of experiences to store in the replay buffer.

  • initial_exploration (float) – Initial probability of choosing a random action, decayed until final_exploration_frame.

  • final_exploration (float) – Final probability of choosing a random action.

  • final_exploration_frame (int) – The frame where the exploration decay stops.

  • alpha (float) – Amount of prioritization in the prioritized experience replay buffer. (0 = no prioritization, 1 = full prioritization)

  • beta (float) – The strength of the importance sampling correction for prioritized experience replay. (0 = no correction, 1 = full correction)

  • model_constructor (function) – The function used to construct the neural model.
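
For example, the prioritized replay behavior can be adjusted through alpha and beta; the values below are illustrative only.

    from all.presets.classic_control import ddqn

    # Stronger prioritization and a weaker importance-sampling correction
    # than the defaults (alpha=0.2, beta=0.6).
    preset = ddqn(
        device='cpu',
        alpha=0.6,
        beta=0.4,
        replay_buffer_size=20000,
        final_exploration=0.02,
    )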

all.presets.classic_control.dqn(device='cpu', discount_factor=0.99, lr=0.001, minibatch_size=64, update_frequency=1, target_update_frequency=100, replay_start_size=1000, replay_buffer_size=10000, initial_exploration=1.0, final_exploration=0.0, final_exploration_frame=10000, model_constructor=<function fc_relu_q>)

DQN classic control preset.

Parameters
  • device (str) – The device to load parameters and buffers onto for this agent.

  • discount_factor (float) – Discount factor for future rewards.

  • lr (float) – Learning rate for the Adam optimizer.

  • minibatch_size (int) – Number of experiences to sample in each training update.

  • update_frequency (int) – Number of timesteps per training update.

  • target_update_frequency (int) – Number of timesteps between updates of the target network.

  • replay_start_size (int) – Number of experiences in replay buffer when training begins.

  • replay_buffer_size (int) – Maximum number of experiences to store in the replay buffer.

  • initial_exploration (float) – Initial probability of choosing a random action, decayed until final_exploration_frame.

  • final_exploration (float) – Final probability of choosing a random action.

  • final_exploration_frame (int) – The frame where the exploration decay stops.

  • model_constructor (function) – The function used to construct the neural model.
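
The model_constructor argument replaces the default fc_relu_q network. A hypothetical constructor is sketched below; it assumes the constructor is called with the environment and that env.state_space.shape[0] and env.action_space.n describe the observation and action spaces.

    import torch.nn as nn

    from all.presets.classic_control import dqn

    def wide_fc_q(env):
        # Hypothetical drop-in for fc_relu_q: a wider two-layer MLP.
        return nn.Sequential(
            nn.Linear(env.state_space.shape[0], 256),
            nn.ReLU(),
            nn.Linear(256, env.action_space.n),
        )

    preset = dqn(device='cpu', model_constructor=wide_fc_q)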

all.presets.classic_control.ppo(device='cpu', discount_factor=0.99, lr=0.001, clip_grad=0.1, entropy_loss_scaling=0.001, epsilon=0.2, epochs=4, minibatches=4, n_envs=8, n_steps=8, lam=0.95, feature_model_constructor=<function fc_relu_features>, value_model_constructor=<function fc_value_head>, policy_model_constructor=<function fc_policy_head>)

PPO classic control preset.

Parameters
  • device (str) – The device to load parameters and buffers onto for this agent.

  • discount_factor (float) – Discount factor for future rewards.

  • lr (float) – Learning rate for the Adam optimizer.

  • clip_grad (float) – The maximum magnitude of the gradient for any given parameter. Set to 0 to disable.

  • entropy_loss_scaling (float) – Coefficient for the entropy term in the total loss.

  • epsilon (float) – Value for epsilon in the clipped PPO objective function.

  • epochs (int) – Number of times to iterate through each batch.

  • minibatches (int) – The number of minibatches to split each batch into.

  • n_envs (int) – Number of parallel actors.

  • n_steps (int) – Length of each rollout.

  • lam (float) – The Generalized Advantage Estimation (GAE) decay parameter.

  • feature_model_constructor (function) – The function used to construct the neural feature model.

  • value_model_constructor (function) – The function used to construct the neural value model.

  • policy_model_constructor (function) – The function used to construct the neural policy model.
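
A sketch of tuning the rollout and optimization settings using only the parameters documented above; the values are illustrative, not recommendations.

    from all.presets.classic_control import ppo

    preset = ppo(
        device='cpu',
        n_envs=16,       # more parallel actors than the default 8
        n_steps=32,      # longer rollouts than the default 8
        epochs=8,        # more passes over each batch
        minibatches=8,
        epsilon=0.1,     # tighter clipping in the PPO objective
        lam=0.9,         # faster GAE decay
    )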

all.presets.classic_control.rainbow(device='cpu', discount_factor=0.99, lr=0.0002, minibatch_size=64, update_frequency=1, replay_buffer_size=20000, replay_start_size=1000, alpha=0.5, beta=0.5, n_steps=5, atoms=101, v_min=-100, v_max=100, sigma=0.5, model_constructor=<function fc_relu_rainbow>)

Rainbow classic control preset.

Parameters
  • device (str) – The device to load parameters and buffers onto for this agent.

  • discount_factor (float) – Discount factor for future rewards.

  • lr (float) – Learning rate for the Adam optimizer.

  • minibatch_size (int) – Number of experiences to sample in each training update.

  • update_frequency (int) – Number of timesteps per training update.

  • replay_start_size (int) – Number of experiences in replay buffer when training begins.

  • replay_buffer_size (int) – Maximum number of experiences to store in the replay buffer.

  • alpha (float) – Amount of prioritization in the prioritized experience replay buffer. (0 = no prioritization, 1 = full prioritization)

  • beta (float) – The strength of the importance sampling correction for prioritized experience replay. (0 = no correction, 1 = full correction)

  • n_steps (int) – The number of steps for n-step Q-learning.

  • atoms (int) – The number of atoms in the categorical distribution used to represent the distributional value function.

  • v_min (int) – The expected return corresponding to the smallest atom.

  • v_max (int) – The expected return corresponding to the largest atom.

  • sigma (float) – Initial noisy network noise.

  • model_constructor (function) – The function used to construct the neural model.
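
A sketch of adjusting the n-step return length, the noisy-network noise, and the distributional support; illustrative values only.

    from all.presets.classic_control import rainbow

    preset = rainbow(
        device='cpu',
        n_steps=10,   # longer n-step returns than the default 5
        sigma=0.2,    # less initial noisy-network noise than the default 0.5
        atoms=51,
        v_min=0,
        v_max=200,
    )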

all.presets.classic_control.vac(device='cpu', discount_factor=0.99, lr_v=0.005, lr_pi=0.001, eps=1e-05, feature_model_constructor=<function fc_relu_features>, value_model_constructor=<function fc_value_head>, policy_model_constructor=<function fc_policy_head>)

Vanilla Actor-Critic classic control preset.

Parameters
  • device (str) – The device to load parameters and buffers onto for this agent.

  • discount_factor (float) – Discount factor for future rewards.

  • lr_v (float) – Learning rate for value network.

  • lr_pi (float) – Learning rate for policy network and feature network.

  • eps (float) – Stability parameter for the Adam optimizer.

  • feature_model_constructor (function) – The function used to construct the neural feature model.

  • value_model_constructor (function) – The function used to construct the neural value model.

  • policy_model_constructor (function) – The function used to construct the neural policy model.
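
Because the critic and the actor/feature networks take separate learning rates, they can be tuned independently; the values below are illustrative.

    from all.presets.classic_control import vac

    # Faster critic learning, slower policy/feature learning.
    preset = vac(device='cpu', lr_v=1e-2, lr_pi=5e-4)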

all.presets.classic_control.vpg(device='cpu', discount_factor=0.99, lr=0.005, min_batch_size=500, feature_model_constructor=<function fc_relu_features>, value_model_constructor=<function fc_value_head>, policy_model_constructor=<function fc_policy_head>)

Vanilla Policy Gradient classic control preset.

Parameters
  • device (str) – The device to load parameters and buffers onto for this agent.

  • discount_factor (float) – Discount factor for future rewards.

  • lr (float) – Learning rate for the Adam optimizer.

  • min_batch_size (int) – Continue running complete episodes until at least this many states have been seen since the last update.

  • feature_model_constructor (function) – The function used to construct the neural feature model.

  • value_model_constructor (function) – The function used to construct the neural value model.

  • policy_model_constructor (function) – The function used to construct the neural policy model.
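
A sketch of collecting more experience per policy update than the default min_batch_size=500; an illustrative setting, not a recommendation.

    from all.presets.classic_control import vpg

    # Keep running complete episodes until at least 2000 states are gathered.
    preset = vpg(device='cpu', min_batch_size=2000, lr=1e-3)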

all.presets.classic_control.vqn(device='cpu', discount_factor=0.99, lr=0.01, eps=1e-05, epsilon=0.1, n_envs=8, model_constructor=<function fc_relu_q>)

Vanilla Q-Network classic control preset.

Parameters
  • device (str) – The device to load parameters and buffers onto for this agent.

  • discount_factor (float) – Discount factor for future rewards.

  • lr (float) – Learning rate for the Adam optimizer.

  • eps (float) – Stability parameter for the Adam optimizer.

  • epsilon (float) – Probability of choosing a random action.

  • n_envs (int) – Number of parallel environments.

  • model_constructor (function) – The function used to construct the neural model.
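
A sketch of increasing exploration and parallelism relative to the defaults (epsilon=0.1, n_envs=8); illustrative values only.

    from all.presets.classic_control import vqn

    preset = vqn(device='cpu', epsilon=0.2, n_envs=16, lr=5e-3)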

all.presets.classic_control.vsarsa(device='cpu', discount_factor=0.99, lr=0.01, eps=1e-05, epsilon=0.1, n_envs=8, model_constructor=<function fc_relu_q>)

Vanilla SARSA classic control preset.

Parameters
  • device (str) – The device to load parameters and buffers onto for this agent.

  • discount_factor (float) – Discount factor for future rewards.

  • lr (float) – Learning rate for the Adam optimizer.

  • eps (float) – Stability parameter for the Adam optimizer.

  • epsilon (float) – Probability of choosing a random action.

  • n_envs (int) – Number of parallel environments.

  • model_constructor (function) – The function used to construct the neural model.
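
vsarsa shares its signature with vqn but performs on-policy SARSA updates; a construction sketch with illustrative overrides:

    from all.presets.classic_control import vsarsa

    preset = vsarsa(device='cpu', epsilon=0.2, n_envs=16, lr=5e-3)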