all.presets.atari

a2c([device, discount_factor, last_frame, …])

A2C Atari preset.

c51([device, discount_factor, last_frame, …])

C51 Atari preset.

ddqn([device, discount_factor, last_frame, …])

Dueling Double DQN with Prioritized Experience Replay (PER).

dqn([device, discount_factor, last_frame, …])

DQN Atari preset.

ppo([device, discount_factor, last_frame, …])

PPO Atari preset.

rainbow([device, discount_factor, …])

Rainbow Atari preset.

vac([device, discount_factor, lr_v, lr_pi, …])

Vanilla Actor-Critic Atari preset.

vpg([device, discount_factor, last_frame, …])

Vanilla Policy Gradient Atari preset.

vqn([device, discount_factor, lr, eps, …])

Vanilla Q-Network Atari preset.

vsarsa([device, discount_factor, lr, eps, …])

Vanilla SARSA Atari preset.

all.presets.atari.a2c(device='cuda', discount_factor=0.99, last_frame=40000000.0, lr=0.0007, eps=0.00015, clip_grad=0.1, entropy_loss_scaling=0.01, value_loss_scaling=0.5, n_envs=16, n_steps=5, feature_model_constructor=<function nature_features>, value_model_constructor=<function nature_value_head>, policy_model_constructor=<function nature_policy_head>)

A2C Atari preset.

Parameters
  • device (str) – The device to load parameters and buffers onto for this agent.

  • discount_factor (float) – Discount factor for future rewards.

  • last_frame (int) – Number of frames to train.

  • lr (float) – Learning rate for the Adam optimizer.

  • eps (float) – Stability parameter for the Adam optimizer.

  • clip_grad (float) – The maximum magnitude of the gradient for any given parameter. Set to 0 to disable.

  • entropy_loss_scaling (float) – Coefficient for the entropy term in the total loss.

  • value_loss_scaling (float) – Coefficient for the value function loss.

  • n_envs (int) – Number of parallel environments.

  • n_steps (int) – Length of each rollout.

  • feature_model_constructor (function) – The function used to construct the neural feature model.

  • value_model_constructor (function) – The function used to construct the neural value model.

  • policy_model_constructor (function) – The function used to construct the neural policy model.
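
For example, the preset can be constructed directly with the keyword arguments documented above (a minimal sketch; how the returned object is passed to the library's training utilities is version-dependent and not specified on this page):

    from all.presets import atari

    # Construct the A2C preset, making a few of the documented defaults explicit.
    agent = atari.a2c(
        device='cuda',
        discount_factor=0.99,
        n_envs=16,                  # number of parallel environments
        n_steps=5,                  # rollout length
        entropy_loss_scaling=0.01,  # entropy bonus coefficient
    )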

all.presets.atari.c51(device='cuda', discount_factor=0.99, last_frame=40000000.0, lr=0.0001, eps=0.00015, minibatch_size=32, update_frequency=4, target_update_frequency=1000, replay_start_size=80000, replay_buffer_size=1000000, initial_exploration=0.02, final_exploration=0.0, atoms=51, v_min=-10, v_max=10, model_constructor=<function nature_c51>)

C51 Atari preset.

Parameters
  • device (str) – The device to load parameters and buffers onto for this agent.

  • discount_factor (float) – Discount factor for future rewards.

  • last_frame (int) – Number of frames to train.

  • lr (float) – Learning rate for the Adam optimizer.

  • eps (float) – Stability parameter for the Adam optimizer.

  • minibatch_size (int) – Number of experiences to sample in each training update.

  • update_frequency (int) – Number of timesteps per training update.

  • target_update_frequency (int) – Number of timesteps between updates of the target network.

  • replay_start_size (int) – Number of experiences in the replay buffer when training begins.

  • replay_buffer_size (int) – Maximum number of experiences to store in the replay buffer.

  • initial_exploration (float) – Initial probability of choosing a random action, decayed over the course of training.

  • final_exploration (float) – Final probability of choosing a random action.

  • atoms (int) – The number of atoms in the categorical distribution used to represent the distributional value function.

  • v_min (int) – The expected return corresponding to the smallest atom.

  • v_max (int) – The expected return corresponding to the largest atom.

  • model_constructor (function) – The function used to construct the neural model.
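
The atoms, v_min, and v_max arguments define the fixed support of the categorical return distribution used by C51. A small sketch of the implied support (illustrative arithmetic, not the library's internal code):

    import numpy as np

    # Support of the categorical value distribution with the default settings.
    atoms, v_min, v_max = 51, -10, 10
    support = np.linspace(v_min, v_max, atoms)  # 51 evenly spaced return values
    delta_z = (v_max - v_min) / (atoms - 1)     # spacing between atoms: 0.4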

all.presets.atari.ddqn(device='cuda', discount_factor=0.99, last_frame=40000000.0, lr=0.0001, eps=0.00015, minibatch_size=32, update_frequency=4, target_update_frequency=1000, replay_start_size=80000, replay_buffer_size=1000000, initial_exploration=1.0, final_exploration=0.01, final_exploration_frame=4000000, alpha=0.5, beta=0.5, model_constructor=<function nature_ddqn>)

Dueling Double DQN with Prioritized Experience Replay (PER).

Parameters
  • device (str) – The device to load parameters and buffers onto for this agent.

  • discount_factor (float) – Discount factor for future rewards.

  • last_frame (int) – Number of frames to train.

  • lr (float) – Learning rate for the Adam optimizer.

  • eps (float) – Stability parameter for the Adam optimizer.

  • minibatch_size (int) – Number of experiences to sample in each training update.

  • update_frequency (int) – Number of timesteps per training update.

  • target_update_frequency (int) – Number of timesteps between updates of the target network.

  • replay_start_size (int) – Number of experiences in the replay buffer when training begins.

  • replay_buffer_size (int) – Maximum number of experiences to store in the replay buffer.

  • initial_exploration (float) – Initial probability of choosing a random action, decayed until final_exploration_frame.

  • final_exploration (float) – Final probability of choosing a random action.

  • final_exploration_frame (int) – The frame where the exploration decay stops.

  • alpha (float) – Amount of prioritization in the prioritized experience replay buffer. (0 = no prioritization, 1 = full prioritization)

  • beta (float) – The strength of the importance sampling correction for prioritized experience replay. (0 = no correction, 1 = full correction)

  • model_constructor (function) – The function used to construct the neural model.
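
The alpha and beta arguments follow the standard prioritized experience replay formulation: alpha controls how strongly TD error shapes the sampling distribution, and beta controls the importance-sampling correction applied to sampled transitions. A schematic sketch of that weighting (illustrative only, not the library's buffer implementation):

    import numpy as np

    def per_weights(td_errors, alpha=0.5, beta=0.5, eps=1e-5):
        """Sampling probabilities and importance-sampling weights for PER (schematic)."""
        priorities = (np.abs(td_errors) + eps) ** alpha
        probs = priorities / priorities.sum()          # P(i) proportional to |td_i|^alpha
        weights = (len(td_errors) * probs) ** -beta    # w_i = (N * P(i))^-beta
        return probs, weights / weights.max()          # normalized for stability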

all.presets.atari.dqn(device='cuda', discount_factor=0.99, last_frame=40000000.0, lr=0.0001, eps=0.00015, minibatch_size=32, update_frequency=4, target_update_frequency=1000, replay_start_size=80000, replay_buffer_size=1000000, initial_exploration=1.0, final_exploration=0.01, final_exploration_frame=4000000, model_constructor=<function nature_dqn>)

DQN Atari preset.

Parameters
  • device (str) – The device to load parameters and buffers onto for this agent.

  • discount_factor (float) – Discount factor for future rewards.

  • last_frame (int) – Number of frames to train.

  • lr (float) – Learning rate for the Adam optimizer.

  • eps (float) – Stability parameter for the Adam optimizer.

  • minibatch_size (int) – Number of experiences to sample in each training update.

  • update_frequency (int) – Number of timesteps per training update.

  • target_update_frequency (int) – Number of timesteps between updates of the target network.

  • replay_start_size (int) – Number of experiences in the replay buffer when training begins.

  • replay_buffer_size (int) – Maximum number of experiences to store in the replay buffer.

  • initial_exploration (float) – Initial probability of choosing a random action, decayed until final_exploration_frame.

  • final_exploration (float) – Final probability of choosing a random action.

  • final_exploration_frame (int) – The frame where the exploration decay stops.

  • model_constructor (function) – The function used to construct the neural model.
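
initial_exploration, final_exploration, and final_exploration_frame describe an epsilon-greedy exploration schedule that is annealed over the first final_exploration_frame frames. A sketch of the implied schedule, assuming a linear decay (the library computes this internally):

    def epsilon(frame, initial=1.0, final=0.01, final_frame=4_000_000):
        """Linearly annealed exploration rate (schematic)."""
        if frame >= final_frame:
            return final
        return initial + (final - initial) * frame / final_frame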

all.presets.atari.ppo(device='cuda', discount_factor=0.99, last_frame=40000000.0, lr=0.00025, eps=1e-05, clip_grad=0.5, entropy_loss_scaling=0.01, value_loss_scaling=0.5, clip_initial=0.1, clip_final=0.01, epochs=4, minibatches=4, n_envs=8, n_steps=128, lam=0.95, feature_model_constructor=<function nature_features>, value_model_constructor=<function nature_value_head>, policy_model_constructor=<function nature_policy_head>)

PPO Atari preset.

Parameters
  • device (str) – The device to load parameters and buffers onto for this agent.

  • discount_factor (float) – Discount factor for future rewards.

  • last_frame (int) – Number of frames to train.

  • lr (float) – Learning rate for the Adam optimizer.

  • eps (float) – Stability parameter for the Adam optimizer.

  • clip_grad (float) – The maximum magnitude of the gradient for any given parameter. Set to 0 to disable.

  • entropy_loss_scaling (float) – Coefficient for the entropy term in the total loss.

  • value_loss_scaling (float) – Coefficient for the value function loss.

  • clip_initial (float) – Value for epsilon in the clipped PPO objective function at the beginning of training.

  • clip_final (float) – Value for epsilon in the clipped PPO objective function at the end of training.

  • epochs (int) – Number of times to iterate through each batch.

  • minibatches (int) – The number of minibatches to split each batch into.

  • n_envs (int) – Number of parallel actors.

  • n_steps (int) – Length of each rollout.

  • lam (float) – The Generalized Advantage Estimate (GAE) decay parameter.

  • feature_model_constructor (function) – The function used to construct the neural feature model.

  • value_model_constructor (function) – The function used to construct the neural value model.

  • policy_model_constructor (function) – The function used to construct the neural policy model.
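
Each PPO update operates on a batch of n_envs * n_steps transitions, split into the given number of minibatches and iterated over epochs times, while the clipping parameter is annealed from clip_initial to clip_final. With the defaults above, the batch sizes work out as follows (simple arithmetic, not library code):

    n_envs, n_steps = 8, 128
    minibatches, epochs = 4, 4

    batch_size = n_envs * n_steps               # 1024 transitions per rollout
    minibatch_size = batch_size // minibatches  # 256 transitions per gradient step
    gradient_steps = minibatches * epochs       # 16 gradient steps per rollout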

all.presets.atari.rainbow(device='cuda', discount_factor=0.99, last_frame=40000000.0, lr=0.0001, eps=0.00015, minibatch_size=32, update_frequency=4, target_update_frequency=1000, replay_start_size=80000, replay_buffer_size=1000000, initial_exploration=0.02, final_exploration=0.0, alpha=0.5, beta=0.5, n_steps=3, atoms=51, v_min=-10, v_max=10, sigma=0.5, model_constructor=<function nature_rainbow>)

Rainbow Atari preset.

Parameters
  • device (str) – The device to load parameters and buffers onto for this agent.

  • discount_factor (float) – Discount factor for future rewards.

  • last_frame (int) – Number of frames to train.

  • lr (float) – Learning rate for the Adam optimizer.

  • eps (float) – Stability parameter for the Adam optimizer.

  • minibatch_size (int) – Number of experiences to sample in each training update.

  • update_frequency (int) – Number of timesteps per training update.

  • target_update_frequency (int) – Number of timesteps between updates of the target network.

  • replay_start_size (int) – Number of experiences in the replay buffer when training begins.

  • replay_buffer_size (int) – Maximum number of experiences to store in the replay buffer.

  • initial_exploration (float) – Initial probability of choosing a random action, decayed over the course of training.

  • final_exploration (float) – Final probability of choosing a random action.

  • alpha (float) – Amount of prioritization in the prioritized experience replay buffer. (0 = no prioritization, 1 = full prioritization)

  • beta (float) – The strength of the importance sampling correction for prioritized experience replay. (0 = no correction, 1 = full correction)

  • n_steps (int) – The number of steps for n-step Q-learning.

  • atoms (int) – The number of atoms in the categorical distribution used to represent the distributional value function.

  • v_min (int) – The expected return corresponding to the smallest atom.

  • v_max (int) – The expected return corresponding to the largest atom.

  • sigma (float) – Initial noise scale for the noisy network layers.

  • model_constructor (function) – The function used to construct the neural model.
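
n_steps sets the horizon of the n-step return used in Rainbow's learning target. A scalar simplification of that target is sketched below (Rainbow actually trains a distributional head, so this is illustrative only):

    def n_step_target(rewards, bootstrap_value, discount_factor=0.99):
        """n-step return: discounted rewards plus a bootstrapped tail (schematic)."""
        target = bootstrap_value * discount_factor ** len(rewards)
        for k, r in enumerate(rewards):
            target += (discount_factor ** k) * r
        return target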

all.presets.atari.vac(device='cuda', discount_factor=0.99, lr_v=0.0005, lr_pi=0.0001, eps=0.00015, clip_grad=0.5, value_loss_scaling=0.25, n_envs=16, feature_model_constructor=<function nature_features>, value_model_constructor=<function nature_value_head>, policy_model_constructor=<function nature_policy_head>)

Vanilla Actor-Critic Atari preset.

Parameters
  • device (str) – The device to load parameters and buffers onto for this agent.

  • discount_factor (float) – Discount factor for future rewards.

  • lr_v (float) – Learning rate for value network.

  • lr_pi (float) – Learning rate for policy network and feature network.

  • eps (float) – Stability parameter for the Adam optimizer.

  • clip_grad (float) – The maximum magnitude of the gradient for any given parameter. Set to 0 to disable.

  • value_loss_scaling (float) – Coefficient for the value function loss.

  • n_envs (int) – Number of parallel environments.

  • feature_model_constructor (function) – The function used to construct the neural feature model.

  • value_model_constructor (function) – The function used to construct the neural value model.

  • policy_model_constructor (function) – The function used to construct the neural policy model.
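
Unlike the other actor-critic presets, vac exposes separate learning rates for the value head (lr_v) and for the policy and feature networks (lr_pi). For example, keeping the critic's learning rate higher than the actor's:

    from all.presets import atari

    # Vanilla actor-critic with the documented default learning rates made explicit.
    agent = atari.vac(
        device='cuda',
        lr_v=5e-4,   # value network learning rate
        lr_pi=1e-4,  # policy and feature network learning rate
        n_envs=16,
    )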

all.presets.atari.vpg(device='cuda', discount_factor=0.99, last_frame=40000000.0, lr=0.0007, eps=0.00015, clip_grad=0.5, value_loss_scaling=0.25, min_batch_size=1000, feature_model_constructor=<function nature_features>, value_model_constructor=<function nature_value_head>, policy_model_constructor=<function nature_policy_head>)

Vanilla Policy Gradient Atari preset.

Parameters
  • device (str) – The device to load parameters and buffers onto for this agent.

  • discount_factor (float) – Discount factor for future rewards.

  • last_frame (int) – Number of frames to train.

  • lr (float) – Learning rate for the Adam optimizer.

  • eps (float) – Stability parameter for the Adam optimizer.

  • clip_grad (float) – The maximum magnitude of the gradient for any given parameter. Set to 0 to disable.

  • value_loss_scaling (float) – Coefficient for the value function loss.

  • min_batch_size (int) – Continue running complete episodes until at least this many states have been seen since the last update.

  • feature_model_constructor (function) – The function used to construct the neural feature model.

  • value_model_constructor (function) – The function used to construct the neural value model.

  • policy_model_constructor (function) – The function used to construct the neural policy model.

all.presets.atari.vqn(device='cuda', discount_factor=0.99, lr=0.001, eps=0.00015, initial_exploration=1.0, final_exploration=0.02, final_exploration_frame=1000000, n_envs=64, model_constructor=<function nature_ddqn>)

Vanilla Q-Network Atari preset.

Parameters
  • device (str) – The device to load parameters and buffers onto for this agent.

  • discount_factor (float) – Discount factor for future rewards.

  • lr (float) – Learning rate for the Adam optimizer.

  • eps (float) – Stability parameter for the Adam optimizer.

  • initial_exploration (float) – Initial probability of choosing a random action, decayed until final_exploration_frame.

  • final_exploration (float) – Final probability of choosing a random action.

  • final_exploration_frame (int) – The frame where the exploration decay stops.

  • n_envs (int) – Number of parallel environments.

  • model_constructor (function) – The function used to construct the neural model.

all.presets.atari.vsarsa(device='cuda', discount_factor=0.99, lr=0.001, eps=0.00015, final_exploration_frame=1000000, final_exploration=0.02, initial_exploration=1.0, n_envs=64, model_constructor=<function nature_ddqn>)

Vanilla SARSA Atari preset.

Parameters
  • device (str) – The device to load parameters and buffers onto for this agent.

  • discount_factor (float) – Discount factor for future rewards.

  • lr (float) – Learning rate for the Adam optimizer.

  • eps (float) – Stability parameter for the Adam optimizer.

  • initial_exploration (float) – Initial probability of choosing a random action, decayed until final_exploration_frame.

  • final_exploration (float) – Final probability of choosing a random action.

  • final_exploration_frame (int) – The frame where the exploration decay stops.

  • n_envs (int) – Number of parallel environments.

  • model_constructor (function) – The function used to construct the neural model.