all.policies

class all.policies.DeterministicPolicy(model, optimizer, space, name='policy', **kwargs)

Bases: all.approximation.approximation.Approximation

A DDPG-style deterministic policy.

Parameters
  • model (torch.nn.Module) – A PyTorch module representing the policy network. The input shape should match the shape of the state space, and the output shape should match the shape of the action space.

  • optimizer (torch.optim.Optimizer) – An optimizer initialized with the model parameters, e.g. SGD, Adam, RMSprop, etc.

  • space (gym.spaces.Box) – The Box representing the action space.

  • kwargs (optional) – Any other arguments accepted by all.approximation.Approximation
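As a sketch of the shape convention (not the library's own code), assume a hypothetical 4-dimensional state space and a 2-dimensional Box action space. A compatible model/optimizer pair, ready to be passed to DeterministicPolicy along with the space, might look like:

```python
import torch
from torch import nn

# Hypothetical dimensions: 4-dimensional state, 2-dimensional continuous action.
STATE_DIM, ACTION_DIM = 4, 2

# The model's output shape matches the action space shape.
model = nn.Sequential(
    nn.Linear(STATE_DIM, 64),
    nn.ReLU(),
    nn.Linear(64, ACTION_DIM),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# A batch of states produces a batch of deterministic actions.
states = torch.randn(8, STATE_DIM)
actions = model(states)
print(actions.shape)  # torch.Size([8, 2])
```

This model and optimizer would then be handed to the policy constructor, which wraps them with the usual Approximation machinery (target networks, checkpointing, etc.).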

class all.policies.GaussianPolicy(model, optimizer, space, name='policy', **kwargs)

Bases: all.approximation.approximation.Approximation

A Gaussian stochastic policy.

This policy chooses actions from a distribution represented by a spherical Gaussian. The first n outputs of the model are squashed to [-1, 1] through a tanh function and then scaled to the given action space, while the remaining n outputs determine the amount of noise to add.

Parameters
  • model (torch.nn.Module) – A PyTorch module representing the policy network. The input shape should match the shape of the state (or feature) space, and the output shape should be double the size of the action space. The first n outputs will be the unscaled mean of the action for each dimension, and the second n outputs will be the logarithm of the variance.

  • optimizer (torch.optim.Optimizer) – An optimizer initialized with the model parameters, e.g. SGD, Adam, RMSprop, etc.

  • space (gym.spaces.Box) – The Box representing the action space.

  • kwargs (optional) – Any other arguments accepted by all.approximation.Approximation
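The split of the 2n model outputs into a scaled mean and a noise term can be sketched in plain PyTorch (the dimensions and Box bounds below are assumptions for illustration):

```python
import torch

# Hypothetical dimensions: n = 2 action dimensions, so the model emits 2n = 4 values.
ACTION_DIM = 2
LOW, HIGH = -2.0, 2.0  # assumed Box bounds

raw = torch.randn(8, 2 * ACTION_DIM)  # stand-in for the model's output

# First n outputs: unscaled mean, squashed to [-1, 1] and scaled to the Box.
mean = torch.tanh(raw[:, :ACTION_DIM])
mean = LOW + (mean + 1) * 0.5 * (HIGH - LOW)

# Remaining n outputs: logarithm of the variance, so std = exp(0.5 * log_var).
log_var = raw[:, ACTION_DIM:]
std = torch.exp(0.5 * log_var)

# Sample an action from the resulting spherical Gaussian.
action = torch.normal(mean, std)
print(action.shape)  # torch.Size([8, 2])
```

Parameterizing the noise as a log-variance keeps the standard deviation positive without constraining the network's raw outputs.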

class all.policies.GreedyPolicy(q, num_actions, epsilon=0.0)

Bases: all.optim.scheduler.Schedulable

An “epsilon-greedy” action selection policy for discrete action spaces.

This policy will usually choose the optimal action according to an approximation of the action value function (the “q-function”), but with probability epsilon will choose a random action instead. GreedyPolicy is a Schedulable, meaning that epsilon can be varied over time by passing a Scheduler object.

Parameters
  • q (all.approximation.QNetwork) – The action-value or “q-function”

  • num_actions (int) – The number of available actions.

  • epsilon (float, optional) – The probability of selecting a random action.

eval(state)
no_grad(state)
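The selection rule itself is simple; a minimal stand-alone sketch (not the library's implementation) over a hypothetical set of three q-values:

```python
import random
import torch

def epsilon_greedy(q_values, epsilon):
    """Pick the argmax of q_values with probability 1 - epsilon,
    otherwise pick a uniformly random action."""
    num_actions = len(q_values)
    if random.random() < epsilon:
        return random.randrange(num_actions)
    return int(torch.argmax(q_values))

q_values = torch.tensor([0.1, 0.9, 0.3])
greedy = epsilon_greedy(q_values, epsilon=0.0)  # epsilon 0: always the best action
print(greedy)  # 1
```

Passing a Scheduler as epsilon lets the exploration rate decay over training rather than staying fixed.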
class all.policies.ParallelGreedyPolicy(q, num_actions, epsilon=0.0)

Bases: all.optim.scheduler.Schedulable

A parallel version of the “epsilon-greedy” action selection policy for discrete action spaces.

This policy will usually choose the optimal action according to an approximation of the action value function (the “q-function”), but with probability epsilon will choose a random action instead. ParallelGreedyPolicy is a Schedulable, meaning that epsilon can be varied over time by passing a Scheduler object.

Parameters
  • q (all.approximation.QNetwork) – The action-value or “q-function”

  • num_actions (int) – The number of available actions.

  • epsilon (float, optional) – The probability of selecting a random action.

eval(state)
no_grad(state)
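The parallel variant applies the same rule to a whole batch of environments at once. A vectorized sketch (an illustration, not the library's code), assuming a hypothetical batch of three environments with two actions each:

```python
import torch

def parallel_epsilon_greedy(q_values, epsilon):
    """Batched epsilon-greedy: q_values has shape (n_envs, num_actions).
    Each environment independently explores with probability epsilon."""
    n_envs, num_actions = q_values.shape
    greedy = q_values.argmax(dim=1)
    random_actions = torch.randint(num_actions, (n_envs,))
    explore = torch.rand(n_envs) < epsilon
    return torch.where(explore, random_actions, greedy)

q_values = torch.tensor([[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]])
print(parallel_epsilon_greedy(q_values, epsilon=0.0))  # tensor([1, 0, 1])
```

Because each environment draws its own exploration coin flip, some environments in the batch may explore while others act greedily on the same step.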
class all.policies.SoftDeterministicPolicy(model, optimizer, space, name='policy', **kwargs)

Bases: all.approximation.approximation.Approximation

A “soft” deterministic policy compatible with soft actor-critic (SAC).

Parameters
  • model (torch.nn.Module) – A PyTorch module representing the policy network. The input shape should match the shape of the state (or feature) space, and the output shape should be double the size of the action space. The first n outputs will be the unscaled mean of the action for each dimension, and the second n outputs will be the logarithm of the variance.

  • optimizer (torch.optim.Optimizer) – An optimizer initialized with the model parameters, e.g. SGD, Adam, RMSprop, etc.

  • space (gym.spaces.Box) – The Box representing the action space.

  • kwargs (optional) – Any other arguments accepted by all.approximation.Approximation
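SAC needs the sampled action to stay differentiable with respect to the policy parameters, which is typically done with the reparameterization trick followed by a tanh squash. A sketch of that idea under assumed dimensions (this is the standard SAC recipe, not necessarily this class's exact internals):

```python
import torch

# Hypothetical: n = 2 action dims, so the model emits 2n values (mean, log-variance).
ACTION_DIM = 2
raw = torch.randn(8, 2 * ACTION_DIM, requires_grad=True)  # stand-in model output

mean, log_var = raw[:, :ACTION_DIM], raw[:, ACTION_DIM:]
std = torch.exp(0.5 * log_var)

# Reparameterization trick: the sample remains differentiable w.r.t. mean and std.
noise = torch.randn_like(mean)
pre_tanh = mean + std * noise

# Squash to [-1, 1] as in SAC; a log-prob correction for the tanh squashing
# would normally accompany this step.
action = torch.tanh(pre_tanh)
print(action.shape)  # torch.Size([8, 2])
```

Sampling this way lets the actor loss backpropagate through the action into the policy network, which a plain `torch.normal` draw would not allow.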

class all.policies.SoftmaxPolicy(model, optimizer, name='policy', **kwargs)

Bases: all.approximation.approximation.Approximation

A softmax (or Boltzmann) stochastic policy for discrete actions.

Parameters
  • model (torch.nn.Module) – A PyTorch module representing the policy network. The input shape should match the shape of the state (or feature) space, and the output should be a vector the size of the action set.

  • optimizer (torch.optim.Optimizer) – An optimizer initialized with the model parameters, e.g. SGD, Adam, RMSprop, etc.

  • kwargs (optional) – Any other arguments accepted by all.approximation.Approximation
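The softmax distribution over a vector of logits is exactly what `torch.distributions.Categorical` provides; a small sketch with a hypothetical three-action logit vector:

```python
import torch
from torch.distributions import Categorical

NUM_ACTIONS = 3
logits = torch.tensor([[2.0, 0.5, -1.0]])  # stand-in for the model's output

# Categorical applies the softmax internally when given logits.
dist = Categorical(logits=logits)
action = dist.sample()
log_prob = dist.log_prob(action)  # used for policy-gradient updates
print(action.shape)
```

Working with logits rather than normalized probabilities is numerically safer, since the softmax and log are computed together inside the distribution.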