all.policies

class all.policies.DeterministicPolicy(model, optimizer, space, name='policy', **kwargs)

Bases: all.approximation.approximation.Approximation

A DDPG-style deterministic policy.
 Parameters
model (torch.nn.Module) – A Pytorch module representing the policy network. The input shape should be the same as the shape of the state space, and the output shape should be the same as the shape of the action space.
optimizer (torch.optim.Optimizer) – An optimizer initialized with the model parameters, e.g. SGD, Adam, RMSprop, etc.
action_space (gym.spaces.Box) – The Box representing the action space.
kwargs (optional) – Any other arguments accepted by all.approximation.Approximation
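As a minimal sketch of what a suitable policy network looks like: the dimensions below are hypothetical (they are not part of the library), but the key constraint from the documentation holds, namely that the network's output size matches the action space.

```python
import torch
import torch.nn as nn

# Hypothetical dimensions for illustration: a 4-dimensional state
# space and a 2-dimensional continuous (Box) action space.
STATE_DIM = 4
ACTION_DIM = 2

# A deterministic policy maps states directly to actions, so the
# final layer's width equals the action dimension.
model = nn.Sequential(
    nn.Linear(STATE_DIM, 64),
    nn.ReLU(),
    nn.Linear(64, ACTION_DIM),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# A batch of 8 states yields a batch of 8 actions.
states = torch.randn(8, STATE_DIM)
actions = model(states)
print(actions.shape)  # torch.Size([8, 2])
```

In practice this `model` and `optimizer` pair would be passed to `DeterministicPolicy` together with the Box action space.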

class all.policies.GaussianPolicy(model, optimizer, space, name='policy', **kwargs)

Bases: all.approximation.approximation.Approximation

A Gaussian stochastic policy.

This policy will choose actions from a distribution represented by a spherical Gaussian. The first n outputs of the model will be squashed to [-1, 1] through a tanh function and then scaled to the given action_space, and the remaining n outputs will define the amount of noise added.
 Parameters
model (torch.nn.Module) – A Pytorch module representing the policy network. The input shape should be the same as the shape of the state (or feature) space, and the output shape should be double the size of the action space. The first n outputs will be the unscaled mean of the action for each dimension, and the second n outputs will be the logarithm of the variance.
optimizer (torch.optim.Optimizer) – An optimizer initialized with the model parameters, e.g. SGD, Adam, RMSprop, etc.
action_space (gym.spaces.Box) – The Box representing the action space.
kwargs (optional) – Any other arguments accepted by all.approximation.Approximation
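The mean/log-variance split described above can be sketched as follows. This is a conceptual illustration, not the library's internal code: the bounds, the helper name, and the particular affine rescaling from [-1, 1] to the Box are assumptions.

```python
import torch

# Hypothetical bounds for a 2-dimensional Box action space.
ACTION_DIM = 2
LOW = torch.tensor([-1.0, -2.0])
HIGH = torch.tensor([1.0, 2.0])

def gaussian_action(raw_output):
    """Interpret a 2n-dimensional model output as a spherical Gaussian.

    The first n values are squashed to [-1, 1] with tanh and rescaled
    into the action bounds; the last n are read as the log-variance.
    """
    mean_raw = raw_output[..., :ACTION_DIM]
    log_var = raw_output[..., ACTION_DIM:]
    squashed = torch.tanh(mean_raw)                  # in [-1, 1]
    mean = LOW + (squashed + 1) / 2 * (HIGH - LOW)   # rescaled to the Box
    std = torch.exp(0.5 * log_var)                   # log-variance -> std
    return torch.distributions.Normal(mean, std)

dist = gaussian_action(torch.randn(2 * ACTION_DIM))
action = dist.sample()
```

With a zero model output, the distribution's mean sits at the midpoint of the bounds and its standard deviation is exp(0) = 1.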

class all.policies.GreedyPolicy(q, num_actions, epsilon=0.0)

Bases: all.optim.scheduler.Schedulable

An “epsilon-greedy” action selection policy for discrete action spaces.

This policy will usually choose the optimal action according to an approximation of the action value function (the “q-function”), but with probability epsilon will choose a random action instead. GreedyPolicy is a Schedulable, meaning that epsilon can be varied over time by passing a Scheduler object.
 Parameters
q (all.approximation.QNetwork) – The action-value or “q-function”
num_actions (int) – The number of available actions.
epsilon (float, optional) – The probability of selecting a random action.

eval(state)

no_grad(state)
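The selection rule itself is simple enough to sketch in plain Python. This is a standalone illustration of epsilon-greedy selection over a list of q-values, not the library's implementation:

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Return the index of the best q-value with probability
    1 - epsilon, otherwise a uniformly random action index."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

# With epsilon=0 the choice is always greedy.
print(epsilon_greedy([0.1, 0.9, 0.4], epsilon=0.0))  # 1
```

Because GreedyPolicy is a Schedulable, epsilon in the real class need not be a fixed float: a Scheduler object can decay it over the course of training.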

class all.policies.ParallelGreedyPolicy(q, num_actions, epsilon=0.0)

Bases: all.optim.scheduler.Schedulable

A parallel version of the “epsilon-greedy” action selection policy for discrete action spaces.

This policy will usually choose the optimal action according to an approximation of the action value function (the “q-function”), but with probability epsilon will choose a random action instead. ParallelGreedyPolicy is a Schedulable, meaning that epsilon can be varied over time by passing a Scheduler object.
 Parameters
q (all.approximation.QNetwork) – The action-value or “q-function”
num_actions (int) – The number of available actions.
epsilon (float, optional) – The probability of selecting a random action.

eval(state)

no_grad(state)
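A vectorised sketch of what “parallel” means here: each row of a batch of q-values picks its own action, exploring independently with probability epsilon. Again, this is a conceptual illustration, not the library's code:

```python
import torch

def parallel_epsilon_greedy(q_values, epsilon):
    """Epsilon-greedy over a batch of q-value rows.

    q_values: tensor of shape (batch, num_actions).  Each row
    independently takes a random action with probability epsilon,
    otherwise its row-wise argmax.
    """
    batch, num_actions = q_values.shape
    greedy = q_values.argmax(dim=1)
    random_actions = torch.randint(num_actions, (batch,))
    explore = torch.rand(batch) < epsilon
    return torch.where(explore, random_actions, greedy)

q = torch.tensor([[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]])
print(parallel_epsilon_greedy(q, epsilon=0.0))  # tensor([1, 0, 1])
```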

class all.policies.SoftDeterministicPolicy(model, optimizer, space, name='policy', **kwargs)

Bases: all.approximation.approximation.Approximation

A “soft” deterministic policy compatible with soft actor-critic (SAC).
 Parameters
model (torch.nn.Module) – A Pytorch module representing the policy network. The input shape should be the same as the shape of the state (or feature) space, and the output shape should be double the size of the action space. The first n outputs will be the unscaled mean of the action for each dimension, and the second n outputs will be the logarithm of the variance.
optimizer (torch.optim.Optimizer) – An optimizer initialized with the model parameters, e.g. SGD, Adam, RMSprop, etc.
action_space (gym.spaces.Box) – The Box representing the action space.
kwargs (optional) – Any other arguments accepted by all.approximation.Approximation
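SAC-style policies typically sample from the Gaussian and squash through tanh, correcting the log-probability for the change of variables. The following is a hedged sketch of that standard technique under the mean/log-variance output convention described above; it is not the library's implementation, and the 1e-6 stabiliser is an assumption.

```python
import torch

ACTION_DIM = 2

def squashed_gaussian_sample(raw_output):
    """Reparameterised sample from a tanh-squashed Gaussian.

    raw_output holds n means followed by n log-variances.  tanh
    squashes the sample into (-1, 1), and the log-probability is
    adjusted by the log-determinant of the tanh transform.
    """
    mean = raw_output[..., :ACTION_DIM]
    log_var = raw_output[..., ACTION_DIM:]
    std = torch.exp(0.5 * log_var)
    normal = torch.distributions.Normal(mean, std)
    raw_action = normal.rsample()       # differentiable sample
    action = torch.tanh(raw_action)     # squashed into (-1, 1)
    # Change-of-variables correction: log |d tanh(x)/dx| = log(1 - tanh(x)^2).
    log_prob = normal.log_prob(raw_action) - torch.log(1 - action.pow(2) + 1e-6)
    return action, log_prob.sum(dim=-1)

action, log_prob = squashed_gaussian_sample(torch.zeros(2 * ACTION_DIM))
```

The use of `rsample()` keeps the sample differentiable with respect to the mean and variance, which is what allows the SAC actor loss to backpropagate through the policy.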

class all.policies.SoftmaxPolicy(model, optimizer, name='policy', **kwargs)

Bases: all.approximation.approximation.Approximation

A softmax (or Boltzmann) stochastic policy for discrete actions.
 Parameters
model (torch.nn.Module) – A Pytorch module representing the policy network. The input shape should be the same as the shape of the state (or feature) space, and the output should be a vector the size of the action set.
optimizer (torch.optim.Optimizer) – An optimizer initialized with the model parameters, e.g. SGD, Adam, RMSprop, etc.
kwargs (optional) – Any other arguments accepted by all.approximation.Approximation
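A minimal sketch of the softmax sampling idea, assuming a hypothetical set of three discrete actions; the logit values are illustrative and this is not the library's code:

```python
import torch

# Hypothetical: 3 discrete actions; the model's output is a vector
# of 3 unnormalised preferences (logits).
logits = torch.tensor([2.0, 0.5, -1.0])

# Softmax turns the logits into a probability distribution over
# actions, from which the policy samples an action index.
dist = torch.distributions.Categorical(logits=logits)
action = dist.sample()   # an index in {0, 1, 2}
probs = dist.probs       # normalised probabilities, summing to 1
```

Higher logits yield proportionally higher selection probabilities, so the policy favours better-rated actions while still exploring the others.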