all.policies

class all.policies.DeterministicPolicy(model, optimizer=None, space=None, name='policy', **kwargs)

Bases: Approximation

A DDPG-style deterministic policy.

Parameters:
  • model (torch.nn.Module) – A PyTorch module representing the policy network. The input shape should be the same as the shape of the state space, and the output shape should be the same as the shape of the action space.

  • optimizer (torch.optim.Optimizer) – An optimizer initialized with the model parameters, e.g. SGD, Adam, RMSprop, etc.

  • space (gymnasium.spaces.Box) – The Box representing the action space.

  • kwargs (optional) – Any other arguments accepted by all.approximation.Approximation
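
Example (a minimal sketch, not part of the API itself: the two-layer network, layer sizes, Adam optimizer, and learning rate are illustrative assumptions):

    import torch
    from torch import nn
    from gymnasium.spaces import Box
    from all.policies import DeterministicPolicy

    state_dim, action_dim = 8, 3  # illustrative dimensions
    action_space = Box(low=-1.0, high=1.0, shape=(action_dim,))

    # The network maps a state to one output per action dimension.
    model = nn.Sequential(
        nn.Linear(state_dim, 64),
        nn.ReLU(),
        nn.Linear(64, action_dim),
    )
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    policy = DeterministicPolicy(model, optimizer, action_space)
    # Actions are then selected through the Approximation interface,
    # e.g. policy.no_grad(state), where state is the library's State object.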

class all.policies.GaussianPolicy(model, optimizer=None, space=None, name='policy', **kwargs)

Bases: Approximation

A Gaussian stochastic policy.

This policy chooses actions from a spherical Gaussian distribution. The first n outputs of the model are the mean of the distribution and the last n outputs are the log variance. Actions are centered and scaled to match the given action space, but they are not clipped. For example, for an action range of [-1, 1], the center is 0 and the scale is 1.

Parameters:
  • model (torch.nn.Module) – A PyTorch module representing the policy network. The input shape should be the same as the shape of the state (or feature) space, and the output shape should be double the size of the action space. The first n outputs will be the unscaled mean of the action for each dimension, and the last n outputs will be the logarithm of the variance.

  • optimizer (torch.optim.Optimizer) – An optimizer initialized with the model parameters, e.g. SGD, Adam, RMSprop, etc.

  • space (gymnasium.spaces.Box) – The Box representing the action space.

  • kwargs (optional) – Any other arguments accepted by all.approximation.Approximation
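
Example (a sketch under the same illustrative assumptions as above; note that the network has 2 * action_dim outputs, means followed by log variances):

    import torch
    from torch import nn
    from gymnasium.spaces import Box
    from all.policies import GaussianPolicy

    state_dim, action_dim = 8, 2  # illustrative dimensions
    action_space = Box(low=-1.0, high=1.0, shape=(action_dim,))

    # First action_dim outputs: unscaled means; last action_dim outputs: log variances.
    model = nn.Sequential(
        nn.Linear(state_dim, 64),
        nn.Tanh(),
        nn.Linear(64, 2 * action_dim),
    )
    optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)

    policy = GaussianPolicy(model, optimizer, action_space)
    # policy.no_grad(state) samples an action from the resulting Gaussian.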

class all.policies.GreedyPolicy(q, num_actions, epsilon=0.0)

Bases: Schedulable

An “epsilon-greedy” action selection policy for discrete action spaces.

This policy will usually choose the optimal action according to an approximation of the action value function (the “q-function”), but with probability epsilon will choose a random action instead. GreedyPolicy is a Schedulable, meaning that epsilon can be varied over time by passing a Scheduler object.

Parameters:
  • q (all.approximation.QNetwork) – The action-value or “q-function”

  • num_actions (int) – The number of available actions.

  • epsilon (float, optional) – The probability of selecting a random action.

eval(state)
no_grad(state)
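
Example (a sketch; the q-network architecture is an illustrative assumption, and the QNetwork construction is assumed to follow the same (model, optimizer) pattern as the policies above):

    import torch
    from torch import nn
    from all.approximation import QNetwork
    from all.policies import GreedyPolicy

    state_dim, num_actions = 4, 6  # illustrative dimensions

    # A q-network with one value estimate per discrete action.
    model = nn.Sequential(
        nn.Linear(state_dim, 64),
        nn.ReLU(),
        nn.Linear(64, num_actions),
    )
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    q = QNetwork(model, optimizer)

    # epsilon may be a constant float or a Scheduler object that varies it over time.
    policy = GreedyPolicy(q, num_actions, epsilon=0.1)
    # Actions are then selected with the eval/no_grad methods listed above.
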
class all.policies.ParallelGreedyPolicy(q, num_actions, epsilon=0.0)

Bases: Schedulable

A parallel version of the “epsilon-greedy” action selection policy for discrete action spaces.

This policy will usually choose the optimal action according to an approximation of the action value function (the “q-function”), but with probability epsilon will choose a random action instead. ParallelGreedyPolicy is a Schedulable, meaning that epsilon can be varied over time by passing a Scheduler object.

Parameters:
  • q (all.approximation.QNetwork) – The action-value or “q-function”

  • num_actions (int) – The number of available actions.

  • epsilon (float, optional) – The probability of selecting a random action.

eval(state)
no_grad(state)
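
Construction mirrors GreedyPolicy; the sketch below assumes, based on the class description, that eval/no_grad operate on a batch of states from parallel environments:

    import torch
    from torch import nn
    from all.approximation import QNetwork
    from all.policies import ParallelGreedyPolicy

    state_dim, num_actions = 4, 6  # illustrative dimensions
    model = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, num_actions))
    q = QNetwork(model, torch.optim.Adam(model.parameters(), lr=1e-3))

    policy = ParallelGreedyPolicy(q, num_actions, epsilon=0.1)
    # eval(state) / no_grad(state) then act on a batch of states,
    # one per parallel environment.
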
class all.policies.SoftDeterministicPolicy(model, optimizer=None, space=None, name='policy', log_std_min=-20, log_std_max=4, **kwargs)

Bases: Approximation

A “soft” deterministic policy compatible with soft actor-critic (SAC).

Parameters:
  • model (torch.nn.Module) – A PyTorch module representing the policy network. The input shape should be the same as the shape of the state (or feature) space, and the output shape should be double the size of the action space. The first n outputs will be the unscaled mean of the action for each dimension, and the last n outputs will be the logarithm of the variance.

  • optimizer (torch.optim.Optimizer) – An optimizer initialized with the model parameters, e.g. SGD, Adam, RMSprop, etc.

  • space (gymnasium.spaces.Box) – The Box representing the action space.

  • kwargs (optional) – Any other arguments accepted by all.approximation.Approximation
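
Example (a sketch; the network and hyperparameters are illustrative assumptions, and the log_std bounds simply repeat the documented defaults):

    import torch
    from torch import nn
    from gymnasium.spaces import Box
    from all.policies import SoftDeterministicPolicy

    state_dim, action_dim = 8, 2  # illustrative dimensions
    action_space = Box(low=-1.0, high=1.0, shape=(action_dim,))

    # 2 * action_dim outputs: unscaled means followed by log variances.
    model = nn.Sequential(
        nn.Linear(state_dim, 256),
        nn.ReLU(),
        nn.Linear(256, 2 * action_dim),
    )
    optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)

    policy = SoftDeterministicPolicy(
        model, optimizer, action_space, log_std_min=-20, log_std_max=4
    )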

class all.policies.SoftmaxPolicy(model, optimizer=None, name='policy', **kwargs)

Bases: Approximation

A softmax (or Boltzmann) stochastic policy for discrete actions.

Parameters:
  • model (torch.nn.Module) – A PyTorch module representing the policy network. The input shape should be the same as the shape of the state (or feature) space, and the output should be a vector the size of the action set.

  • optimizer (torch.optim.Optimizer) – An optimizer initialized with the model parameters, e.g. SGD, Adam, RMSprop, etc.

  • kwargs (optional) – Any other arguments accepted by all.approximation.Approximation
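
Example (a sketch; the network architecture and dimensions are illustrative assumptions):

    import torch
    from torch import nn
    from all.policies import SoftmaxPolicy

    state_dim, num_actions = 4, 3  # illustrative dimensions

    # One logit per discrete action; the policy samples from the softmax over these logits.
    model = nn.Sequential(
        nn.Linear(state_dim, 64),
        nn.ReLU(),
        nn.Linear(64, num_actions),
    )
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    policy = SoftmaxPolicy(model, optimizer)
    # policy.no_grad(state) samples an action from the softmax distribution.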