all.bodies

class all.bodies.Body(agent)

Bases: all.agents._agent.Agent

A Body wraps a reinforcement learning Agent, altering its inputs and outputs.

The Body API is identical to the Agent API from the perspective of the rest of the system. This base class is provided only for semantic clarity.
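
For example, here is a minimal sketch (not part of the library) of a Body that rescales rewards before passing them to the wrapped agent. Only the constructor and the agent property are taken from the API documented on this page; the class name and scale value are illustrative.

    from all.bodies import Body

    class ScaleRewards(Body):
        """Hypothetical Body that multiplies rewards before the agent sees them."""

        def __init__(self, agent, scale=0.1):
            super().__init__(agent)
            self._scale = scale

        def act(self, state, reward):
            # alter the input (reward), then delegate to the wrapped agent
            return self.agent.act(state, reward * self._scale)

        def eval(self, state, reward):
            # same transformation in evaluation mode; no learning is triggered here
            return self.agent.eval(state, reward * self._scale)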

act(state, reward)

Select an action for the current timestep and update internal parameters.

In general, a reinforcement learning agent does several things during a timestep:

  1. Choose an action.

  2. Compute the TD error from the previous timestep.

  3. Update the value function and/or policy.

The order of these steps differs depending on the agent. This method allows the agent to do whatever is necessary for itself on a given timestep. However, the agent must ultimately return an action.

Parameters
  • state (all.environment.State) – The environment state at the current timestep.

  • reward (torch.Tensor) – The reward from the previous timestep.

Returns

The action to take at the current timestep.

Return type

torch.Tensor
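
For illustration, a hypothetical training episode built around act(). Only act() itself comes from the API above; the environment attribute names (reset, done, state, reward, step) are assumptions used for this sketch.

    def run_episode(body, env):
        # Hypothetical training rollout: act() both selects actions and lets
        # the agent update itself; the env attribute names are assumed.
        env.reset()
        returns = 0.0
        while not env.done:
            action = body.act(env.state, env.reward)
            env.step(action)
            returns += env.reward.item()
        return returns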

property agent

eval(state, reward)

Select an action for the current timestep in evaluation mode.

Unlike act, this method should NOT update the internal parameters of the agent. Most of the time, this method should return the greedy action according to the current policy. This method is useful when using evaluation methodologies that distinguish between the performance of the agent during training and the performance of the resulting policy.

Parameters
  • state (all.environment.State) – The environment state at the current timestep.

  • reward (torch.Tensor) – The reward from the previous timestep.

Returns

The action to take at the current timestep.

Return type

torch.Tensor
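
For illustration, a hypothetical evaluation routine built around eval(). As above, the environment attribute names are assumptions; the point is that eval() selects (typically greedy) actions without updating the agent.

    def evaluate(body, env, episodes=10):
        # Hypothetical evaluation rollouts: no learning updates are performed.
        total = 0.0
        for _ in range(episodes):
            env.reset()
            while not env.done:
                env.step(body.eval(env.state, env.reward))
                total += env.reward.item()
        return total / episodes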

class all.bodies.ClipRewards(agent)

Bases: all.bodies._body.Body
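
A hypothetical usage sketch: ClipRewards wraps any object implementing the Agent API so that the rewards forwarded to it are clipped (the exact clipping rule is not described on this page). The agent argument below is assumed to be such an object.

    from all.bodies import ClipRewards

    def with_clipped_rewards(agent):
        # Hypothetical helper: the returned body exposes the same act()/eval()
        # interface as the wrapped agent, but clips the reward it passes along.
        return ClipRewards(agent)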

act(state, reward)

Select an action for the current timestep and update internal parameters.

In general, a reinforcement learning agent does several things during a timestep:

  1. Choose an action.

  2. Compute the TD error from the previous timestep.

  3. Update the value function and/or policy.

The order of these steps differs depending on the agent. This method allows the agent to do whatever is necessary for itself on a given timestep. However, the agent must ultimately return an action.

Parameters
  • state (all.environment.State) – The environment state at the current timestep.

  • reward (torch.Tensor) – The reward from the previous timestep.

Returns

The action to take at the current timestep.

Return type

torch.Tensor

eval(state, reward)

Select an action for the current timestep in evaluation mode.

Unlike act, this method should NOT update the internal parameters of the agent. Most of the time, this method should return the greedy action according to the current policy. This method is useful when using evaluation methodologies that distinguish between the performance of the agent during training and the performance of the resulting policy.

Parameters
  • state (all.environment.State) – The environment state at the current timestep.

  • reward (torch.Tensor) – The reward from the previous timestep.

Returns

The action to take at the current timestep.

Return type

torch.Tensor

class all.bodies.DeepmindAtariBody(agent, lazy_frames=False, episodic_lives=True, frame_stack=4)

Bases: all.bodies._body.Body
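
A hypothetical usage sketch, with keyword values mirroring the defaults in the signature above; the agent argument is assumed to be any object implementing the Agent API, such as one produced by an Atari preset.

    from all.bodies import DeepmindAtariBody

    def wrap_for_atari(agent):
        # Hypothetical helper: apply DeepMind-style Atari preprocessing
        # (keyword defaults taken from the documented signature).
        return DeepmindAtariBody(
            agent,
            lazy_frames=False,
            episodic_lives=True,
            frame_stack=4,
        )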

class all.bodies.FrameStack(agent, size=4, lazy=False)

Bases: all.bodies._body.Body
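
A hypothetical usage sketch: FrameStack presents the wrapped agent with the last size observations combined into a single state; the lazy flag presumably defers that stacking to save memory. The agent argument is assumed.

    from all.bodies import FrameStack

    def with_frame_stack(agent, size=4):
        # Hypothetical helper: stack the last 'size' observations into one state
        # (defaults taken from the documented signature).
        return FrameStack(agent, size=size, lazy=False)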

act(state, reward)

Select an action for the current timestep and update internal parameters.

In general, a reinforcement learning agent does several things during a timestep:

  1. Choose an action.

  2. Compute the TD error from the previous timestep.

  3. Update the value function and/or policy.

The order of these steps differs depending on the agent. This method allows the agent to do whatever is necessary for itself on a given timestep. However, the agent must ultimately return an action.

Parameters
  • state (all.environment.State) – The environment state at the current timestep.

  • reward (torch.Tensor) – The reward from the previous timestep.

Returns

The action to take at the current timestep.

Return type

torch.Tensor

eval(state, reward)

Select an action for the current timestep in evaluation mode.

Unlike act, this method should NOT update the internal parameters of the agent. Most of the time, this method should return the greedy action according to the current policy. This method is useful when using evaluation methodologies that distinguish between the performance of the agent during training and the performance of the resulting policy.

Parameters
  • state (all.environment.State) – The environment state at the current timestep.

  • reward (torch.Tensor) – The reward from the previous timestep.

Returns

The action to take at the current timestep.

Return type

torch.Tensor

class all.bodies.TimeFeature(agent, scale=0.001)

Bases: all.bodies._body.Body
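
A hypothetical usage sketch: TimeFeature presumably augments each state with a scaled timestep feature (useful in time-limited environments) before it reaches the wrapped agent. The agent argument is assumed.

    from all.bodies import TimeFeature

    def with_time_feature(agent, scale=0.001):
        # Hypothetical helper: append a scaled time feature to each state
        # (scale default taken from the documented signature).
        return TimeFeature(agent, scale=scale)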

act(state, reward)

Select an action for the current timestep and update internal parameters.

In general, a reinforcement learning agent does several things during a timestep:

  1. Choose an action.

  2. Compute the TD error from the previous timestep.

  3. Update the value function and/or policy.

The order of these steps differs depending on the agent. This method allows the agent to do whatever is necessary for itself on a given timestep. However, the agent must ultimately return an action.

Parameters
  • state (all.environment.State) – The environment state at the current timestep.

  • reward (torch.Tensor) – The reward from the previous timestep.

Returns

The action to take at the current timestep.

Return type

torch.Tensor

eval(state, reward)

Select an action for the current timestep in evaluation mode.

Unlike act, this method should NOT update the internal parameters of the agent. Most of the time, this method should return the greedy action according to the current policy. This method is useful when using evaluation methodologies that distinguish between the performance of the agent during training and the performance of the resulting policy.

Parameters
  • state (all.environment.State) – The environment state at the current timestep.

  • reward (torch.Tensor) – The reward from the previous timestep.

Returns

The action to take at the current timestep.

Return type

torch.Tensor