all.environments

class all.environments.AtariEnvironment(name, *args, **kwargs)

Bases: all.environments.gym.GymEnvironment

duplicate(n)

Create n copies of this environment.

property name

The name of the environment.

class all.environments.Environment

Bases: abc.ABC

A reinforcement learning Environment.

In reinforcement learning, an Agent learns by interacting with an Environment. An Environment defines the dynamics of a particular problem: the states, the actions, the transitions between states, and the rewards given to the agent. Environments are often used to benchmark reinforcement learning agents, or to define real problems that the user hopes to solve using reinforcement learning.
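
The interaction loop this describes can be sketched with a toy stand-in. `CountdownEnv` below is purely illustrative and not part of the library; the real interface is defined by the abstract methods and properties documented on this class.

```python
# Toy sketch of the agent/environment loop described above.
# CountdownEnv is a hypothetical stand-in, not an all.environments class.

class CountdownEnv:
    """Toy environment: the state counts down from 3; the episode ends at 0."""

    def __init__(self):
        self._state = None
        self._done = True

    def reset(self):
        # Begin a new episode and return the initial state.
        self._state = 3
        self._done = False
        return self._state

    def step(self, action):
        # Any action decrements the counter and yields a reward of 1.
        self._state -= 1
        self._done = self._state <= 0
        return self._state, 1.0

    @property
    def done(self):
        return self._done


env = CountdownEnv()
state = env.reset()
total = 0.0
while not env.done:
    state, reward = env.step(0)  # a real agent would choose the action
    total += reward
```

The shape of the loop — `reset`, then repeated `step` calls until `done` — is the contract every concrete `Environment` is expected to honor.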

abstract property action

The most recent Action taken.

abstract property action_space

The Space representing the range of possible actions.

Returns

An object of type Space that represents possible actions the agent may take

Return type

Space

abstract close()

Clean up any extraneous environment objects.

abstract property device

The torch device the environment lives on.

abstract property done

Whether or not the environment has terminated and should be reset.

abstract duplicate(n)

Create n copies of this environment.

property info

Debugging info for the current time step.

abstract property name

The name of the environment.

property observation_space

Alias for Environment.state_space.

Returns

An object of type Space that represents possible states the agent may observe

Return type

Space

abstract render(**kwargs)

Render the current environment state.

abstract reset()

Reset the environment and return a new initial state.

Returns

The initial state for the next episode.

Return type

State

abstract property reward

The reward for the previous action taken.

property should_reset

Special property to determine whether the runner should call reset. Similar to done, but in some environments it helps to distinguish between what the algorithm considers an episode and what the runner considers an episode. For example, in Pong it is easier if the agent treats a single volley as an episode, while we would still like to evaluate the agent relative to the entire match.
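
The Pong example above can be sketched with a hypothetical environment in which done marks the end of each volley while should_reset only becomes true when the whole match is over. `VolleyEnv` and all of its parameters are illustrative, not part of the library.

```python
# Hypothetical sketch of done vs. should_reset, per the Pong example.
# done fires at the end of each "volley"; should_reset fires only when
# the whole "match" is over.

class VolleyEnv:
    def __init__(self, steps_per_volley=2, volleys_per_match=2):
        self._steps_per_volley = steps_per_volley
        self._volleys_per_match = volleys_per_match
        self._step = 0
        self._volleys_done = 0

    def reset(self):
        self._step = 0
        self._volleys_done = 0

    def step(self, action):
        self._step += 1
        if self._step >= self._steps_per_volley:
            # Volley over: this is an episode boundary for the agent.
            self._volleys_done += 1
            self._step = 0

    @property
    def done(self):
        # True exactly at a volley boundary.
        return self._step == 0 and self._volleys_done > 0

    @property
    def should_reset(self):
        # The runner only resets once the entire match has finished.
        return self._volleys_done >= self._volleys_per_match


env = VolleyEnv()
env.reset()
agent_episodes = 0
steps = 0
while not env.should_reset:
    env.step(0)
    steps += 1
    if env.done:
        agent_episodes += 1  # the agent sees each volley as one episode
```

Here the agent counts two episodes (volleys) before the runner, watching should_reset, ends the match after four steps.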

abstract property state

The State of the Environment at the current timestep.

abstract property state_space

The Space representing the range of observable states.

Returns

An object of type Space that represents possible states the agent may observe

Return type

Space

abstract step(action)

Apply an action and get the next state.

Parameters

action (Action) – The action to apply at the current time step.

Returns

  • all.environments.State – The State of the environment after the action is applied. This State object includes both the done flag and any additional “info”.

  • float – The reward achieved by the previous action
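
Consuming the return values documented above might look like the following sketch. `MiniState` and `MiniEnv` are illustrative stand-ins, not the real library classes; the point is only that the returned State bundles the done flag and debugging info alongside the raw observation.

```python
# Sketch of unpacking step()'s documented return values.
# MiniState / MiniEnv are hypothetical stand-ins for State / Environment.

class MiniState:
    def __init__(self, raw, done=False, info=None):
        self.raw = raw
        self.done = done
        self.info = info or {}


class MiniEnv:
    def __init__(self):
        self._t = 0

    def step(self, action):
        # Return the next State (carrying done and info) and the reward.
        self._t += 1
        state = MiniState(raw=self._t, done=self._t >= 2, info={"t": self._t})
        return state, 1.0


env = MiniEnv()
state, reward = env.step(0)
# state.done and state.info travel with the observation,
# so the caller never handles a separate done flag.
```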

class all.environments.GymEnvironment(env, device=torch.device)

Bases: all.environments.abstract.Environment

property action

The most recent Action taken.

property action_space

The Space representing the range of possible actions.

Returns

An object of type Space that represents possible actions the agent may take

Return type

Space

close()

Clean up any extraneous environment objects.

property device

The torch device the environment lives on.

property done

Whether or not the environment has terminated and should be reset.

duplicate(n)

Create n copies of this environment.

property env
property info

Debugging info for the current time step.

property name

The name of the environment.

render(**kwargs)

Render the current environment state.

reset()

Reset the environment and return a new initial state.

Returns

The initial state for the next episode.

Return type

State

property reward

The reward for the previous action taken.

seed(seed)
property state

The State of the Environment at the current timestep.

property state_space

The Space representing the range of observable states.

Returns

An object of type Space that represents possible states the agent may observe

Return type

Space

step(action)

Apply an action and get the next state.

Parameters

action (Action) – The action to apply at the current time step.

Returns

  • all.environments.State – The State of the environment after the action is applied. This State object includes both the done flag and any additional “info”.

  • float – The reward achieved by the previous action

class all.environments.State(raw, mask=None, info=None)

Bases: object

property done
property features

Default features are the raw state. Override this method for other types of features.

classmethod from_gym(numpy_arr, done, info, device='cpu', dtype=numpy.float32)
classmethod from_list(states)
property info
property mask
property raw
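
A rough sketch of how a State like the one above could wrap a gym-style transition follows. This is an assumption-laden re-implementation, not the library's code: it assumes the mask is 1.0 for non-terminal states and 0.0 when done (so terminal values can be zeroed out), and it uses numpy in place of torch to stay self-contained.

```python
# Hypothetical sketch of State / from_gym semantics (not the real class).
# Assumption: mask == 0.0 encodes done, mask == 1.0 encodes non-terminal.
import numpy as np


class SketchState:
    def __init__(self, raw, mask=None, info=None):
        self._raw = raw
        # Default mask: every state in the batch is non-terminal.
        self._mask = np.ones(len(raw), dtype=np.float32) if mask is None else mask
        self._info = info or []

    @classmethod
    def from_gym(cls, numpy_arr, done, info, dtype=np.float32):
        # Convert a raw gym observation into a batched SketchState.
        raw = np.asarray(numpy_arr, dtype=dtype)[None]  # add a batch dimension
        mask = np.zeros(1, dtype=np.float32) if done else np.ones(1, dtype=np.float32)
        return cls(raw, mask=mask, info=[info])

    @property
    def done(self):
        return self._mask[0] == 0

    @property
    def features(self):
        # Default features are just the raw observation.
        return self._raw

    @property
    def mask(self):
        return self._mask


obs = np.array([0.1, -0.2], dtype=np.float32)
terminal = SketchState.from_gym(obs, done=True, info={"lives": 0})
```

The mask-as-done encoding is convenient because multiplying value estimates by the mask zeroes them at episode boundaries without any branching.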