all.experiments

class all.experiments.Experiment(writer, quiet)

Bases: abc.ABC

An Experiment manages the basic train/test loop and logs results.

Parameters
  • writer (all.logging.Writer) – A Writer object used for logging.

  • quiet (bool) – If False, the Experiment will print information about episode returns to standard out.

abstract property episode

The index of the current training episode.

abstract property frame

The index of the current training frame.

abstract test(episodes=100)

Test the agent in eval mode for a certain number of episodes.

Parameters

episodes (int) – The number of test episodes.

Returns

A list of all returns received during testing.

Return type

list(float)

abstract train(frames=inf, episodes=inf)

Train the agent for a certain number of frames or episodes. If both frames and episodes are specified, then the training loop will exit when either condition is satisfied.

Parameters
  • frames (int) – The maximum number of training frames.

  • episodes (int) – The maximum number of training episodes.
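
Since Experiment is abstract, a subclass supplies the frame/episode counters and the train/test loops. A minimal sketch using only the signatures on this page (the class name and loop body are illustrative, and a given release may declare additional abstract members that would also need implementations):

    from all.experiments import Experiment

    class CustomExperiment(Experiment):
        """Illustrative Experiment subclass; the actual
        agent/environment interaction in train() is elided."""

        def __init__(self, writer, quiet=False):
            super().__init__(writer, quiet)
            self._frame = 1
            self._episode = 1

        @property
        def frame(self):
            return self._frame

        @property
        def episode(self):
            return self._episode

        def train(self, frames=float('inf'), episodes=float('inf')):
            # Exit when either limit is reached, per the train() contract.
            while self._frame <= frames and self._episode <= episodes:
                ...  # step the agent/environment; bump _episode at episode end
                self._frame += 1

        def test(self, episodes=100):
            # One return per test episode (placeholder values).
            return [0.0 for _ in range(episodes)]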

class all.experiments.ExperimentWriter(experiment, agent_name, env_name, loss=True)

Bases: tensorboardX.writer.SummaryWriter, all.logging.Writer

The Writer object used by all.experiments.Experiment. Writes logs using tensorboard into the current runs directory, tagging the run with a combination of the agent name, the commit hash of the current git repo of the working directory (if any), and the current time. Also writes summary statistics into CSV files.

Parameters
  • experiment (all.experiments.Experiment) – The Experiment associated with the Writer object.

  • agent_name (str) – The name of the Agent the Experiment is being performed on.

  • env_name (str) – The name of the environment the Experiment is being performed in.

  • loss (bool, optional) – Whether or not to log loss/scheduling metrics, or only evaluation and summary metrics.

add_evaluation(name, value, step='frame')

Log the evaluation metric.

Parameters
  • name (str) – The tag to associate with the evaluation metric

  • value (number) – The evaluation metric at the current step

  • step (str, optional) – Which step to use (e.g., “frame” or “episode”)

add_loss(name, value, step='frame')

Log the given loss metric at the current step.

Parameters
  • name (str) – The tag to associate with the loss

  • value (number) – The value of the loss at the current step

  • step (str, optional) – Which step to use (e.g., “frame” or “episode”)

add_scalar(name, value, step='frame')

Log an arbitrary scalar.

Parameters
  • name (str) – The tag to associate with the scalar

  • value (number) – The value of the scalar at the current step

  • step (str, optional) – Which step to use (e.g., “frame” or “episode”)

add_schedule(name, value, step='frame')

Log the current value of a hyperparameter according to some schedule.

Parameters
  • name (str) – The tag to associate with the hyperparameter schedule

  • value (number) – The value of the hyperparameter at the current step

  • step (str, optional) – Which step to use (e.g., “frame” or “episode”)

add_summary(name, mean, std, step='frame')

Log a summary statistic.

Parameters
  • name (str) – The tag to associate with the summary statistic

  • mean (float) – The mean of the statistic at the current step

  • std (float) – The standard deviation of the statistic at the current step

  • step (str, optional) – Which step to use (e.g., “frame” or “episode”)
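
The logging methods above share a common shape: a tag, a value, and a step axis (“frame” or “episode”). A short sketch of the calls (the agent/environment names and metric values are illustrative, and experiment is assumed to be an existing Experiment instance):

    from all.experiments import ExperimentWriter

    def log_metrics(experiment):
        """Sketch of the ExperimentWriter logging calls."""
        writer = ExperimentWriter(experiment, agent_name='dqn',
                                  env_name='CartPole-v0', loss=True)
        writer.add_loss('value', 0.25)              # step defaults to 'frame'
        writer.add_schedule('epsilon', 0.1)
        writer.add_evaluation('returns', 200.0, step='episode')
        writer.add_summary('returns', mean=195.0, std=12.3, step='episode')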

class all.experiments.GreedyAgent(action_space, feature=None, q=None, policy=None)

Bases: all.agents._agent.Agent

act(state, _)

Select an action for the current timestep and update internal parameters.

In general, a reinforcement learning agent does several things during a timestep:

  1. Choose an action.

  2. Compute the TD error from the previous timestep.

  3. Update the value function and/or policy.

The order of these steps differs depending on the agent. This method allows the agent to do whatever is necessary for itself on a given timestep. However, the agent must ultimately return an action.

Parameters

state (all.environment.State) – The environment state at the current timestep.

Returns

The action to take at the current timestep.

Return type

torch.Tensor
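
For example, act() might be driven by a loop like the following sketch (the env attributes state, reward, and done are assumptions about the all.environments wrapper, not taken from this page):

    def run_episode(agent, env):
        """Drive one training episode and accumulate the return."""
        env.reset()
        returns = 0.0
        while not env.done:
            action = agent.act(env.state, env.reward)  # assumed env attributes
            env.step(action)
            returns += env.reward
        return returns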

choose_continuous(state)
choose_discrete(state)
eval(state, reward)

Select an action for the current timestep in evaluation mode.

Unlike act, this method should NOT update the internal parameters of the agent. Most of the time, this method should return the greedy action according to the current policy. This method is useful when using evaluation methodologies that distinguish between the performance of the agent during training and the performance of the resulting policy.

Parameters

state (all.environment.State) – The environment state at the current timestep.

Returns

The action to take at the current timestep.

Return type

torch.Tensor

static load(dirname, env)
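
A typical use of load is to reload a trained model and render its greedy policy. A sketch (the GymEnvironment import path, environment name, and run directory are assumptions):

    from all.experiments import GreedyAgent, watch
    from all.environments import GymEnvironment   # assumed import path

    env = GymEnvironment('CartPole-v0')           # illustrative environment
    agent = GreedyAgent.load('runs/my_run', env)  # hypothetical run directory
    watch(agent, env, fps=60)                     # render the greedy policy
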
class all.experiments.ParallelEnvExperiment(agent, env, render=False, quiet=False, write_loss=True)

Bases: all.experiments.experiment.Experiment

An Experiment object for training and testing agents that use parallel training environments.

property episode

The index of the current training episode.

property frame

The index of the current training frame.

test(episodes=100)

Test the agent in eval mode for a certain number of episodes.

Parameters

episodes (int) – The number of test episodes.

Returns

A list of all returns received during testing.

Return type

list(float)

train(frames=inf, episodes=inf)

Train the agent for a certain number of frames or episodes. If both frames and episodes are specified, then the training loop will exit when either condition is satisfied.

Parameters
  • frames (int) – The maximum number of training frames.

  • episodes (int) – The maximum number of training episodes.
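
Usage mirrors SingleEnvExperiment below, except that the agent is expected to support parallel environments. A sketch (the preset and environment imports are assumptions; only the Experiment calls come from this page):

    from all.experiments import ParallelEnvExperiment
    from all.environments import GymEnvironment  # assumed import path
    from all.presets.classic_control import a2c  # assumed parallel-capable preset

    env = GymEnvironment('CartPole-v0')
    experiment = ParallelEnvExperiment(a2c(), env)
    experiment.train(frames=100_000)
    returns = experiment.test(episodes=100)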

class all.experiments.SingleEnvExperiment(agent, env, render=False, quiet=False, write_loss=True)

Bases: all.experiments.experiment.Experiment

An Experiment object for training and testing agents that interact with one environment at a time.

property episode

The index of the current training episode.

property frame

The index of the current training frame.

test(episodes=100)

Test the agent in eval mode for a certain number of episodes.

Parameters

episodes (int) – The number of test episodes.

Returns

A list of all returns received during testing.

Return type

list(float)

train(frames=inf, episodes=inf)

Train the agent for a certain number of frames or episodes. If both frames and episodes are specified, then the training loop will exit when either condition is satisfied.

Parameters
  • frames (int) – The maximum number of training frames.

  • episodes (int) – The maximum number of training episodes.
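
A sketch of the end-to-end single-environment workflow (the preset and environment imports are assumptions; only the Experiment calls come from this page):

    from all.experiments import SingleEnvExperiment
    from all.environments import GymEnvironment  # assumed import path
    from all.presets.classic_control import dqn  # assumed preset

    env = GymEnvironment('CartPole-v0')
    experiment = SingleEnvExperiment(dqn(), env, render=False, quiet=False)
    experiment.train(frames=100_000)          # stop after 100k frames
    returns = experiment.test(episodes=100)   # list(float) of episode returns
    print(sum(returns) / len(returns))        # mean test return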

class all.experiments.SlurmExperiment(agents, envs, frames, test_episodes=100, job_name='autonomous-learning-library', sbatch_args=None)

Bases: object

create_sbatch_script()
make_output_directory()
parse_args()
queue_jobs()
run_experiment()
run_sbatch_script()
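
A SlurmExperiment runs each agent/environment pair as a batch job on a Slurm cluster. A sketch (the preset and environment imports, device, and partition name are assumptions):

    from all.experiments import SlurmExperiment
    from all.environments import AtariEnvironment  # assumed import path
    from all.presets.atari import a2c, dqn         # assumed presets

    agents = [a2c(device='cuda'), dqn(device='cuda')]
    envs = [AtariEnvironment(name, device='cuda')
            for name in ['Breakout', 'Pong']]
    # sbatch_args are passed through to the generated sbatch script.
    SlurmExperiment(agents, envs, frames=10_000_000, test_episodes=100,
                    sbatch_args={'partition': 'gpu'})
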
all.experiments.load_and_watch(dir, env, fps=60)
all.experiments.run_experiment(agents, envs, frames, test_episodes=100, render=False, quiet=False, write_loss=True)
all.experiments.watch(agent, env, fps=60)
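
run_experiment is the high-level entry point: it trains and tests each agent/environment combination locally, while load_and_watch reloads a saved model and renders it. A sketch (the preset and environment imports and the run directory are assumptions):

    from all.experiments import run_experiment, load_and_watch
    from all.environments import GymEnvironment  # assumed import path
    from all.presets.classic_control import dqn  # assumed preset

    # Train and test each agent/environment pair.
    run_experiment([dqn()], [GymEnvironment('CartPole-v0')],
                   frames=50_000, test_episodes=100)

    # Reload a saved run directory (name is hypothetical) and render it.
    load_and_watch('runs/my_run', GymEnvironment('CartPole-v0'), fps=60)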