Experiment Run Definition Documents#
Introduction#
Training and testing agents on one or more environments, i.e., executing palaestrAI, is controlled from a single file: the experiment run document. This document defines the following major components of a run:
Run metadata (name, seed, etc.)
Environments and rewards
Agents and their objectives
Termination conditions
An experiment run document defines a single, reproducible execution of palaestrAI. In order to train agents and test them, this is the only document an experimenter will need. Writing actual code is only necessary if new environments or agents are to be used.
The experiment run document is a YAML file that contains global options as well as the definition of a schedule of execution.
Global Options#
A small number of mandatory entries are specified at the global level. They are:
uid (string): A unique, user-specified identifier. It is later used to identify the experiment run’s result data in the database. This parameter is mandatory.
seed (integer): The initial seed of the random number generator. Seeding any random number generator guarantees the reproducibility of the experiment run, since all random number generators are actually pseudo-random number generators: given a known seed, they always produce the same sequence of numbers.
version (string): The version for which the experiment run document is valid. If the version of palaestrAI and the one given in the experiment run document differ, a warning is emitted.
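The effect of the seed can be illustrated with Python’s standard library. This is a minimal sketch of the general principle only; the helper name sample_sequence is made up for illustration and says nothing about how palaestrAI distributes the seed internally.

```python
import random


def sample_sequence(seed: int, length: int = 5) -> list[float]:
    """Draw `length` numbers from a generator seeded with `seed`."""
    rng = random.Random(seed)  # independent generator, no global state
    return [rng.random() for _ in range(length)]


first = sample_sequence(42)
second = sample_sequence(42)
other = sample_sequence(43)

print(first == second)  # same seed, same sequence
print(first == other)   # a different seed gives a different sequence
```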
In addition, the following two top-level entries are also required:
schedule: The experiment run schedule. Every experiment run defines at least one phase within its schedule. The schedule defines what combination of environments and agents are used, and at which point a simulation phase ends.
run_config: Defines the run’s termination condition.
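As a rough illustration of the global options, the following sketch checks an already parsed document (a plain dictionary) for the five top-level keys named above and warns if the versions differ. The function name check_run_document and the plain string comparison of versions are assumptions made for this example; palaestrAI’s own validation may work differently.

```python
import warnings

# Top-level keys named in this section.
REQUIRED_KEYS = ("uid", "seed", "version", "schedule", "run_config")


def check_run_document(doc: dict, installed_version: str) -> list[str]:
    """Return the missing top-level keys; warn on a version mismatch."""
    missing = [key for key in REQUIRED_KEYS if key not in doc]
    if "version" in doc and str(doc["version"]) != installed_version:
        warnings.warn(
            f"Run document targets version {doc['version']}, "
            f"but version {installed_version} is installed."
        )
    return missing


doc = {"uid": "my-run", "seed": 42, "version": "3.4", "schedule": []}
print(check_run_document(doc, installed_version="3.4"))  # ['run_config']
```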
Schedules#
The key concept of an experiment run is its schedule. The schedule defines one or more phases. Each phase consists of
one (or more) environments
at least one agent
a termination condition.
palaestrAI starts an experiment run’s execution with the first phase. It initializes the environment(s) and agent(s), then executes this portion of the run. When the termination condition holds, the phase ends and palaestrAI moves on to the next phase. If no further phase exists, the experiment run ends.
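The control flow described above can be sketched in a few lines. This is a simplified model, not palaestrAI’s implementation: a phase is reduced to a name and a termination predicate, and run_schedule is a hypothetical helper.

```python
from typing import Callable

Phase = tuple[str, Callable[[int], bool]]  # (phase name, termination predicate)


def run_schedule(phases: list[Phase]) -> list[tuple[str, int]]:
    """Execute phases in order; each runs until its termination condition holds."""
    log = []
    for name, terminated in phases:   # execution starts with the first phase
        step = 0                      # stands in for initializing envs and agents
        while not terminated(step):   # execute this portion of the run
            step += 1
        log.append((name, step))      # the phase ends; continue with the next one
    return log                        # no phase left: the run ends


schedule = [
    ("phase_0", lambda step: step >= 3),
    ("phase_1", lambda step: step >= 1),
]
print(run_schedule(schedule))  # [('phase_0', 3), ('phase_1', 1)]
```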
Each phase config consists of the following entries:
environments: The list of environment definitions (see the section on environments below).
agents: The list of agents participating in this particular phase (see the section on agents below).
simulation: Chooses the simulation controller for a phase. The simulation controller defines in which order agents interact with their environment. Currently, the most commonly used simulation controller is the VanillaSimulationController: Here, all agents receive their sensor inputs at once; their actuators’ setpoints are also applied to the environment at once. Then, the new state of the environment is computed.
phase_config:
  mode: One of training or test (the examples in this document use the value train); defines how the agents act in the environment (training vs. policy exploitation for learning agents).
  episodes (integer): How often the execution of the phase is repeated, which is especially important for learning agents.
  workers (integer): How many parallel instances of environments and agents are spawned, e.g., in order to generate training data in parallel (the examples in this document spell this key worker).
This is an example of an experiment run schedule with two phases:
- phase_0: # Name of the current phase. Can be any user-chosen name
    environments: # Definition of the environments for this phase
      - environment:
          name: palaestrai.environment.dummy_environment:DummyEnvironment
          uid: myenv
          params: {"discrete": true}
    agents: # Definition of agents for this phase
      - name: mighty_defender
        brain:
          name: palaestrai.agent.dummy_brain:DummyBrain
          params: { "store_path": "./custom" } # the base store path
        muscle:
          name: palaestrai.agent.dummy_muscle:DummyMuscle
          params: { }
        objective:
          name: palaestrai.agent.dummy_objective:DummyObjective
          params: {"params": 1}
        sensors: [myenv.0, myenv.1, myenv.2, myenv.3, myenv.4]
        actuators: [myenv.0, myenv.1, myenv.2, myenv.3, myenv.4]
      - name: evil_attacker
        brain:
          name: palaestrai.agent.dummy_brain:DummyBrain
          params: { }
        muscle:
          name: palaestrai.agent.dummy_muscle:DummyMuscle
          params: { }
        objective:
          name: palaestrai.agent.dummy_objective:DummyObjective
          params: {"params": 1}
        sensors: [myenv.5, myenv.6, myenv.7, myenv.8, myenv.9]
        actuators: [myenv.5, myenv.6, myenv.7, myenv.8, myenv.9]
    simulation: # Definition of the simulation controller for this phase
      name: palaestrai.simulation:VanillaSimController
      conditions:
        - name: palaestrai.simulation:VanillaSimControllerTerminationCondition
          params: {}
    phase_config: # Additional config for this phase
      mode: train
      worker: 1
      episodes: 1
- phase_1: # Name of the current phase. Can be any user-chosen name
    environments: # Definition of the environments for this phase
      - environment:
          name: palaestrai.environment.dummy_environment:DummyEnvironment
          uid: myenv
          params: {"discrete": true}
    agents: # Definition of agents for this phase
      - name: mighty_defender
        # we load the agent with the same name and the same experiment_id, optional: specify "agent_name" or "experiment_id"
        load: {base: "./custom", phase_name: "phase_0"}
        brain:
          name: palaestrai.agent.dummy_brain:DummyBrain
          params: { "store_path": "./custom" }
        muscle:
          name: palaestrai.agent.dummy_muscle:DummyMuscle
          params: { }
        objective:
          name: palaestrai.agent.dummy_objective:DummyObjective
          params: {"params": 1}
        sensors: [myenv.0, myenv.1, myenv.2, myenv.3, myenv.4]
        actuators: [myenv.0, myenv.1, myenv.2, myenv.3, myenv.4]
      - name: evil_attacker
        load: {phase_name: "phase_0"}
        brain:
          name: palaestrai.agent.dummy_brain:DummyBrain
          params: { }
        muscle:
          name: palaestrai.agent.dummy_muscle:DummyMuscle
          params: { }
        objective:
          name: palaestrai.agent.dummy_objective:DummyObjective
          params: {"params": 1}
        sensors: [myenv.5, myenv.6, myenv.7, myenv.8, myenv.9]
        actuators: [myenv.5, myenv.6, myenv.7, myenv.8, myenv.9]
    simulation: # Definition of the simulation controller for this phase
      name: palaestrai.simulation:VanillaSimController
      conditions:
        - name: palaestrai.simulation:VanillaSimControllerTerminationCondition
          params: {}
    phase_config: # Additional config for this phase
      mode: train
      worker: 1
      episodes: 1
The Cascade#
Configuration options in an experiment run file cascade through the phases. This means that any definition of environments, agents, or simulation controllers given in the first phase is implicitly applied to all following phases, unless overwritten.
Following phases can then redefine select parts or add new ones. Suppose an environment with the uid myenv is defined in the first phase; then this environment is present in all following phases. If a particular phase then also defines an environment with the uid myenv, this overwrites all definitions for this particular environment. All phases after this definition also use the new definition, not that from the first phase. Any phase could, in addition, define a second environment; this environment would then be present for that particular phase as well as all phases that follow.
This way, an experimenter only needs to note the changes between phases.
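The cascade can be sketched as a merge that carries definitions forward from phase to phase. The sketch below makes several simplifying assumptions that are not taken from palaestrAI itself: phases are plain dictionaries, environments are matched by their uid, agents by their name, and the function cascade is hypothetical.

```python
import copy


def cascade(phases: list[dict]) -> list[dict]:
    """Resolve each phase against everything defined in earlier phases."""
    environments: dict[str, dict] = {}  # keyed by environment uid
    agents: dict[str, dict] = {}        # keyed by agent name (an assumption)
    simulation = None
    resolved = []
    for phase in phases:
        for env in phase.get("environments", []):
            environments[env["uid"]] = env  # same uid: overwrite; new uid: add
        for agent in phase.get("agents", []):
            agents[agent["name"]] = agent
        simulation = phase.get("simulation", simulation)
        resolved.append(copy.deepcopy({
            "environments": list(environments.values()),
            "agents": list(agents.values()),
            "simulation": simulation,
        }))
    return resolved


phases = [
    {"environments": [{"uid": "myenv", "params": {"discrete": True}}],
     "simulation": {"name": "controller_a"}},
    {},  # states nothing: inherits everything from the first phase
    {"environments": [{"uid": "myenv", "params": {"discrete": False}},
                      {"uid": "second_env", "params": {}}]},
    {},  # inherits the redefined myenv and second_env
]
resolved = cascade(phases)
print([len(phase["environments"]) for phase in resolved])  # [1, 1, 2, 2]
```

In this toy run, the second phase sees myenv exactly as defined in the first phase, while the last phase sees the redefinition from the third phase plus the additional environment.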
Defining Entities (Environments, Agents, etc.)#
Each of the loadable entities (environments, agents, simulation controllers, termination conditions, etc.) follows a simple schema in the experiment run file. Their definition contains at least two keys:
name: The fully-qualified name of the loadable class in the format package.package:ClassName.
params: A dictionary containing any parameters that are passed to the object upon initialization.
An example for an environment would look like this:
environment:
  name: palaestrai.environment.dummy_environment:DummyEnvironment
  params: {"discrete": true}
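To show how a name/params pair can be turned into an object, here is a minimal sketch using Python’s importlib. It is not palaestrAI’s loader: the function load_entity is hypothetical, passing params as keyword arguments is an assumption, and a standard-library class stands in for an environment class so that the example runs on its own.

```python
import importlib


def load_entity(definition: dict):
    """Instantiate the class named by `name`, passing `params` as keyword arguments."""
    # Split "package.module:ClassName" at the colon.
    module_path, _, class_name = definition["name"].partition(":")
    module = importlib.import_module(module_path)
    cls = getattr(module, class_name)
    return cls(**definition.get("params", {}))


# A standard-library class stands in for an environment or agent class.
definition = {
    "name": "fractions:Fraction",
    "params": {"numerator": 3, "denominator": 4},
}
print(load_entity(definition))  # 3/4
```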
Environments#
An environment is a world any agent can act in. The definition of an environment contains the following configuration:
environment:
  uid (string): A name that needs to be unique within the scope of the experiment run file.
  name (string): Name of the class that contains the actual code of the environment.
  params (dict): Any parameters specific to the environment.
reward: The definition of a Reward. It is optional and useful when the environment is too complex to emit only one particular reward.
  name (string)
  params (dict)
state_transformer: An optional EnvironmentStateTransformer used to filter the environment’s current state before it is passed on to the database. It can be used, e.g., to calculate derived values or strip superfluous data.
  name (string)
  params (dict)
Agents#
Agents are acting entities within an environment. In palaestrAI, agents consist of a brain and one or more muscles. The brain is a separate entity that is responsible for the training, while a muscle acts within an environment, i.e., it is responsible only for the inference. Details on the Brain-Muscle-Split concept can be found in the documentation of the Brain and Muscle API.
name (string): The agent’s name; a human-readable string.
brain: Definition of the agent’s brain.
  name
  params
muscle: The agent’s muscle.
  name
  params
objective: The reference to the agent’s Objective, which is a piece of code that calculates an objective value from the environment’s reward.
  name
  params
sensors (list of strings): Connects sensors offered by an environment to the agent. The list consists of the sensor IDs of each environment, e.g., [world_1.sensor_1, world_1.sensor_2, world_2.sensor_1]. Note that world_1 in this case is the uid of the particular environment, i.e., the unique ID string that was given when the environment was introduced to the experiment run.
actuators (list of strings): Connects actuators that are applicable in a particular environment to the agent. The syntax is the same as for the sensors.
load: Allows loading a model (policy) from a previous instance. Every agent that participates in an experiment run phase dumps its brain (policy model) when the phase finishes. Models can be loaded from other phases, other experiment runs, and even other agents. If only load is given (as an empty dictionary), it is assumed that the same agent from the same experiment run from the previous phase should be loaded. Other specifications require setting the following keys:
  agent (string): Name (UID) of the agent that should be loaded. Corresponds to the name key in the agent definition.
  experiment_run (string): Name (UID) of the experiment run the policy model should be loaded from. This corresponds to the global uid parameter of the experiment run.
  phase (int): Number of the phase the model was saved in. Phase indices start with 0.
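Two of the conventions above lend themselves to a short sketch: the environment-qualified sensor and actuator IDs, and the defaults of the load entry. Both helpers, split_id and resolve_load, are hypothetical and only mirror the description in this section; the sketch further assumes that an environment uid contains no dot.

```python
def split_id(qualified_id: str) -> tuple[str, str]:
    """Split 'world_1.sensor_1' into the environment uid and the local ID."""
    env_uid, _, local_id = qualified_id.partition(".")  # split at the first dot
    return env_uid, local_id


def resolve_load(load: dict, agent: str, experiment_run: str, phase: int) -> dict:
    """Fill in the defaults of a `load` entry for an agent in phase number `phase`."""
    return {
        "agent": load.get("agent", agent),                            # same agent
        "experiment_run": load.get("experiment_run", experiment_run),  # same run
        "phase": load.get("phase", phase - 1),                        # previous phase
    }


print(split_id("world_1.sensor_1"))  # ('world_1', 'sensor_1')
print(resolve_load({}, agent="mighty_defender", experiment_run="my-run", phase=1))
# {'agent': 'mighty_defender', 'experiment_run': 'my-run', 'phase': 0}
```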
Simulation Controllers#
Simulation controllers define exactly how a phase is executed, e.g., in which order agents act. Currently, the most commonly used simulation controller is the VanillaSimulationController, where all agents act at once.
name (string)
conditions (list): The termination conditions of the phase. Given two termination conditions, t1 and t2, the phase terminates if the expression t1.check_termination() or t2.check_termination() holds.
  name
  params
Termination Conditions#
A TerminationCondition is a piece of code that defines when a phase or a whole experiment run ends. If multiple conditions are given, they are checked in the order in which they are defined.
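The combination of several conditions can be sketched as a logical "or" that is evaluated in definition order. The class ThresholdCondition and the function phase_terminates are hypothetical; only the argument-free method name check_termination() is taken from this document, and the real palaestrAI signature may differ.

```python
class ThresholdCondition:
    """Hypothetical condition: holds once `state[key]` reaches `limit`."""

    def __init__(self, state: dict, key: str, limit: float) -> None:
        self.state, self.key, self.limit = state, key, limit

    def check_termination(self) -> bool:
        return self.state[self.key] >= self.limit


def phase_terminates(conditions: list) -> bool:
    """True as soon as one condition holds; conditions are checked in list order."""
    return any(c.check_termination() for c in conditions)


state = {"steps": 0, "reward": 0.0}
t1 = ThresholdCondition(state, "steps", 10)
t2 = ThresholdCondition(state, "reward", 5.0)

print(phase_terminates([t1, t2]))  # False: neither condition holds yet
state["reward"] = 7.5
print(phase_terminates([t1, t2]))  # True: t2 holds, as in `t1 or t2`
```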
An Example#
# Very simple dummy experiment run. Does nothing except for exercising
# all relevant components of the software.
# Do not change it! But you can copy it and modify it to your needs,
# of course. :^)
uid: "Yo-ho, a dummy experiment run for me!"  # User defined ID
seed: 42 # The random seed, as usual
# Version of palaestrai this run file is compatible with
# Just a check and a log message if versions do not match
version: "3.4"
schedule: # The schedule for this run; it is a list
  - phase_0: # Name of the current phase. Can be any user-chosen name
      environments: # Definition of the environments for this phase
        - environment:
            name: palaestrai.environment.dummy_environment:DummyEnvironment
            uid: myenv
            params: {"discrete": true}
          state_transformer:
            name: tests.fixtures:CountingWorldStateTransformer
            params: {}
      agents: # Definition of agents for this phase
        - name: mighty_defender
          brain:
            name: palaestrai.agent.dummy_brain:DummyBrain
            params: { "store_path": "./custom" } # the base store path
          muscle:
            name: palaestrai.agent.dummy_muscle:DummyMuscle
            params: { }
          objective:
            name: palaestrai.agent.dummy_objective:DummyObjective
            params: {"params": 1}
          sensors: [myenv.0, myenv.1, myenv.2, myenv.3, myenv.4]
          actuators: [myenv.0, myenv.1, myenv.2, myenv.3, myenv.4]
        - name: evil_attacker
          brain:
            name: palaestrai.agent.dummy_brain:DummyBrain
            params: { }
          muscle:
            name: palaestrai.agent.dummy_muscle:DummyMuscle
            params: { }
          objective:
            name: palaestrai.agent.dummy_objective:DummyObjective
            params: {"params": 1}
          sensors: [myenv.5, myenv.6, myenv.7, myenv.8, myenv.9]
          actuators: [myenv.5, myenv.6, myenv.7, myenv.8, myenv.9]
      simulation: # Definition of the simulation controller for this phase
        name: palaestrai.simulation:VanillaSimulationController
        conditions:
          - name: palaestrai.simulation:VanillaSimControllerTerminationCondition
            params: {}
      phase_config: # Additional config for this phase
        mode: train
        worker: 1
        episodes: 1
  - phase_1: # Name of the current phase. Can be any user-chosen name
      environments: # Definition of the environments for this phase
        - environment:
            name: palaestrai.environment.dummy_environment:DummyEnvironment
            uid: myenv
            params: {"discrete": true}
      agents: # Definition of agents for this phase
        - name: mighty_defender
          # we load the agent with the same name and the same experiment_id, optional: specify "agent_name" or "experiment_id"
          load: {base: "./custom", phase_name: "phase_0"}
          brain:
            name: palaestrai.agent.dummy_brain:DummyBrain
            params: { "store_path": "./custom" }
          muscle:
            name: palaestrai.agent.dummy_muscle:DummyMuscle
            params: { }
          objective:
            name: palaestrai.agent.dummy_objective:DummyObjective
            params: {"params": 1}
          sensors: [myenv.0, myenv.1, myenv.2, myenv.3, myenv.4]
          actuators: [myenv.0, myenv.1, myenv.2, myenv.3, myenv.4]
        - name: evil_attacker
          load: {phase_name: "phase_0"}
          brain:
            name: palaestrai.agent.dummy_brain:DummyBrain
            params: { }
          muscle:
            name: palaestrai.agent.dummy_muscle:DummyMuscle
            params: { }
          objective:
            name: palaestrai.agent.dummy_objective:DummyObjective
            params: {"params": 1}
          sensors: [myenv.5, myenv.6, myenv.7, myenv.8, myenv.9]
          actuators: [myenv.5, myenv.6, myenv.7, myenv.8, myenv.9]
      simulation: # Definition of the simulation controller for this phase
        name: palaestrai.simulation:VanillaSimulationController
        conditions:
          - name: palaestrai.simulation:VanillaSimControllerTerminationCondition
            params: {}
      phase_config: # Additional config for this phase
        mode: train
        worker: 1
        episodes: 1
run_config: # Not a runTIME config
  condition:
    name: palaestrai.experiment:VanillaRunGovernorTerminationCondition
    params: {}
Further Reading#
An experiment run defines only a single execution. For a full-featured design of experiments, one would execute several experiment runs with variations of parameters (factors). For this reason, palaestrAI also has the definition of an experiment, as distinct from an experiment run: an experiment spawns one or more experiment runs.