Experiment Run Definition Documents

Introduction

Training and testing of agents in one or more environments, i.e., the execution of palaestrAI, is controlled from a single file: the experiment run document. This document defines the following major components of a run:

  1. Run metadata (name, seed, etc.)

  2. Environments and rewards

  3. Agents and their objectives

  4. Termination conditions

An experiment run document defines a single, reproducible execution of palaestrAI. In order to train agents and test them, this is the only document an experimenter will need. Writing actual code is only necessary if new environments or agents are to be used.

The experiment run document is a YAML file that contains global options as well as the definition of a schedule of execution.

Global Options

A small number of mandatory entries are specified at the global level. They are:

  • uid (string): A unique, user-specified identifier. This is later on used to identify the experiment run’s result data in the database. This parameter is mandatory.

  • seed (integer): The initial seed of the random number generator. Seeding the random number generators is what makes the experiment run reproducible: they are pseudo-random number generators, so given a known seed, they always produce the same sequence of numbers.

  • version (string): The version for which the experiment run document is valid. If the version of palaestrAI and the one given in the experiment run document differ, a warning is emitted.

In addition, the following two top-level entries are also required:

  • schedule: The experiment run schedule. Every experiment run defines at least one phase within its schedule. The schedule defines what combination of environments and agents are used, and at which point a simulation phase ends.

  • run_config: Defines the run’s termination condition.
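For orientation, the top-level layout described above might be sketched as follows. This is only a skeleton and not loadable as is: the uid value is a made-up placeholder, the version string and the run_config entry are copied from the complete example later in this document, and the contents of the phase are left out.

```yaml
uid: my-first-experiment-run  # hypothetical identifier; choose your own
seed: 42                      # initial seed of the random number generator
version: "3.4"                # palaestrAI version this document is written for
schedule:                     # list of phases, executed in order
  - phase_0:
      # environments, agents, simulation, and phase_config go here
run_config:                   # termination condition of the whole run
  condition:
    name: palaestrai.experiment:VanillaRunGovernorTerminationCondition
    params: {}
```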

Schedules

The key concept of an experiment run is its schedule. The schedule defines one or more phases. Each phase consists of

  • one (or more) environments

  • at least one agent

  • a termination condition.

palaestrAI starts an experiment run’s execution with the first phase. It initializes the environment(s) and agent(s), then executes this portion of the run. When the termination condition holds, the phase ends and palaestrAI turns to the next phase. If no other phase exists, the experiment run execution ends.

Each phase config consists of the following entries:

  • environments: The list of environment definitions (see the section on environments below).

  • agents: The list of agents participating in this particular phase (see section on agents below).

  • simulation: Chooses the simulation controller for a phase. The simulation controller defines in which order agents interact with their environment. Currently, the most commonly used simulation controller is the VanillaSimulationController: Here, all agents receive their sensor inputs at once; their actuators’ setpoints are also applied to the environment at once. Then, the new state of the environment is computed.

  • phase_config:

    • mode: One of train or test, written as in the examples in this document; defines how the agents act in the environment (training vs. policy exploitation for learning agents).

    • episodes (integer): How often the execution of the phase is repeated, which is especially important for learning agents.

    • worker (integer): How many parallel instances of environments and agents are spawned (e.g., in order to generate training data in parallel).

This is an example of an experiment run schedule with two phases:

- phase_0:  # Name of the current phase. Can be any user-chosen name
    environments:  # Definition of the environments for this phase
      - environment:
          name: palaestrai.environment.dummy_environment:DummyEnvironment
          uid: myenv
          params: {"discrete": true}
    agents:  # Definition of agents for this phase
      - name: mighty_defender
        brain:
          name: palaestrai.agent.dummy_brain:DummyBrain
          params: { "store_path": "./custom" } # the base store path
        muscle:
          name: palaestrai.agent.dummy_muscle:DummyMuscle
          params: { }
        objective:
          name: palaestrai.agent.dummy_objective:DummyObjective
          params: {"params": 1}
        sensors: [myenv.0, myenv.1, myenv.2, myenv.3, myenv.4]
        actuators: [myenv.0, myenv.1, myenv.2, myenv.3, myenv.4]
      - name: evil_attacker
        brain:
          name: palaestrai.agent.dummy_brain:DummyBrain
          params: { }
        muscle:
          name: palaestrai.agent.dummy_muscle:DummyMuscle
          params: { }
        objective:
          name: palaestrai.agent.dummy_objective:DummyObjective
          params: {"params": 1}
        sensors: [myenv.5, myenv.6, myenv.7, myenv.8, myenv.9]
        actuators: [myenv.5, myenv.6, myenv.7, myenv.8, myenv.9]
    simulation:  # Definition of the simulation controller for this phase
      name: palaestrai.simulation:VanillaSimController
      conditions:
        - name: palaestrai.simulation:VanillaSimControllerTerminationCondition
          params: {}
    phase_config:  # Additional config for this phase
      mode: train
      worker: 1
      episodes: 1
- phase_1:  # Name of the current phase. Can be any user-chosen name
    environments:  # Definition of the environments for this phase
      - environment:
          name: palaestrai.environment.dummy_environment:DummyEnvironment
          uid: myenv
          params: {"discrete": true}
    agents:  # Definition of agents for this phase
      - name: mighty_defender
        # we load the agent with the same name and the same experiment_id, optional: specify "agent_name" or "experiment_id"
        load: {base: "./custom", phase_name: "phase_0"}
        brain:
          name: palaestrai.agent.dummy_brain:DummyBrain
          params: { "store_path": "./custom" }
        muscle:
          name: palaestrai.agent.dummy_muscle:DummyMuscle
          params: { }
        objective:
          name: palaestrai.agent.dummy_objective:DummyObjective
          params: {"params": 1}
        sensors: [myenv.0, myenv.1, myenv.2, myenv.3, myenv.4]
        actuators: [myenv.0, myenv.1, myenv.2, myenv.3, myenv.4]
      - name: evil_attacker
        load: {phase_name: "phase_0"}
        brain:
          name: palaestrai.agent.dummy_brain:DummyBrain
          params: { }
        muscle:
          name: palaestrai.agent.dummy_muscle:DummyMuscle
          params: { }
        objective:
          name: palaestrai.agent.dummy_objective:DummyObjective
          params: {"params": 1}
        sensors: [myenv.5, myenv.6, myenv.7, myenv.8, myenv.9]
        actuators: [myenv.5, myenv.6, myenv.7, myenv.8, myenv.9]
    simulation:  # Definition of the simulation controller for this phase
      name: palaestrai.simulation:VanillaSimController
      conditions:
        - name: palaestrai.simulation:VanillaSimControllerTerminationCondition
          params: {}
    phase_config:  # Additional config for this phase
      mode: train
      worker: 1
      episodes: 1

The Cascade

Configuration options in an experiment run file cascade through the phases. This means that any definition of environments, agents, or simulation controllers that is given in the first phase is implicitly applied to all following phases, unless overwritten.

Following phases can then redefine select parts or add new ones. Suppose an environment with the uid of myenv is defined in the first phase; this environment is then present in all following phases. If a later phase also defines an environment with the uid of myenv, this overwrites all definitions for this particular environment. All phases after this definition also use the new definition, not that from the first phase. Any phase could, in addition, define a second environment; this environment would then be present for that particular phase as well as all phases that follow.

This way, an experimenter only needs to note the changes between phases.
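As a sketch of how this might look in practice, the following hypothetical schedule defines everything in phase_0 and lets phase_1 state only its phase configuration. Environments, agents, and the simulation controller are then assumed to be inherited through the cascade as described above. The class names are taken from the schedule example earlier in this document; the single-agent setup is an illustrative simplification.

```yaml
schedule:
  - phase_0:
      environments:
        - environment:
            name: palaestrai.environment.dummy_environment:DummyEnvironment
            uid: myenv
            params: {"discrete": true}
      agents:
        - name: mighty_defender
          brain:
            name: palaestrai.agent.dummy_brain:DummyBrain
            params: {}
          muscle:
            name: palaestrai.agent.dummy_muscle:DummyMuscle
            params: {}
          objective:
            name: palaestrai.agent.dummy_objective:DummyObjective
            params: {"params": 1}
          sensors: [myenv.0, myenv.1, myenv.2, myenv.3, myenv.4]
          actuators: [myenv.0, myenv.1, myenv.2, myenv.3, myenv.4]
      simulation:
        name: palaestrai.simulation:VanillaSimController
        conditions:
          - name: palaestrai.simulation:VanillaSimControllerTerminationCondition
            params: {}
      phase_config:
        mode: train
        worker: 1
        episodes: 1
  - phase_1:
      # Only the phase configuration is written out here; environments,
      # agents, and the simulation controller are assumed to cascade
      # from phase_0.
      phase_config:
        mode: test
        worker: 1
        episodes: 1
```

The text above names only environments, agents, and simulation controllers as cascading, which is why phase_1 repeats its phase_config in full.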

Defining Entities (Environments, Agents, etc.)

Each of the loadable entities (environments, agents, simulation controllers, termination conditions, etc.) follows a simple schema in the experiment run file. Their definition contains at least two keys:

  • name: The fully-qualified name of the loadable class in the format package.package:ClassName.

  • params: A dictionary containing any parameters that are passed to the object upon initialization.

An example for an environment would look like this:

environment:
  name: palaestrai.environment.dummy_environment:DummyEnvironment
  params: {"discrete": true}

Environments

An environment is a world any agent can act in. The definition of an environment contains the following configuration:

  • environment:

    • uid (string): A name that needs to be unique in the scope of the experiment run file.

    • name (string): Name of the class that contains the actual code of the environment.

    • params (dict): Any parameters specific to the environment.

  • reward: The definition of a Reward. It is optional and useful when the environment is too complex to emit only one particular reward.

    • name (string)

    • params (dict)

  • state_transformer: An optional EnvironmentStateTransformer used to filter the environment’s respective current state before it is passed on to the database. Can be used to, e.g., calculate derived values or strip superfluous data.

    • name (string)

    • params (dict)
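Put together, an environment entry that uses both optional parts might look like the sketch below. The environment and state transformer names are taken from the examples in this document; the reward class my_package.rewards:MyReward is purely hypothetical and stands in for whatever Reward implementation is actually available.

```yaml
environments:
  - environment:
      name: palaestrai.environment.dummy_environment:DummyEnvironment
      uid: myenv                 # must be unique within the run file
      params: {"discrete": true}
    reward:                      # optional
      name: my_package.rewards:MyReward  # hypothetical class, not part of palaestrAI
      params: {}
    state_transformer:           # optional
      name: tests.fixtures:CountingWorldStateTransformer
      params: {}
```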

Agents

Agents are acting entities within an environment. In palaestrAI, agents consist of a brain and one or more muscles. The brain is a separate entity that is responsible for the training, while a muscle acts within an environment, i.e., it is responsible only for the inference. Details on the Brain-Muscle-Split concept can be found in the documentation of the Brain and Muscle API.

  • name (string): The agent’s name; a human-readable string.

  • brain: Definition of the agent’s brain

    • name

    • params

  • muscle: The agent’s muscle.

    • name

    • params

  • objective: The reference to the agent’s Objective, which is a piece of code that calculates an objective value from the environment’s reward.

    • name

    • params

  • sensors (list of strings): Connects sensors offered by an environment to the agent. The list consists of the sensor IDs of each environment, e.g., [world_1.sensor_1, world_1.sensor_2, world_2.sensor_1]. Note that world_1 in this case is the uid of the particular environment, i.e., the unique ID string that was given when the environment was introduced to the experiment run.

  • actuators (list of strings): Connects actuators that are applicable in a particular environment to the agent. The syntax is the same as with the sensors.

  • load: Loads a model (policy) from a previous instance. Every agent that participates in an experiment run phase dumps its brain (policy model) when the phase finishes. Models can be loaded from other phases, other experiment runs, and even other agents. If only load is given (as an empty dictionary), it is assumed that the same agent from the same experiment run from the previous phase should be loaded. Other specifications require setting the following keys:

    • agent (string): Name (UID) of the agent that should be loaded. Corresponds to the name key in the agent definition.

    • experiment_run (string): Name (UID) of the experiment run the policy model should be loaded from. This corresponds to the global uid parameter of the experiment run.

    • phase (int): Number of the phase the model was saved in. Phase indices start with 0.
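The two ways of using load described above could be written as in the following sketch. It follows the keys listed in this section (agent, experiment_run, phase); note that the schedule examples in this document use different keys (base, phase_name), so which spelling is accepted should be checked against the palaestrAI version in use. The run uid some-earlier-run is a hypothetical placeholder.

```yaml
agents:
  - name: mighty_defender
    # Empty dictionary: reload this agent's own model from the previous
    # phase of the same experiment run.
    load: {}
    # brain, muscle, objective, sensors, and actuators as usual
  - name: evil_attacker
    # Explicit source, using the keys listed above.
    load:
      agent: evil_attacker              # name of the agent to load from
      experiment_run: some-earlier-run  # hypothetical uid of another run
      phase: 0                          # phase indices start with 0
    # brain, muscle, objective, sensors, and actuators as usual
```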

Simulation Controllers

Simulation controllers define exactly how a phase is executed, e.g., in which order agents act. Currently, the most commonly used simulation controller is the VanillaSimulationController, where all agents act at once.

  • name (string)

  • conditions (list): The termination conditions of the phase. They are combined with a logical or: given two termination conditions, t1 and t2, the phase terminates if the expression t1.check_termination() or t2.check_termination() holds.

    • name

    • params

Termination Conditions

A TerminationCondition is a piece of code that defines when a phase or a whole experiment run ends. If multiple conditions are given, they are checked in the order in which they are defined.
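A simulation controller with two termination conditions might be configured as sketched below. The first condition is the one used throughout this document; the second, my_package.conditions:MyTerminationCondition, is a hypothetical user-defined class added only to show the list form. With this configuration, the phase would end as soon as either condition reports termination, and the conditions would be checked top to bottom.

```yaml
simulation:
  name: palaestrai.simulation:VanillaSimController
  conditions:
    - name: palaestrai.simulation:VanillaSimControllerTerminationCondition
      params: {}
    - name: my_package.conditions:MyTerminationCondition  # hypothetical class
      params: {}
```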

An Example

# Very simple dummy experiment run. Does nothing except for exercising
# all relevant components of the software. 
# Do not change it! But you can copy it and modify it to your needs,
# of course. :^)

uid: "Yo-ho, a dummy experiment run for me!"  # User defined ID
seed: 42  # The random seed, as usual
# Version of palaestrai this run file is compatible with
# Just a check and a log message if versions do not match
version: "3.4"
schedule:  # The schedule for this run; it is a list
  - phase_0:  # Name of the current phase. Can be any user-chosen name
      environments:  # Definition of the environments for this phase
        - environment:
            name: palaestrai.environment.dummy_environment:DummyEnvironment
            uid: myenv
            params: {"discrete": true}
          state_transformer:
            name: tests.fixtures:CountingWorldStateTransformer
            params: {}
      agents:  # Definition of agents for this phase
        - name: mighty_defender
          brain:
            name: palaestrai.agent.dummy_brain:DummyBrain
            params: { "store_path": "./custom" } # the base store path
          muscle:
            name: palaestrai.agent.dummy_muscle:DummyMuscle
            params: { }
          objective:
            name: palaestrai.agent.dummy_objective:DummyObjective
            params: {"params": 1}
          sensors: [myenv.0, myenv.1, myenv.2, myenv.3, myenv.4]
          actuators: [myenv.0, myenv.1, myenv.2, myenv.3, myenv.4]
        - name: evil_attacker
          brain:
            name: palaestrai.agent.dummy_brain:DummyBrain
            params: { }
          muscle:
            name: palaestrai.agent.dummy_muscle:DummyMuscle
            params: { }
          objective:
            name: palaestrai.agent.dummy_objective:DummyObjective
            params: {"params": 1}
          sensors: [myenv.5, myenv.6, myenv.7, myenv.8, myenv.9]
          actuators: [myenv.5, myenv.6, myenv.7, myenv.8, myenv.9]
      simulation:  # Definition of the simulation controller for this phase
        name: palaestrai.simulation:VanillaSimulationController
        conditions:
          - name: palaestrai.simulation:VanillaSimControllerTerminationCondition
            params: {}
      phase_config:  # Additional config for this phase
        mode: train
        worker: 1
        episodes: 1
  - phase_1:  # Name of the current phase. Can be any user-chosen name
      environments:  # Definition of the environments for this phase
        - environment:
            name: palaestrai.environment.dummy_environment:DummyEnvironment
            uid: myenv
            params: {"discrete": true}
      agents:  # Definition of agents for this phase
        - name: mighty_defender
          # we load the agent with the same name and the same experiment_id, optional: specify "agent_name" or "experiment_id"
          load: {base: "./custom", phase_name: "phase_0"}
          brain:
            name: palaestrai.agent.dummy_brain:DummyBrain
            params: { "store_path": "./custom" }
          muscle:
            name: palaestrai.agent.dummy_muscle:DummyMuscle
            params: { }
          objective:
            name: palaestrai.agent.dummy_objective:DummyObjective
            params: {"params": 1}
          sensors: [myenv.0, myenv.1, myenv.2, myenv.3, myenv.4]
          actuators: [myenv.0, myenv.1, myenv.2, myenv.3, myenv.4]
        - name: evil_attacker
          load: {phase_name: "phase_0"}
          brain:
            name: palaestrai.agent.dummy_brain:DummyBrain
            params: { }
          muscle:
            name: palaestrai.agent.dummy_muscle:DummyMuscle
            params: { }
          objective:
            name: palaestrai.agent.dummy_objective:DummyObjective
            params: {"params": 1}
          sensors: [myenv.5, myenv.6, myenv.7, myenv.8, myenv.9]
          actuators: [myenv.5, myenv.6, myenv.7, myenv.8, myenv.9]
      simulation:  # Definition of the simulation controller for this phase
        name: palaestrai.simulation:VanillaSimulationController
        conditions:
          - name: palaestrai.simulation:VanillaSimControllerTerminationCondition
            params: {}
      phase_config:  # Additional config for this phase
        mode: train
        worker: 1
        episodes: 1
run_config:  # Not a runTIME config
  condition:
    name: palaestrai.experiment:VanillaRunGovernorTerminationCondition
    params: {}

Further Reading

An experiment run defines only a single execution. A full-featured design of experiments requires executing several experiment runs with varying parameters (factors). For this purpose, palaestrAI distinguishes an experiment from an experiment run: an experiment spawns one or more experiment runs.