palaestrAI Quick Start Guide#

palaestrAI is a feature-rich framework for learning agents, especially Deep Reinforcement Learning agents. Its focus is on training multiple agents at the same time, inspired by the ARL principle.

Bird’s-Eye View#

palaestrAI is an ecosystem consisting of a number of packages for different purposes.

A central idea of the whole framework is a proper experimentation process. That means a design of experiments, in which you create an experiment document and apply sampling strategies to the factors defined therein. arsenAI implements this and creates experiment run definitions.

If design of experiments isn’t your forte, don’t worry: you can simply create an experiment run file and execute it. You will get a fully reproducible simulation, with all results data safely stored away. palaestrAI proper (the core framework) is responsible for this whole execution and data storage process.
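Executing a run file is done with the palaestrai command-line interface (the same command appears again in the store section below):

palaestrai experiment-start my_experiment.yml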

An experiment run execution entails creating a simulation controller, initializing environments, and initializing the agents that act in these environments. The experiment run file defines all of this: which environment to load, which agents participate in a run, and, of course, their parameters. If you want to conduct experiments with already implemented agents and environments, creating this file is where you should begin.

In palaestrAI, an agent consists of a brain and muscles. This split between learning and training (brain) and inference and acting (muscle) is intentional, as it allows easy switching of training algorithms, multi-worker or asynchronous learning, and distributed execution. Agents (i.e., brain-muscle implementations) live in the hARL package. If you are interested in implementing new algorithms, have a closer look at the Brain and Muscle classes.

Finally, Environment is the base class for all environments. Consider its API documentation if you intend to create your own.

_images/palaestrai-schema.svg

The following sections will guide you through the different parts of setting up and executing an experiment run.

Experiment, Simulation, Phases, Episodes#

palaestrAI is structured hierarchically. At the top is the experiment (file), which defines all components and variables. This file is used to create a design of experiments with arsenAI, which then generates a number of experiment run files. If you don’t use a design of experiments, you can skip this step and write your run file yourself; one example is shown in this tutorial. Every experiment run is independent of all other experiment runs: they are copies of each other with modified variables.

An experiment run is the closest thing to a classic DRL training you might already know from OpenAI’s Gym or comparable frameworks. An experiment run contains one or more phases; normally, it contains at least two: a training phase and a test phase. Each phase can have one or more episodes. When an episode ends, the phase is restarted; this continues until the maximum number of episodes is reached. At that point, the next phase starts, or the run is finished.

Termination#

Both an experiment run and a phase can have a termination condition. Currently, we have implemented the most basic termination conditions. At the experiment run level, the condition is satisfied when all phases are finished. At the phase level, the termination condition is satisfied when all episodes have ended. The end of an episode is currently defined by the environment, which itself has a maximum number of steps and termination conditions. For example, the Midas environment terminates after X simulation steps or if the load flow calculation fails. However, many more conditions are possible, such as termination when a certain reward is reached, time-based termination at each level, and more.

palaestrAI Experiment#

To use palaestrAI you need an experiment run file. You can either write one yourself or use arsenAI to generate run files from a more complex design of experiments. In the end, you need at least one experiment YAML file. The YAML file starts with a general part containing settings at the experiment level. Each experiment can have multiple phases; a common structure is one training phase followed by one test phase, but different setups are possible. Each phase is structured into different components, and each component can be configured.

This is an example of an experiment run YAML file. Each file has a user-defined ID (uid) that can be used to identify an experiment in the store. You should also set a main seed (seed), which is used to derive individual seeds, e.g., for agents or the environment. The seeds initialize the random number generators to provide reproducibility. Many deep reinforcement learning algorithms are sensitive to the seed, so it is always a good idea to test multiple seeds to ensure that good (or bad) performance is not just the result of a good (or bad) seed. The version is the targeted palaestrAI version; if the installed version differs from the version targeted in the YAML file, you will receive a warning.

# Very simple dummy experiment run. Does nothing except for exercising
# all relevant components of the software.
# Do not change it! But you can copy it and modify it to your needs,
# of course. :^)

uid: "Yo-ho, a dummy experiment run for me!"# User defined ID
seed: 42  # The random seed, as usual
# Version of palaestrai this run file is compatible with
# Just a check and a log message if versions do not match
version: 3.4
schedule:  # The schedule for this run; it is a list
  - phase_0:  # Name of the current phase. Can be any user-chosen name
      environments:  # Definition of the environments for this phase
        - environment:
            name: palaestrai.environment.dummy_environment:DummyEnvironment
            uid: myenv
            params: {"discrete": true}
      agents:  # Definition of agents for this phase
        - name: mighty_defender
          brain:
            name: palaestrai.agent.dummy_brain:DummyBrain
            params: { "store_path": "./custom" } # the base store path
          muscle:
            name: palaestrai.agent.dummy_muscle:DummyMuscle
            params: { }
          objective:
            name: palaestrai.agent.dummy_objective:DummyObjective
            params: {"params": 1}
          sensors: [myenv.0, myenv.1, myenv.2, myenv.3, myenv.4]
          actuators: [myenv.0, myenv.1, myenv.2, myenv.3, myenv.4]
        - name: evil_attacker
          brain:
            name: palaestrai.agent.dummy_brain:DummyBrain
            params: { }
          muscle:
            name: palaestrai.agent.dummy_muscle:DummyMuscle
            params: { }
          objective:
            name: palaestrai.agent.dummy_objective:DummyObjective
            params: {"params": 1}
          sensors: [myenv.5, myenv.6, myenv.7, myenv.8, myenv.9]
          actuators: [myenv.5, myenv.6, myenv.7, myenv.8, myenv.9]
      simulation:  # Definition of the simulation controller for this phase
        name: palaestrai.simulation:VanillaSimController
        conditions:
          - name: palaestrai.simulation:VanillaSimControllerTerminationCondition
            params: {}
      phase_config:  # Additional config for this phase
        mode: train
        worker: 1
        episodes: 1
  - phase_1:  # Name of the current phase. Can be any user-chosen name
      environments:  # Definition of the environments for this phase
        - environment:
            name: palaestrai.environment.dummy_environment:DummyEnvironment
            uid: myenv
            params: {"discrete": true}
      agents:  # Definition of agents for this phase
        - name: mighty_defender
          # We load the agent with the same name and the same experiment_id;
          # optionally, specify "agent_name" or "experiment_id"
          load: {base: "./custom", phase_name: "phase_0"}
          brain:
            name: palaestrai.agent.dummy_brain:DummyBrain
            params: { "store_path": "./custom" }
          muscle:
            name: palaestrai.agent.dummy_muscle:DummyMuscle
            params: { }
          objective:
            name: palaestrai.agent.dummy_objective:DummyObjective
            params: {"params": 1}
          sensors: [myenv.0, myenv.1, myenv.2, myenv.3, myenv.4]
          actuators: [myenv.0, myenv.1, myenv.2, myenv.3, myenv.4]
        - name: evil_attacker
          load: {phase_name: "phase_0"}
          brain:
            name: palaestrai.agent.dummy_brain:DummyBrain
            params: { }
          muscle:
            name: palaestrai.agent.dummy_muscle:DummyMuscle
            params: { }
          objective:
            name: palaestrai.agent.dummy_objective:DummyObjective
            params: {"params": 1}
          sensors: [myenv.5, myenv.6, myenv.7, myenv.8, myenv.9]
          actuators: [myenv.5, myenv.6, myenv.7, myenv.8, myenv.9]
      simulation:  # Definition of the simulation controller for this phase
        name: palaestrai.simulation:VanillaSimController
        conditions:
          - name: palaestrai.simulation:VanillaSimControllerTerminationCondition
            params: {}
      phase_config:  # Additional config for this phase
        mode: train
        worker: 1
        episodes: 1
run_config:  # Not a runTIME config
  condition:
    name: palaestrai.experiment:VanillaRunGovernorTerminationCondition
    params: {}

There is also a run_config, which contains the termination condition of the experiment run. Currently, there is just one termination condition available. If you want to implement one yourself, you register it here by changing name to the import path of your new condition.
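For example, a custom condition could be plugged in like this (the class path my_package.my_conditions:MyTerminationCondition is purely hypothetical):

run_config:
  condition:
    name: my_package.my_conditions:MyTerminationCondition  # hypothetical class
    params: {}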

Schedule#

schedule:  # The schedule for this run; it is a list
  - phase_0:  # Name of the current phase. Can be any user-chosen name
      environments:  # Definition of the environments for this phase
        [...]
      agents:  # Definition of agents for this phase
        [...]
      simulation:  # Definition of the simulation controller for this phase
        [...]
      phase_config:  # Additional config for this phase
        mode: train
        worker: 1
        episodes: 1
  - phase_1:  # Name of the current phase. Can be any user-chosen name
      environments:  # Definition of the environments for this phase
        [...]
      agents:  # Definition of agents for this phase
        [...]
      simulation:  # Definition of the simulation controller for this phase
        [...]
      phase_config:  # Additional config for this phase
        mode: train
        worker: 1
        episodes: 1

The schedule defines how the experiment run is executed. There is only one schedule, and it contains at least one phase. Every phase is defined by a unique name, which can be chosen without restrictions; in this example, the two phases are named phase_0 and phase_1. Every phase contains at least one environment, at least one agent, exactly one simulation configuration, and exactly one phase_config. Phases that build on each other can overwrite each other’s values; more on that in the section Cascading Settings Expansion.

Simulation#

The simulation controller controls the experiment and is configured in the simulation block. The block contains two components: the simulation controller itself and a termination condition that defines when a simulation run/episode has ended. Currently, there is only one implementation of each.

simulation:  # Definition of the simulation controller for this phase
  name: palaestrai.simulation:VanillaSimController
  conditions:
    - name: palaestrai.simulation:VanillaSimControllerTerminationCondition
      params: {}

Agents#

The agents block contains all agents which act at the same time. In the basic ARL scenario there are two agents. The first one is the defender, the second one is the attacker.

agents:  # Definition of agents for this phase
  - name: mighty_defender
    # We load the agent with the same name and the same experiment_id;
    # optionally, specify "agent_name" or "experiment_id"
    load: {base: "./custom", phase_name: "phase_0"}
    brain:
      name: palaestrai.agent.dummy_brain:DummyBrain
      params: {"store_path": "./custom"}
    muscle:
      name: palaestrai.agent.dummy_muscle:DummyMuscle
      params: {}
    objective:
      name: palaestrai.agent.dummy_objective:DummyObjective
      params: {"params": 1}
    sensors: [myenv.0, myenv.1, myenv.2, myenv.3, myenv.4]
    actuators: [myenv.0, myenv.1, myenv.2, myenv.3, myenv.4]
  - name: evil_attacker
    load: {phase_name: "phase_0"}
    brain:
      name: palaestrai.agent.dummy_brain:DummyBrain
      params: {}
    muscle:
      name: palaestrai.agent.dummy_muscle:DummyMuscle
      params: {}
    objective:
      name: palaestrai.agent.dummy_objective:DummyObjective
      params: {"params": 1}
    sensors: [myenv.5, myenv.6, myenv.7, myenv.8, myenv.9]
    actuators: [myenv.5, myenv.6, myenv.7, myenv.8, myenv.9]

Every agent is defined by a unique name. In this example, the first agent is called mighty_defender. The load parameter defines a local path which is used to load model files. This is needed if you have multiple episodes or at least one test phase.

The parameters brain and muscle define the agent’s algorithm. Most likely, you will use matching brains and muscles (e.g., a TD3 Brain with a TD3 Muscle), but there might be cases with multiple Brains/Muscles for the same algorithm, e.g., a multi-agent Brain, or a Muscle that includes a preprocessing step; in such cases you can define the combination here. For this, use the path of the class that should be loaded. In this example, the muscle and brain are part of the palaestrai.agent package, and the dummy brain and muscle are loaded.
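The name entries follow the same module:Class import convention as environment names (explained in the Environment section below); the brain of this example corresponds to the following Python import:

from palaestrai.agent.dummy_brain import DummyBrain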

Another common package is hARL, in which you can find already implemented DRL algorithms. Each brain/muscle can have parameters; the brain should at least have a store_path parameter, which has to match the path given in the load block.

Note

In the future, the store will be used for loading and storing the brain dumps, but this is currently work in progress. As such, expect this part of the API to change in a future version.

Every agent needs an objective. The objective calculates the internal reward, which is the closest analogue to the traditional DRL reward. Its purpose is to evaluate the last action, together with the old (and new) state, with respect to the agent’s own goal, whereas the environment reward rates the current state of the environment with no regard to the actions of individual agents. An objective can have parameters.

Finally, every agent needs at least one sensor and one actuator, but is not limited to one of each. Sensors and actuators are defined as lists and identified by their IDs. An ID is the combination of the environment uid (defined in the environment block) and the name of the sensor. In our case, the environment has the uid myenv and the sensor names are the numbers 0 to 9, so the first sensor is myenv.0. The same applies to the actuators. You have to know both the environment uid and the sensor/actuator names when you write your run file.
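In other words, an ID is just the environment uid and the sensor/actuator name joined by a dot; a minimal illustration in Python:

env_uid = "myenv"        # uid from the environment block
sensor_name = "0"        # name defined by the environment itself
sensor_id = f"{env_uid}.{sensor_name}"  # -> "myenv.0"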

Warning

Make sure that sensors and actuators do not change between phases: currently, no implemented algorithm provides transfer learning. Also make sure that their order stays the same.

You can define as many agents as you want. Currently, actuators are meant to be exclusive (as defined by the VanillaSimController), so if two agents nevertheless share the same actuator, the last received value is used.

Environment#

In the environment block, one or more environments can be defined. Those environments will be executed in parallel, but no data exchange between them is performed by palaestrAI.

environments:  # Definition of the environments for this phase
  - environment:
      name: palaestrai.environment.dummy_environment:DummyEnvironment
      uid: myenv
      params: {"discrete": true}

The key environments holds a list of environment definitions. Each list entry has the single key environment, whose value is another dictionary of key-value pairs. Each environment is defined by at least a name, a uid, and a params dictionary (which may also be empty).

The name is an import string with a specific syntax: the modules are separated by dots, and the class is appended after a colon. The example name translates to the following Python import statement:

from palaestrai.environment.dummy_environment import DummyEnvironment

When you want to use a different environment, make sure that it can be found on the Python path.
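If you are unsure whether an import string resolves, you can replicate the module:Class lookup yourself. This is only a sketch of the convention described above, not palaestrAI’s actual loader:

import importlib

def resolve(import_string):
    # "package.module:ClassName" -> the class object
    module_name, class_name = import_string.split(":")
    return getattr(importlib.import_module(module_name), class_name)

DummyEnvironment = resolve(
    "palaestrai.environment.dummy_environment:DummyEnvironment"
)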

The uid is an important parameter, especially when more than one environment is used. You can choose any name here, but for convenience it should not be too long. The uid is used when assigning sensors and actuators to agents.

Finally, params is a dictionary that may contain any key/value pairs required by the environment. In the example, we have only one parameter, which switches the environment to discrete instead of continuous values.

Cascading Settings Expansion#

The experiment run files of palaestrAI support a lazy-style definition of the schedule’s phases. This means you can skip definitions already made in a previous phase. Let’s look at an example to see exactly how this works.

schedule:  # The schedule for this run; it is a list
  - phase_0:  # Name of the current phase. Can be any user-chosen name
      environments:  # Definition of the environments for this phase
        [...]
      agents:  # Definition of agents for this phase
        [...]
      simulation:  # Definition of the simulation controller for this phase
        [...]
      phase_config:  # Additional config for this phase
        mode: train
        worker: 1
        episodes: 1
  - phase_1:  # Name of the current phase. Can be any user-chosen name
      environments:  # Definition of the environments for this phase
        [...]
      agents:  # Definition of agents for this phase
        [...]
      simulation:  # Definition of the simulation controller for this phase
        [...]
      phase_config:  # Additional config for this phase
        mode: test
        worker: 1
        episodes: 1

This is the config we saw earlier. Without considering the configurations of environments, agents, and simulation, the only thing that differs between the phases is the mode key in the phase_config. The following config is equivalent to the config above:

schedule:  # The schedule for this run; it is a list
  - phase_0:  # Name of the current phase. Can be any user-chosen name
      environments:  # Definition of the environments for this phase
        [...]
      agents:  # Definition of agents for this phase
        [...]
      simulation:  # Definition of the simulation controller for this phase
        [...]
      phase_config:  # Additional config for this phase
        mode: train
        worker: 1
        episodes: 1
  - phase_1:  # Name of the current phase. Can be any user-chosen name
      phase_config:  # Additional config for this phase
        mode: test

The general rule for overwriting is: if a value is not some kind of dictionary, the existing value is overwritten, and if the value was not present before, it is added. If the value is some kind of dictionary, the overwrite function is called recursively on it. This works for an arbitrary nesting depth of the initial dictionary.

The process of building the full configuration always looks like this:

# Runnable sketch of the merge process (illustrative; not palaestrAI's
# actual implementation)
import copy

def update_recursively(base, changes):
    # Non-dict values overwrite existing entries (or are added);
    # nested dicts are merged key by key, to arbitrary depth.
    for key, value in changes.items():
        if isinstance(value, dict) and isinstance(base.get(key), dict):
            update_recursively(base[key], value)
        else:
            base[key] = value
    return base

def build_full_config(schedule):
    full_config = []
    previous_config = {}
    for phase in schedule:
        # Each schedule entry is a one-item dict: {phase_name: definition}
        phase_name, phase_definition = next(iter(phase.items()))
        # Reuse values from the previous config, which is empty in the
        # first iteration ...
        current_config = copy.deepcopy(previous_config)
        # ... then overlay everything the current phase defines itself.
        update_recursively(current_config, phase_definition)
        full_config.append({phase_name: current_config})
        previous_config = current_config
    return full_config

This means that if you have more than two phases, the third phase copies the entries from the second phase, which in turn copied the entries from the first phase. In other words: unless you change something, the config stays the same across phases.
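As a concrete illustration of this cascading expansion, applied to the abbreviated schedule above (using the sketch from the previous code block):

schedule = [
    {"phase_0": {"phase_config": {"mode": "train", "worker": 1, "episodes": 1}}},
    {"phase_1": {"phase_config": {"mode": "test"}}},
]
full_config = build_full_config(schedule)
# full_config[1] now reads:
# {"phase_1": {"phase_config": {"mode": "test", "worker": 1, "episodes": 1}}}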

Store and Database Model#

Connecting to the Database#

The ‘store’ is the module that saves all experiment data: it is palaestrAI’s storage backend. ‘Store’ is a play on words with the ‘general store’: you can get almost anything (any data) from it!

Connecting to the store is easy: once you’ve set the store_uri in the runtime config, a session object can be retrieved like this:

import palaestrai.store
session = palaestrai.store.Session()
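The store_uri can be any database URI that SQLAlchemy understands. A minimal runtime configuration entry might look like the following line; the SQLite URI is only an illustrative assumption, so check the runtime configuration documentation for your actual setup:

store_uri: sqlite:///palaestrai.db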

Accessing Data: The Database Model#

palaestrAI’s database model is implemented using SQLAlchemy’s Object-Relational Mapper (ORM). The ORM’s hierarchy mirrors the structure of the palaestrAI ecosystem:

  1. The store contains experiment documents

  2. An experiment document has many associated experiment run documents

  3. Each experiment run has one to many experiment run instances

  4. Experiment run instances contain experiment run phases

  5. In the hierarchy below experiment run phases, the store contains definitions of environments and agents

  6. For each environment that participates in an experiment run phase, world states are stored

  7. Each agent participating in an experiment run phase stores muscle actions and brain states.

Note

Experiment run instances represent executions (and re-executions) of a particular experiment run. Since you can issue palaestrai experiment-start my_experiment.yml several times, or even several users can do so independently, it is important to distinguish executions of an experiment run from the actual definition.

_images/store_er_diagram.png

Once you have a database session object, you can use SQLAlchemy’s ORM query facilities to retrieve values of an experiment. For more information, refer to the extended documentation of the store.
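To close with a hedged sketch: assuming the ORM model classes mirror the hierarchy above (the module path palaestrai.store.database_model, the class name ExperimentRun, and its uid attribute are assumptions, not guaranteed API), a simple query could look like this:

import palaestrai.store
import palaestrai.store.database_model as dbm  # module path is an assumption

session = palaestrai.store.Session()
# List the UIDs of all experiment runs known to the store:
for run in session.query(dbm.ExperimentRun).all():
    print(run.uid)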