palaestrAI Quick Start Guide
============================

palaestrAI is a feature-rich framework for learning agents, with a particular
focus on agents based on Deep Reinforcement Learning (DRL). Its goal is to
enable the training of multiple agents at the same time, inspired by the
ARL (Adversarial Resilience Learning) principle.

Bird's-Eye View
---------------

palaestrAI is an ecosystem consisting of a number of packages for different
purposes.

A central idea of the whole framework is a proper experimentation process.
That means a design of experiments, in which you create an *experiment*
document and apply sampling strategies to the factors designed therein.
*arsenAI* implements this and creates *experiment run* definitions.

If design of experiments isn't your forte, don't worry: You can simply create
an *experiment run* file and execute it. You will achieve a fully reproducible
simulation, with all result data safely stored away. *palaestrAI* proper (the
core framework) is responsible for this whole execution and data storage
process.

An experiment run execution entails creating a simulation controller,
initializing environments, and initializing the agents that act in these
environments. The experiment run file defines all of these, i.e., which
environment to load, which agents participate in a run, and, of course, their
parameters. If you want to conduct experiments with already implemented agents
and environments, creating this file is where you should begin.

In palaestrAI, agents consist of a brain and muscles. This split between
learning and training (brain) and inference and acting (muscle) is
intentional, as it allows easily switching training algorithms, multi-worker
or async learning, and distributed execution. Agents (i.e., brain-muscle
implementations) live in the *hARL* package. If you are interested in
implementing new algorithms, have a closer look at the :class:`Brain` and
:class:`Muscle` classes.

Finally, :class:`Environment` is the base class for all environments.
Consult its API documentation if you intend to create your own.

.. image:: _static/palaestrai-schema.svg

The following sections will guide you through the different parts of setting
up and executing an experiment run.

Experiment, Simulation, Phases, Episodes
----------------------------------------

palaestrAI is structured hierarchically. At the top is the experiment (file),
which defines all components and variables. arsenAI uses this file to create
a design of experiments, from which it generates a number of experiment run
files. If you don't use a design of experiments, you can skip this step and
write your run file yourself. One example is shown in this tutorial.

Every experiment run is independent of all other experiment runs; they are
copies of each other with modified variables. An experiment run is the closest
thing to a classic DRL training you might already know from OpenAI's Gym or
comparable frameworks.

An experiment run contains one or more phases. Normally, it contains at least
two: a training phase and a test phase. Each phase can have one or more
episodes. When an episode ends, the phase is restarted; this continues until
the maximum number of episodes is reached. At that point, the next phase
starts or the run is finished.

Termination
-----------

Both an experiment run and a phase can have a termination condition.
Currently, only the most basic termination conditions are implemented. At the
experiment run level, the condition is satisfied when all phases are finished.
At the phase level, the termination condition is satisfied when all episodes
have ended. The end of an episode is currently defined by the environment,
which itself has a maximum number of steps and termination conditions. For
example, the Midas environment terminates after a predefined number of
simulation steps or if the load flow calculation fails.
However, many more conditions are possible, such as termination when a certain
reward is reached, time-based termination at each level, and more.

palaestrAI Experiment
---------------------

To use palaestrAI, you need an experiment run file. You can either write one
yourself or use arsenAI if you want to create a more complex design of
experiments. In the end, you need at least one experiment run YAML file.

The YAML file starts with a general part containing the settings that apply
at the experiment level. Each experiment can have multiple phases. A common
structure is one training phase followed by one test phase, but different
setups are possible. Each phase is made up of several components, each of
which can be configured.

The following is an example of an experiment run YAML file. Each file has a
user-defined ID (*uid*), which can be used to identify an experiment in the
store. You should also set a main seed (*seed*), which is used to derive
individual seeds for, e.g., the agents or the environment. The seeds
initialize the random number generators to provide reproducibility. Many deep
reinforcement learning algorithms are sensitive to the seed used, so it is
always a good idea to test multiple seeds to ensure that a good or bad
performance is not merely the result of a good or bad seed. The *version* is
the targeted palaestrAI version; if the installed version differs from the
version given in the YAML file, you will receive a warning.

.. code-block:: yaml

    # Very simple dummy experiment run. Does nothing except for exercising
    # all relevant components of the software.
    # Do not change it! But you can copy it and modify it to your needs,
    # of course. :^)

    uid: "Yo-ho, a dummy experiment run for me!"  # User defined ID
    seed: 42  # The random seed, as usual
    # Version of palaestrai this run file is compatible with
    # Just a check and a log message if versions do not match
    version: 3.4
    schedule:  # The schedule for this run; it is a list
      - phase_0:  # Name of the current phase. Can be any user-chosen name
          environments:  # Definition of the environments for this phase
            - environment:
                name: palaestrai.environment.dummy_environment:DummyEnvironment
                uid: myenv
                params: {"discrete": true}
          agents:  # Definition of agents for this phase
            - name: mighty_defender
              brain:
                name: palaestrai.agent.dummy_brain:DummyBrain
                params: { "store_path": "./custom" }  # the base store path
              muscle:
                name: palaestrai.agent.dummy_muscle:DummyMuscle
                params: { }
              objective:
                name: palaestrai.agent.dummy_objective:DummyObjective
                params: {"params": 1}
              sensors: [myenv.0, myenv.1, myenv.2, myenv.3, myenv.4]
              actuators: [myenv.0, myenv.1, myenv.2, myenv.3, myenv.4]
            - name: evil_attacker
              brain:
                name: palaestrai.agent.dummy_brain:DummyBrain
                params: { }
              muscle:
                name: palaestrai.agent.dummy_muscle:DummyMuscle
                params: { }
              objective:
                name: palaestrai.agent.dummy_objective:DummyObjective
                params: {"params": 1}
              sensors: [myenv.5, myenv.6, myenv.7, myenv.8, myenv.9]
              actuators: [myenv.5, myenv.6, myenv.7, myenv.8, myenv.9]
          simulation:  # Definition of the simulation controller for this phase
            name: palaestrai.simulation:VanillaSimController
            conditions:
              - name: palaestrai.simulation:VanillaSimControllerTerminationCondition
                params: {}
          phase_config:  # Additional config for this phase
            mode: train
            worker: 1
            episodes: 1
      - phase_1:  # Name of the current phase. Can be any user-chosen name
          environments:  # Definition of the environments for this phase
            - environment:
                name: palaestrai.environment.dummy_environment:DummyEnvironment
                uid: myenv
                params: {"discrete": true}
          agents:  # Definition of agents for this phase
            - name: mighty_defender
              # we load the agent with the same name and the same experiment_id, optional: specify "agent_name" or "experiment_id"
              load: {base: "./custom", phase_name: "phase_0"}
              brain:
                name: palaestrai.agent.dummy_brain:DummyBrain
                params: { "store_path": "./custom" }
              muscle:
                name: palaestrai.agent.dummy_muscle:DummyMuscle
                params: { }
              objective:
                name: palaestrai.agent.dummy_objective:DummyObjective
                params: {"params": 1}
              sensors: [myenv.0, myenv.1, myenv.2, myenv.3, myenv.4]
              actuators: [myenv.0, myenv.1, myenv.2, myenv.3, myenv.4]
            - name: evil_attacker
              load: {phase_name: "phase_0"}
              brain:
                name: palaestrai.agent.dummy_brain:DummyBrain
                params: { }
              muscle:
                name: palaestrai.agent.dummy_muscle:DummyMuscle
                params: { }
              objective:
                name: palaestrai.agent.dummy_objective:DummyObjective
                params: {"params": 1}
              sensors: [myenv.5, myenv.6, myenv.7, myenv.8, myenv.9]
              actuators: [myenv.5, myenv.6, myenv.7, myenv.8, myenv.9]
          simulation:  # Definition of the simulation controller for this phase
            name: palaestrai.simulation:VanillaSimController
            conditions:
              - name: palaestrai.simulation:VanillaSimControllerTerminationCondition
                params: {}
          phase_config:  # Additional config for this phase
            mode: train
            worker: 1
            episodes: 1
    run_config:  # Not a runTIME config
      condition:
        name: palaestrai.experiment:VanillaRunGovernorTerminationCondition
        params: {}

There is also a *run_config*, which contains the termination condition of the
experiment. Currently, there is just one termination condition available. If
you want to implement one yourself, you add it here by changing the *name* to
the path of your new condition.

Schedule
~~~~~~~~

.. code-block:: yaml

    schedule:  # The schedule for this run; it is a list
      - phase_0:  # Name of the current phase. Can be any user-chosen name
          environments:  # Definition of the environments for this phase
            [...]
          agents:  # Definition of agents for this phase
            [...]
          simulation:  # Definition of the simulation controller for this phase
            [...]
          phase_config:  # Additional config for this phase
            mode: train
            worker: 1
            episodes: 1
      - phase_1:  # Name of the current phase. Can be any user-chosen name
          environments:  # Definition of the environments for this phase
            [...]
          agents:  # Definition of agents for this phase
            [...]
          simulation:  # Definition of the simulation controller for this phase
            [...]
          phase_config:  # Additional config for this phase
            mode: train
            worker: 1
            episodes: 1

The schedule defines how the experiment is executed. There is only one
schedule, which contains at least one phase. Every phase is identified by a
unique name, which can be chosen without restrictions. In this example, the
two phases are named *phase_0* and *phase_1*. Every phase contains
**at least** one environment, **at least** one agent, **exactly** one
simulation configuration, and **exactly** one phase_config.
Phases that build on each other can overwrite values from the preceding
phase; more on that in the section *Cascading Settings Expansion*.

Simulation
~~~~~~~~~~

The simulation controller controls the experiment and is configured in the
simulation block. The block contains two components: the simulation
controller itself and a termination condition that defines when a simulation
run/episode has ended. Currently, only one implementation of each exists.

.. code-block:: yaml

    simulation:  # Definition of the simulation controller for this phase
      name: palaestrai.simulation:VanillaSimController
      conditions:
        - name: palaestrai.simulation:VanillaSimControllerTerminationCondition
          params: {}

Agents
~~~~~~

The agents block contains all agents that act at the same time. In the basic
ARL scenario, there are two agents: The first one is the defender, the second
one is the attacker.

.. code-block:: yaml

    agents:  # Definition of agents for this phase
      - name: mighty_defender
        # we load the agent with the same name and the same experiment_id, optional: specify "agent_name" or "experiment_id"
        load: {base: "./custom", phase_name: "phase_0"}
        brain:
          name: palaestrai.agent.dummy_brain:DummyBrain
          params: { "store_path": "./custom" }
        muscle:
          name: palaestrai.agent.dummy_muscle:DummyMuscle
          params: { }
        objective:
          name: palaestrai.agent.dummy_objective:DummyObjective
          params: {"params": 1}
        sensors: [myenv.0, myenv.1, myenv.2, myenv.3, myenv.4]
        actuators: [myenv.0, myenv.1, myenv.2, myenv.3, myenv.4]
      - name: evil_attacker
        load: {phase_name: "phase_0"}
        brain:
          name: palaestrai.agent.dummy_brain:DummyBrain
          params: { }
        muscle:
          name: palaestrai.agent.dummy_muscle:DummyMuscle
          params: { }
        objective:
          name: palaestrai.agent.dummy_objective:DummyObjective
          params: {"params": 1}
        sensors: [myenv.5, myenv.6, myenv.7, myenv.8, myenv.9]
        actuators: [myenv.5, myenv.6, myenv.7, myenv.8, myenv.9]

Every agent is identified by a unique name. In this example, the first agent
is called ``mighty_defender``. The ``load`` parameter defines a local path
from which model files are loaded. This is needed if you have multiple
episodes or at least one test phase.

The parameters ``brain`` and ``muscle`` define the algorithm of the agent. In
most cases, you will pair a brain with its corresponding muscle (e.g., a TD3
brain with a TD3 muscle). However, there might be cases in which several
brains or muscles exist for the same algorithm, e.g., a multi-agent brain or
a muscle that includes a preprocessing step. In that case, you select the one
you want here by giving the path to the class that should be loaded. In this
example, the muscle and brain are part of the ``palaestrai.agent`` package,
and the dummy brain and muscle are loaded. Another common package is *hARL*,
in which you can find already implemented DRL algorithms.
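
The ``name`` entries in the example follow a ``module.path:ClassName``
pattern. As a rough illustration of how such a string can be turned into a
class, the following sketch uses only the Python standard library. The
function ``load_class`` is a hypothetical helper written for this guide; it is
not palaestrAI's actual loader, which may work differently. It is demonstrated
on a standard-library class so that it runs without palaestrAI installed.

```python
import importlib


def load_class(import_string):
    """Resolve a ``module.path:ClassName`` string to the class object.

    Hypothetical helper for illustration only; palaestrAI's own loading
    mechanism may differ.
    """
    module_path, sep, class_name = import_string.partition(":")
    if not sep or not module_path or not class_name:
        raise ValueError(
            f"Expected 'module.path:ClassName', got {import_string!r}"
        )
    # Import the module part, then look up the class by its name.
    module = importlib.import_module(module_path)
    return getattr(module, class_name)


# Demonstrated with a standard-library class. With palaestrAI installed,
# this helper would treat "palaestrai.agent.dummy_brain:DummyBrain"
# in the same way.
ordered_dict_cls = load_class("collections:OrderedDict")
print(ordered_dict_cls.__name__)
```

A string without the colon separator raises a ``ValueError`` in this sketch,
and a module that is not on the Python path raises the usual ``ImportError``,
which makes typos in a run file easier to spot.
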
Each brain/muscle can have parameters; the brain should at least have a
``store_path`` parameter. This parameter has to match the ``load_path`` (in
the example, both ``store_path`` and the ``base`` entry of ``load`` are
``./custom``).

.. note::

    In the future, the store will be used for loading and storing the brain
    dumps, but this is currently work in progress. As such, expect this part
    of the API to change in a future version.

Every agent needs an *objective*. The objective calculates the internal
reward, which is the closest equivalent to the traditional reward. The purpose
of the objective is to evaluate the last action together with the old (and
new) state with respect to the agent's objective. The environment reward, in
contrast, rates the current state of the environment without regard to the
actions of the agents. An objective can have parameters.

Finally, every agent needs at least one *sensor* and one *actuator*, but may
have more. The sensors and actuators are defined as lists and identified by
their IDs. An ID is a combination of the environment uid (which is defined in
the environment block) and the name of the sensor. In our case, the
environment has the uid ``myenv`` and the sensor names are the numbers from 0
to 9, so the first sensor is ``myenv.0``. The same applies to the actuators.
You have to know both the environment uid and the sensor/actuator names when
you create your environment.

.. warning::

    Make sure that sensors and actuators don't change between phases.
    Currently, no implemented algorithm provides transfer learning. Also make
    sure that the order is the same.

You can define as many agents as you want. Currently, the actuators are
exclusive (as defined by the ``VanillaSimController``): If two agents share
the same actuator, the last received value is used.

Environment
~~~~~~~~~~~

In the environment block, one or more environments can be defined. These
environments are executed in parallel, but palaestrAI performs no data
exchange between them.

.. code-block:: yaml

    environments:  # Definition of the environments for this phase
      - environment:
          name: palaestrai.environment.dummy_environment:DummyEnvironment
          uid: myenv
          params: {"discrete": true}

The key ``environments`` holds a list of environments to be defined. Each list
entry has a single key ``environment``, whose value is another dictionary of
key-value pairs. Each environment is defined by at least a ``name``, a
``uid``, and a dictionary ``params`` (which might also be empty).

The ``name`` is an import string with a specific syntax: The modules are
separated by dots and the class is appended after a colon. The example name
translates to the following Python import statement:

.. code-block:: python

    from palaestrai.environment.dummy_environment import DummyEnvironment

When you want to use a different environment, make sure that your environment
can be found in the Python path.

The ``uid`` is an important parameter, especially when more than one
environment is used. You can choose any name here but, for convenience, it
should not be too long. The ``uid`` is used when assigning sensors and
actuators to agents.

Finally, ``params`` is a dictionary that may contain any key/value pair
required by the environment. In the example, there is only one parameter,
which makes the environment use discrete instead of continuous values.

Cascading Settings Expansion
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The experiment run files of palaestrAI support a lazy-style definition of the
schedule's phases. This means you can skip definitions already made in a
previous phase. Let's have a look at an example to show exactly how this
works.

.. code-block:: yaml

    schedule:  # The schedule for this run; it is a list
      - phase_0:  # Name of the current phase. Can be any user-chosen name
          environments:  # Definition of the environments for this phase
            [...]
          agents:  # Definition of agents for this phase
            [...]
          simulation:  # Definition of the simulation controller for this phase
            [...]
          phase_config:  # Additional config for this phase
            mode: train
            worker: 1
            episodes: 1
      - phase_1:  # Name of the current phase. Can be any user-chosen name
          environments:  # Definition of the environments for this phase
            [...]
          agents:  # Definition of agents for this phase
            [...]
          simulation:  # Definition of the simulation controller for this phase
            [...]
          phase_config:  # Additional config for this phase
            mode: test
            worker: 1
            episodes: 1

This is the schedule we saw earlier, except that ``phase_1`` now runs in
``test`` mode. Without considering the configurations of environments, agents,
and simulation, the only thing that differs between the phases is the ``mode``
key in the ``phase_config``. The following config is equivalent to the config
above:

.. code-block:: yaml

    schedule:  # The schedule for this run; it is a list
      - phase_0:  # Name of the current phase. Can be any user-chosen name
          environments:  # Definition of the environments for this phase
            [...]
          agents:  # Definition of agents for this phase
            [...]
          simulation:  # Definition of the simulation controller for this phase
            [...]
          phase_config:  # Additional config for this phase
            mode: train
            worker: 1
            episodes: 1
      - phase_1:  # Name of the current phase. Can be any user-chosen name
          phase_config:  # Additional config for this phase
            mode: test

The general rule for overwriting is: If the value **is not** some kind of
dictionary, the existing value will be overwritten. If the value was not
present before, it will be added. If the value **is** some kind of dictionary,
the overwrite function is called again with this value. This is repeated for
an arbitrary depth of the initial dictionary.

The process to build the full configuration always looks like:

.. code-block:: python

    # Simplified sketch of the expansion logic. The helper names are
    # illustrative and not part of palaestrAI's public API.
    import copy


    def update_recursive(base, overrides):
        # Dictionaries are merged key by key; any other value is overwritten
        for key, value in overrides.items():
            if isinstance(value, dict) and isinstance(base.get(key), dict):
                update_recursive(base[key], value)
            else:
                base[key] = value
        return base


    def build_full_config(schedule):
        full_config = list()
        previous_config = dict()
        for phase in schedule:  # each entry is a dict {phase_name: config}
            phase_name, current_config = next(iter(phase.items()))
            # Reuse values from the previous config, which is empty in the
            # first iteration, and apply the current phase's values on top
            current_config = update_recursive(
                copy.deepcopy(previous_config), current_config
            )
            # Add the current-and-updated config to the full config
            full_config.append({phase_name: current_config})
            # Store a reference to the current config for the next iteration
            previous_config = current_config
        return full_config

This means that, if you have more than two phases, the third phase will copy
the entries from the second phase, which has copied the entries from the
first phase. In other words: unless you change something, the configs of all
phases are the same.

Store and Database Model
------------------------

Connecting to the Database
~~~~~~~~~~~~~~~~~~~~~~~~~~

The ‘store’ is the module that saves all experiment data: It is palaestrAI's
storage backend. ‘Store’ is a play on words with the ‘general store:’ You can
get almost anything (any data) from it!

Connecting to the store is easy: Once you've set the ``store_uri`` in the
runtime config, a session object can be retrieved like this:

.. code-block:: python

    import palaestrai.store

    session = palaestrai.store.Session()

Accessing Data: The Database Model
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

palaestrAI's database model is implemented using SQLAlchemy's
Object-Relational Mapper (ORM). The ORM's hierarchy mirrors the structure of
the palaestrAI ecosystem:

1. The store contains *experiment* documents
2. An *experiment* document has many associated *experiment run* documents
3. Each *experiment run* has one to many *experiment run instances*
4. *Experiment run instances* contain *experiment run phases*
5. In the hierarchy below *experiment run phases*, the store contains
   definitions of *environments* and *agents*
6. For each *environment* that participates in an *experiment run phase*,
   *world states* are stored
7. Each *agent* participating in an *experiment run phase* stores
   *muscle actions* and *brain states*.

.. note::

    **Experiment run instances** represent executions (and re-executions) of a
    particular experiment run. Since you can issue
    ``palaestrai experiment-start my_experiment.yml`` several times, or even
    several users can do so independently, it is important to distinguish
    executions of an experiment run from the actual definition.

.. eralchemy::

Once you have a database session object, you can use SQLAlchemy's ORM query
facilities to retrieve values of an experiment.

For more information, refer to the extended documentation of the store.