***********************
Simulation Flow Control
***********************

About
-----

palaestrAI lets you define when an agent acts, when environments are updated
(stepped), and at which point an episode or phase ends. This is called
simulation flow control and is achieved through simulation controllers and
termination conditions.

**Simulation controllers** make the simulation tick; they define which data
is passed to which entity at which point. For example, the *taking turns*
simulation controller lets each agent act in turn and steps all environments
between agent actions.

**Termination conditions** decide when an episode or a phase ends. For
example, an episode can end when a particular agent is successful enough, or
a phase can end when a fixed number of episodes has been executed.

Simulation Controllers
----------------------

Taking Turns
^^^^^^^^^^^^^

.. autoclass:: palaestrai.simulation.TakingTurnsSimulationController

Vanilla (Scatter-Gather)
^^^^^^^^^^^^^^^^^^^^^^^^

.. autoclass:: palaestrai.simulation.VanillaSimulationController

Termination Conditions Available
--------------------------------

Agent Objective
^^^^^^^^^^^^^^^

.. autoclass:: palaestrai.experiment.AgentObjectiveTerminationCondition

Environment Termination Condition
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. autoclass:: palaestrai.experiment.EnvironmentTerminationCondition

Maximum Number of Episodes
^^^^^^^^^^^^^^^^^^^^^^^^^^

.. autoclass:: palaestrai.experiment.MaxEpisodesTerminationCondition

Default (Vanilla) Phase Termination Condition
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. autoclass:: palaestrai.experiment.VanillaRunGovernorTerminationCondition

Multiple Termination Conditions
-------------------------------

Multiple ``TerminationCondition``\s can be combined through custom classes
such as the
:py:class:`~palaestrai.experiment.VanillaRunGovernorTerminationCondition`,
but they can also be combined in the experiment file.
Conditions on the **episode** level can be used together, i.e., **OR**\ed,
by adding them to the *conditions* list, e.g.::

    definitions:
      agents:
        myagent:
          name: &agent_name My Agent
          # (Other agent definitions omitted)
      phase_config:
        mode: train
        worker: 2
      simulation:
        taking_turns:
          name: palaestrai.simulation:Vanilla
          conditions:
            - name: palaestrai.experiment:EnvironmentTerminationCondition
              params: {}
            - name: palaestrai.experiment:AgentObjectiveTerminationCondition
              params:
                *agent_name :
                  brain_avg200: 10.0
      run_config:
        condition:
          name: palaestrai.experiment:AgentObjectiveTerminationCondition
          params:
            *agent_name :
              phase_avg5: 1.0

This configuration means that **an episode** of one of the two workers of
``My Agent`` ends once the worker has an average objective of at least ``10``
over the last ``200`` steps of the current episode **OR** if the
``Environment`` terminates.
Furthermore, independently of the episode-level ``TerminationCondition``\s,
**a phase** ends once the average objective value over all steps for each
episode over the last ``5`` episodes is greater than ``1.0``\.

.. note::

    If the maximum number of steps in an episode is less than ``200``, the
    average of the objective values for the ``brain_avg200: 10.0`` condition
    is never calculated, because the objective values of the steps never fill
    the window of ``200`` required values.
    In that case, the **episode**-level condition of the
    :py:class:`~palaestrai.experiment.AgentObjectiveTerminationCondition`
    is effectively inactive, but it is nevertheless required for the
    **phase**-level condition.
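
The windowing behaviour described in the note, and the **OR** combination of
episode-level conditions, can be illustrated with a small standalone sketch.
This is not palaestrAI's implementation: the class
``WindowedAverageCondition`` and the function ``episode_should_end`` are
hypothetical names introduced only for illustration, and the ``>=``
comparison is an assumption derived from the "at least ``10``" wording above.

```python
from collections import deque


class WindowedAverageCondition:
    """Hypothetical stand-in for a ``brain_avg<N>``-style check.

    Not part of palaestrAI; it only mimics the behaviour described in the
    text: the average counts only once the window is completely filled.
    """

    def __init__(self, window: int, threshold: float):
        self.threshold = threshold
        # A bounded deque drops the oldest value once ``window`` is reached.
        self._values = deque(maxlen=window)

    def update(self, objective: float) -> bool:
        """Record one step's objective value and report whether to terminate."""
        self._values.append(objective)
        if len(self._values) < self._values.maxlen:
            # Window not yet full: no average is evaluated.
            return False
        return sum(self._values) / len(self._values) >= self.threshold


def episode_should_end(environment_done: bool, objective: float,
                       condition: WindowedAverageCondition) -> bool:
    """Episode-level conditions are ORed: either one ends the episode."""
    objective_reached = condition.update(objective)
    return environment_done or objective_reached


condition = WindowedAverageCondition(window=200, threshold=10.0)

# An episode with only 150 steps never fills the 200-step window, so the
# objective condition stays silent even though every value exceeds 10.
short_episode = [episode_should_end(False, 50.0, condition) for _ in range(150)]
print(any(short_episode))
```

In this sketch, an episode shorter than the window can only end through the
environment's own termination, which mirrors the situation the note warns
about.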