***********************
Simulation Flow Control
***********************

About
-----

palaestrAI lets you define when an agent acts, when environments are updated
(stepped), and at which point an episode or phase ends. This is called
simulation flow control and is achieved through simulation controllers and
termination conditions.

**Simulation controllers** make the simulation tick; they define which
data is passed to which entity at which point. For example, the
*taking turns* simulation controller lets each agent act in turn and
steps all environments between agent actions.

**Termination conditions** decide when an episode or a phase ends. For
example, an episode can end when a particular agent is successful enough, or
a phase can end when a fixed number of episodes has been executed.
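
The interplay described above can be illustrated with a small, self-contained
toy loop. This is only a sketch of the idea and not palaestrAI's
implementation: ``ToyAgent``, ``ToyEnvironment``, ``run_taking_turns``, and the
``is_done`` callback are hypothetical names invented for this example. Agents
act one after another, all environments are stepped after each action, and a
termination check ends the toy episode.

.. code-block:: python

    from typing import Callable, List


    class ToyEnvironment:
        """Hypothetical environment that counts how often it was stepped."""

        def __init__(self) -> None:
            self.steps = 0

        def step(self, action: int) -> int:
            self.steps += 1
            return self.steps  # toy "sensor reading"


    class ToyAgent:
        """Hypothetical agent that derives an action from a reading."""

        def __init__(self, name: str) -> None:
            self.name = name

        def act(self, reading: int) -> int:
            return reading + 1


    def run_taking_turns(
        agents: List[ToyAgent],
        environments: List[ToyEnvironment],
        is_done: Callable[[List[ToyEnvironment]], bool],
        max_ticks: int = 1000,
    ) -> List[str]:
        """Let agents act in turn; step all environments after each action."""
        log: List[str] = []
        readings = [0 for _ in environments]
        for _ in range(max_ticks):
            for agent in agents:
                action = agent.act(sum(readings))
                # Between agent actions, every environment is stepped.
                readings = [env.step(action) for env in environments]
                log.append(agent.name)
                # Toy termination condition, checked after every step.
                if is_done(environments):
                    return log
        return log


    env = ToyEnvironment()
    log = run_taking_turns(
        [ToyAgent("a"), ToyAgent("b")],
        [env],
        is_done=lambda envs: envs[0].steps >= 5,
    )
    print(log)

Here the termination check plays the role of an episode-level termination
condition, while the loop itself stands in for the simulation controller.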

Simulation Controllers
----------------------

Taking Turns
^^^^^^^^^^^^^

.. autoclass:: palaestrai.simulation.TakingTurnsSimulationController

Vanilla (Scatter-Gather)
^^^^^^^^^^^^^^^^^^^^^^^^

.. autoclass:: palaestrai.simulation.VanillaSimulationController


Termination Conditions Available
--------------------------------

Agent Objective
^^^^^^^^^^^^^^^

.. autoclass:: palaestrai.experiment.AgentObjectiveTerminationCondition

Environment Termination Condition
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. autoclass:: palaestrai.experiment.EnvironmentTerminationCondition

Maximum Number of Episodes
^^^^^^^^^^^^^^^^^^^^^^^^^^

.. autoclass:: palaestrai.experiment.MaxEpisodesTerminationCondition

Default (Vanilla) Phase Termination Condition
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. autoclass:: palaestrai.experiment.VanillaRunGovernorTerminationCondition


Multiple Termination Conditions
-------------------------------

Multiple ``TerminationCondition``\s can be combined through custom classes such as
the :py:class:`~palaestrai.experiment.VanillaRunGovernorTerminationCondition`, but
they can also be combined in the experiment file. Conditions on the **episode**
level can be used together, i.e., **OR**\ed, by adding them to the *conditions*
list, e.g.::

    definitions:
      agents:
        myagent:
          name: &agent_name My Agent
          # (Other agent definitions omitted)
      phase_config:
        mode: train
        worker: 2
      simulation:
        taking_turns:
          name: palaestrai.simulation:Vanilla
          conditions:
          - name: palaestrai.experiment:EnvironmentTerminationCondition
            params: {}
          - name: palaestrai.experiment:AgentObjectiveTerminationCondition
            params:
              *agent_name :
                brain_avg200: 10.0
    run_config:
      condition:
        name: palaestrai.experiment:AgentObjectiveTerminationCondition
        params:
          *agent_name :
            phase_avg5: 1.0

This configuration means that **an episode** of one of the two workers
of ``My Agent`` ends once the worker has an average objective of
at least ``10`` over the last ``200`` steps of the current episode **OR**
once the ``Environment`` terminates.

Furthermore, independently of the ``TerminationCondition``\s for the episode,
**a phase** ends once the per-episode average objective value (taken over all
steps of an episode), averaged over the last ``5`` episodes, is greater than
``1.0``.

.. note::
    If the maximum number of steps in an episode is less than ``200``,
    the average of the objective values for the
    ``brain_avg200: 10.0`` condition is never calculated, because the
    objective values of the steps never fill the window of ``200``
    required values. In that case, the **episode** level condition for the
    :py:class:`~palaestrai.experiment.AgentObjectiveTerminationCondition`
    is effectively inactive, but it is nevertheless required for the
    **phase** level condition.
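
The windowing behavior described in the note, and the **OR** combination of
episode-level conditions, can be sketched in a few lines of Python. This is a
simplified illustration under stated assumptions, not the code palaestrAI
runs: ``WindowedAverage`` and ``episode_should_end`` are hypothetical helpers
invented for this example, and the "at least" comparison follows the prose
description above.

.. code-block:: python

    from collections import deque
    from typing import Optional


    class WindowedAverage:
        """Hypothetical helper: average over the last ``window`` values."""

        def __init__(self, window: int) -> None:
            self.values = deque(maxlen=window)

        def add(self, value: float) -> Optional[float]:
            """Store a value; return the average only once the window is full."""
            self.values.append(value)
            if len(self.values) < self.values.maxlen:
                return None  # window not filled yet, no average available
            return sum(self.values) / len(self.values)


    def episode_should_end(
        average: Optional[float], threshold: float, environment_done: bool
    ) -> bool:
        """OR-combination of the two toy episode-level conditions."""
        objective_met = average is not None and average >= threshold
        return environment_done or objective_met


    # An episode with only 150 steps never fills a window of 200 values,
    # so no average is ever produced, however high the objective is.
    short_episode = WindowedAverage(200)
    results = [short_episode.add(50.0) for _ in range(150)]
    never_computed = all(r is None for r in results)

    # With a small window, the average becomes available once it is full.
    small = WindowedAverage(3)
    averages = [small.add(v) for v in (9.0, 10.0, 11.0)]

    print(never_computed, averages)

In the short episode, only the environment's own termination can end the
episode, which matches the situation the note describes.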