palaestrai.agent¶

Algorithms: Agent, Brain, and Muscle¶

Agent¶

An agent is a container for Brain (trainer), Muscle`(s) (worker), and :class:`Objective (objective reward defintion of an agent).

class palaestrai.agent.Agent(uid: str, brain_classname: str, brain: Brain | None, brain_params: Dict[str, Any], muscle_classname: str, muscles: Dict[str, Muscle | None], muscle_params: Dict[str, Any], sensors: List[SensorInformation], actuators: List[ActuatorInformation])[source]¶

Bases: object

Stores information about an agent.

The agent class is used to store information about an agent. It is currently used by the simulation controller to have an internal representation of all agents.

Parameters:

uid (str) – The user-defined ID (“name”) of an Agent.
brain_classname (str) – Name of the class implementing the Brain learner algorithm
brain (palaestrai.agent.Brain, optional) – An instance of a palaestrAI Brain. Dynamically instanciated from the ::Agent.brain_classname.
brain_params (dict) – This dictionary contains all parameters needed by the Brain.
muscle_classname (str) – Name of the class implementing the Muscle inference algorithm
muscles (dict of str, palaestrai.agent.Muscle) – Internal UIDs to actual Muscle. Since palaestrAI supports multi-worker setups, inference worker have an internal An instance of a palaestrai muscle. It defines what type of AI is used and is linked to the type of brain
muscle_params (dict of str, any) – Algorithm-specific parameters as they are passed to each Muscle instance
sensors (list of SensorInformation) – The list of sensors the agent is allowed to access.
actuators (list of ActuatorInformation) – The list of actuators the agent is allowed to access.

agent.State¶

Denominates the stages of an agent’s lifecycle.

class palaestrai.agent.state.State(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]¶: States of agent modules

Brain¶

Each agent has a brain, which stores experiences gather by its Muscles (workers) and learns from them.

class palaestrai.agent.Brain[source]¶

Baseclass for all brain implementation

The brain is the central learning instance. It coordinates all muscles (if multiple muscles are available). The brain does all (deep) learning tasks and delivers a model to the muscles.

The brain has one abstract method thinking() that has to be implemented.

Brain objects store their state and can re-load previous states by using the infrastructure provided by the BrainDumper infrastructure. For this, concrete Brain classes need to provide implementations of load() and store().

property actuators: List[ActuatorInformation]¶: All actuators a Muscle can act with.

add_statistics(key: str, value: Any, allow_overwrite=False)[source]¶

Add statistics for later analysis

Each Brain can have its own statistic metrics, which are calculated for a step (i.e., a call to ::~thinking). This can be used to store training error or any other value you’d later want to analyze

load()[source]¶

Loads the current state of the model

This method is called whenever the current state of the brain should be restored. How a particular model is deserialized is up to the concrete implementation. Also, brains may be divided into sub-models (e.g., actor and critic), whose separate storage is relized via tags. Implementing this method allows for a versatile implementation of this.

It is advisable to use the storage facilities of palaestrAI. They are available through the method _dumpers, tag)(). This function calls all available dumpers to restore the serialized brain dump (optionally identified via a tag). It returns a BinaryIO object that can then be used in the implementation. The attribute ::~Brain._dumpers is initialized to the list of available dumpers/loaders.

property memory: Memory¶: The Brain’s memory

property name: str¶: Name of this agent

pop_statistics() → Dict[str, Any][source]¶

Returning current statistics and resetting it

This method returns the statistics dict and clears it afterwards.

Because the statistics dict should contain metrics that refer to one step, it is stored and cleared after each one.

Returns:: The dict contains a mapping of metric keys to values. This dynamically allows various implementation-dependent statistics metrics.
Return type:: Dict

pretrain()[source]¶

Pretrains the Brain before an experiment begins (Offline Learning)

The Learner will have filled the Brain’s Memory with previous trajectories it can access. These trajectories are loaded according to the replay key in the experiment run file. How the pretraining (in DRL parley, known as Offline Learning) is actually conducted is up to the algorithm. I.e., concrete algorithm implementations may overwrite this method if they wish to implement pretraining/offline training.

property seed: int¶: Returns the random seed applicable for this brain instance.

property sensors: List[SensorInformation]¶: All sensors the Brain (and its Muscles) know about

setup()[source]¶

Brain setup method

This method is called by the AgentConductor just before the main loop is intered (run()). In the base Brain class, it is empty and does nothing. However, any derived class may implement it to do local setup before the main loop is entered.

Potential tasks that could be done in this method is to set the size limit of the Memory via ::Memory.size_limit, or anything that needs to access the ::Brain.seed, ::Brain.sensors, or ::Brain.actuators, as they’re not yet available in the constructor.

This method is guaranteed to be called in the same process space as the main loop method, Brain.run().

store()[source]¶

Stores the current state of the model

This method is called whenever the current state of the brain should be saved. How a particular model is serialized is up to the concrete implementation. Also, brains may be divided into sub-models (e.g., actor and critic), whose separate storage is relized via tags. Implementing this method allows for a versatile implementation of this.

It is advisable to use the storage facilities of palaestrAI. They are available through the method _dumpers, tag)(). This function calls all available dumpers to store the serialized brain dump provided in the parameter binary_io and optionally attaches a tag to it. The attribute ::~Brain._dumpers is initialized to a list of available dumpers and can be used directly.

abstract thinking(muscle_id: str, data_from_muscle: Any) → Any[source]¶

Think about a response using the provided information.

The thinking() method is the place for the implementation of the agent’s/brain’s logic. The brain can use the current sensor readings, review the actions of the previous thinking and consider the reward (provided by the objective).

Usually, this is the place where machine learning happens, but other solutions are possible as well (like a set of rules or even random based results).

The method receives only the name of the Muscle that is sending data, along with whatever data this Muscle wants to send to the Brain. As this is completely implementation-specific, this method does not impose any restrictions.

Any data that is available to palaestrAI, such as the actual sensor readings, setpoints a Muscle provided, rewards, the objective function’s value (goal/utility function), and whether the simulation is done or not, is available via the Brain’s Memory (cf. ::Brain.memory).

Parameters:

muscle_id (str) – This is the ID of the muscle which requested the update
data_from_muscle (Any) – Any data the Muscle sends to the Brain

Returns:

Any update that the Muscle. If this value does not evaluate to True (i.e., bool(update) == False), then the Muscle will not be updated.

Return type:

Any

Muscle¶

Muscles are the worker objects of an agent. They act within an Environment, performing policy inference.

class palaestrai.agent.Muscle(*args, **kwargs)[source]¶

An acting entity in an environment.

Each Muscle is an acting entity in an environment: Given a sensor input, it proposes actions. Thus, Muscles implement input-to-action mappings. A muscle does, however, not learn by itself; for that, it needs a Brain. Every time a Muscle acts, it sends the following inputs to a Brain:

Sensor inputs it received
actuator set points it provided
reward received from the proposed action.

When implementing an algorithm, you have to derive from the Muscle ABC and provide the following methods:

propose_actions(), which implements the input-to-action mapping
update(), which handles how updates from the Brain are incorporated into the muscle.

add_statistics(key: str, value: Any, allow_overwrite=False)[source]¶

Statistics dict

Each Muscle can have its own statistic metrics, that are calculated with each step, i.e., after each call of propose_actions(). The Brain can provide occasionally calculated statistics via an update to the Muscle. The Muscle then can choose to update its statistics for storing.

property memory: Memory¶

Muscle Memory.

Each Muscle can have its own, personal Memory. Internally, the memory stores sensor readings, actuator setpoints provided by the Muscle, as well as rewards from the environment and the result of the Muscle’s (i.e., Agent’s) objective function.

Returns:: The Muscle Memory.
Return type:: Memory

property mode: Mode¶

Internal mode of operations

Usually, an agent operates under the assumption of a certain modus operandi. This can be, for example, the distinction between training ( ::Mode.TRAIN) and testing (::Mode.TEST).

Returns:: The agent’s operations mode
Return type:: ::Mode

property name¶: User-defined name of this Muscle, as given in the experiment run

pop_statistics() → Dict[str, Any][source]¶

Returning current statistics and resetting it

This method returns the statistics dict and clears it afterwards.

Because the statistics dict should contain metrics that refer to one step, it is stored and cleared after each one.

Returns:: The dict contains a mapping of metric keys to values. This dynamically allows various implementation-dependent statistics metrics.
Return type:: Dict

prepare_model()[source]¶

Loading a trained model for testing

This method loads dumped brain states from a given previous phase, or even experiment run. For details, see the documentation on experiment run files (the load key).

This method is called whenever the current state of a muscle model should be restored. How a particular model is deserialized is up to the concrete implementation. Also, brains may be divided into sub-models (e.g., actor and critic), whose separate storage is realized via tags. Implementing this method allows for a versatile implementation of this.

It is advisable to use the storage facilities of palaestrAI. These are available through ::Muscle.load. The model location has then been pre-set from the experiment run file.

abstract propose_actions(sensors: List[SensorInformation], actuators_available: List[ActuatorInformation]) → Tuple[List[ActuatorInformation], Any][source]¶

Process new sensor information and produce actuator setpoints.

This method provides the essential inference task of the Muscle: It takes current sensor information and is expected to produce a list of actuator setpoints that can be applied in the ::Environment. How the actuator values are produced and how the sensor information are processed is up to the developer.

This is the essential abstract method that needs to be implemented by every Muscle.

Sensor readings and the list of available actuators are valid for the current time. Previous sensor readings, rewards, and objective value can be retrieved from the Muscle’s ::Memory, which is accessible through the ::Muscle.memory property.

Parameters:

sensors (list of SensorInformation) – List of new SensorInformation for all available sensors
actuators_available (list of ActuatorInformation) – List of all actuators that are currently available to the agent

Returns:

A Tuple containing: (1) The actual setpoints (an list of ::ActuatorSetpoint objects), for which it is allowed to simply use the objects that are passed as parameters, deep-copying is not necessary; (2) any other data that should be sent to the Muscle’s ::Brain.

Return type:

tuple of two elements

reset()[source]¶

Called in order to reset the Muscle.

There is a number of occasions in which the Muscle should stay active, but reset. For example, when a new episode of the same experiment run phase is started. Then, the Muscle is allowed (or better, encouraged) to keep its state, but acknowledge that a reset has occured and the Muscle does not expect the seamless continuation of an episode. Implementing this method is optional; if it is not implemented, nothing will happen on reset and the Muscle will also be kept as-is.

setup()[source]¶

Generic setup method, called just before ::~Muscle.run

This method is called just before the main loop in ::~Muscle.run commences. It can be used for any setup tasks. The method is guranteed to be called in the same process as the main loop. Also, the communications link to the brain will already be established. However, there are no information about the environment available yet.

There is no need to load the muscle’s inference model here; refer to ::~Muscle.prepare_model for this.

teardown()[source]¶

Called just before the Muscle is shut down

Just before the ::~RolloutWorker shuts down, it calls this method on the Muscle. If the method is not implemented, nothing happens; i.e., implementing this method is optional. However, in cases where some last-minute cleanups need to be done, this method is the right place to do it.

property uid¶

Unique user-defined ID of this Muscle

This is the name of the agent, i.e., what has been defined by a user in an ExperimentRun file.

Returns:: uid – The user-defined name of the Muscle
Return type:: str

update(update: Any)[source]¶

Update the Muscle.

This method is called if the brain sends an update. What is to be updated is up to the specific implementation. However, this method should update all necessary components.

There might be implementations of Brain and Muscles where updates do not happen. Simple, static bots never learn, and, therefore, do not need a mechanism for updates. Therefore, the default implementation of this method is simply to not do anything.

Parameters:: update (any) – Any data that a Brain would send to its Muscles upon an update. Implementation-specific.

AgentConductor¶

An agent conductor is the guardian of all agent objects’ lifecycle. Each agent (one brain, at least one muscle) is governed by an AgentConductor. This conductor supervises subprocesses, establishes communication channels, and performs watchdog duties. The AgentConductor is not part of the algorithmic definition of a learning agent, but exists purely for software engineering reasons.

class palaestrai.agent.AgentConductor(agent_config: dict, seed: int, uid=None, name=None)[source]¶

This creates a new agent conductor (AC).

The AC receives an agent config, which contains all information for the brain and the muscle. Additional information, like the current run ID, are part of the AgentSetupRequest.

Parameters:

agent_config (dict) – A dict containing information, how to instantiate brain and muscle.
seed (int) – The random seed for this agent conductor.
uid (str) – The uid (a unique string) for this agent conductor object.
name (str) – User-visible name as chosen by the user in the experiment run file

property name¶: Name as given to the agent by the user in the experiment run file

async run()¶

Main event/state loop of the ESM

This run method is injected into monitored classes if they do not have one already. The structure of run is as follows:

It resets the handlers for SIGCHLD, SIGINT, and SIGTERM to the OS’ default.
It calls monitored.setup(), if it exists.
It creates an ESM instance for the monitored object and adds signal handlers for SIGCHLD, SIGINT, and SIGTERM according to what the monitored class defines (via @ESM.on(signal.SIGINT), etc.)
It transides to the first state, defined by @ESM.enter. It then waits for state changes/events until monitored.stop() is called.
Finally, once the main event/state loop concludes, monitored.teardown() is called (if present).

stop(error=None)¶

Stops the ESM.

Stopping the ESM also means shutting down all running processes and cancelling all outstanding tasks (e.g., request monitors).

Paramters¶

errorException: If given, the ESM will raise this after cleaning up.

property uid¶: Unique, opaque ID of the agent conductor object

State, Action, and Rewards¶

SensorInformation¶

Sensor data an agent receives. In simple cases, a list of SensorInformation objects describe the full state of the environment. More complex, realistic cases include the agent not receiving the full state, or even a modified state. Each SensorInformation object describes one reading (data point) of one sensor of an agent.

Stores information about a single sensor.

Once created, a SensorInformation object can be called to retrieve its value, e.g.,:

a = SensorInformation(42, some_space)
a()  # => 42
a.value  # => 42

param value:: The value of the sensor’s last reading. The type of this value is described by space
type value:: int float as described in space
param space:: An instance of a palaestrai space object defining the type of the value
type space:: palaestrai.types.Space
param uid:: A unique identifier for this sensor. The agents use the ID only for assignment of previous value_ids with the same ID. The ID is not analyzed to gain domain knowledge (e.g., if the sensor is called “Powergrid.Bus1”, the agent will not use the ID to identify this sensor as part of a Bus in a powergrid.)
type uid:: str or int, optional
param value_ids:: has multiple value_ids, the value_ids can be used to identify the value_ids, e.g., the ids can be the names of the value_ids. This should be used if value is a list or a numpy array.
type value_ids:: list of str or int or None (default: None) if the sensor
param sensor_value:: Deprecated in favor of value
type sensor_value:: int float as described in space
param observation_space:: Deprecated in favor of space
type observation_space:: palaestrai.types.Space
param sensor_id:: Deprecated in favor of uid
type sensor_id:: str or int, optional

ActuatorInforation¶

Stores a set point for one actuator of an agent.

Stores information about a single actuator.

The actuator information class is used to transfer actuator information. It can be called to set a new value (value):

a = Actuator(some_space)
a(42)  # a.value is now 42

Parameters:

value (any, optional) – The set value for this actuator. The type is defined by the space. Can be skipped and set afterwards.
space (palaestrai.types.Space) – An instance of a palaestrai space that defines the type of the value.
uid (int or str, optional) – A unique identifier for this actuator. The agents use this ID only to assign the value_ids to the correct actuator. The ID is not analyzed to gain domain knowledge.
value_ids (list of str or int or None (default: None) if the actuator) – has multiple value_ids, the value_ids can be used to identify the value_ids, e.g., the ids can be the names of the value_ids. This should be used if value is a list or a numpy array.
setpoint (int or str, optional) – Deprecated in favor of value
action_space (palaestrai.types.Space) – Deprecated in favor of space
actuator_id (int or str, optional) – Deprecated in favor of uid

RewardInformation¶

Environments issue rewards: A reward describes the current performance of an environment with regards to its current state.

Bases: object

Stores information about a single reward.

Once created, a RewardInformation object can be called to retrieve its value, e.g.,:

a = RewardInformation(42, some_space) a() # => 42 a.reward_value # => 42

Parameters:

value (Any) – The value of the reward’s last reading. The type of this value is described by space
space (palaestrai.types.Space) – An instance of a palaestrai space object defining the type of the value
uid (Optional[str]) – A unique identifier for this reward. The agents use the ID only for assignment of previous values with the same ID. The ID is important, if multiple rewards are available and/or the reward is a delayed reward.
reward_value (Any) – Deprecated in favor of value
observation_space (palaestrai.types.Space) – Deprecated in favor of space
reward_id (Optional[str]) – Deprecated in favor of uid

property observation_space¶

property reward_id¶

property reward_value¶

property space¶

property uid¶

property value¶

Objective¶

Describes the agents success at reaching its internal objective. The Objective object encapsules a function that rates the agent’s current performance, given state data, actions, and rewards.

class palaestrai.agent.Objective(params: dict)[source]¶

Bases: ABC

The base class for all objectives.

An objective defines the goal of an agent and changing the objective can, e.g., transform an attacker agent to a defender agent.

The objective can, e.g., a wrapper for the reward of the environment and, in the easiest case, the sign of the reward is flipped (or not) to define attacker or defender. However, the objective can as well use a complete different formula.

abstract internal_reward(memory: Memory, **kwargs) → ndarray | float | None[source]¶

Calculate the reward of this objective

Parameters:: memory (Memory) – The Memory that can be accessed to calculate the objective. Memory.tail() is most often used to get the n latest sensor readings, setpoints, or rewards.
Returns:: objective – The agent’s calculated objective value, i.e., the result of the agent’s utility or goal function. This is based on any information that is stored in the agent’s : class:Memory. It is either a numpy Array, a float, or an empty numpy array or None. In the latter case (empty array or None), no objective is stored and all other information from the current action of the agent is discarded.
Return type:: np.ndarray or float, Optional

Example (Dummy) Implementations¶

DummyBrain¶

class palaestrai.agent.DummyBrain[source]¶

Bases: Brain

load()[source]¶

Loads the current state of the model

store()[source]¶

Stores the current state of the model

thinking(muscle_id, data_from_muscle)[source]¶

Think about a response using the provided information.

Usually, this is the place where machine learning happens, but other solutions are possible as well (like a set of rules or even random based results).

Parameters:

muscle_id (str) – This is the ID of the muscle which requested the update
data_from_muscle (Any) – Any data the Muscle sends to the Brain

Returns:

Any update that the Muscle. If this value does not evaluate to True (i.e., bool(update) == False), then the Muscle will not be updated.

Return type:

Any

DummyMuscle¶

class palaestrai.agent.DummyMuscle(count_upwards: bool = False)[source]¶

Bases: Muscle

Implements the simples possible Muscle.

This Muscle implementation simply samples the action spaces of all actuators connected to it. If the additional mode count_upwards is set, then all Discrete action spaces receive upwards counting values (modulo the space dimension). The latter mode exists as convience for testing purposes.

Parameters:: count_upwards (bool, default: False) – Enables upward counting modulo action space for Discrete actuators.

propose_actions(sensors, actuators_available)[source]¶

Process new sensor information and produce actuator setpoints.

This is the essential abstract method that needs to be implemented by every Muscle.

Parameters:

sensors (list of SensorInformation) – List of new SensorInformation for all available sensors
actuators_available (list of ActuatorInformation) – List of all actuators that are currently available to the agent

Returns:

Return type:

tuple of two elements

update(data)[source]¶

Update the Muscle.

This method is called if the brain sends an update. What is to be updated is up to the specific implementation. However, this method should update all necessary components.

Parameters:: update (any) – Any data that a Brain would send to its Muscles upon an update. Implementation-specific.

DummyObjective¶

class palaestrai.agent.DummyObjective(params=None)[source]¶

Bases: Objective

A simple objective that sums all environment rewards

This objective is a simple pass-through objective that sums up the RewardInformation from the latest addition to an agent’s Memory. I.e., it uses Memory.tail() to get the latest row, and then sums all RewardInformation objects that were stored here. I.e., sum(memory.tail(1).rewards).

This can be used as a pass-through for the simpler reinforcement learning cases, in which the environment’s reward is also the agent’s objective function. It can also act as a simple dummy (i.e., placeholder) for a more meaningful objective function.

internal_reward(memory: Memory, **kwargs) → float[source]¶

Calculate the reward of this objective

Parameters:: memory (Memory) – The Memory that can be accessed to calculate the objective. Memory.tail() is most often used to get the n latest sensor readings, setpoints, or rewards.
Returns:: objective – The agent’s calculated objective value, i.e., the result of the agent’s utility or goal function. This is based on any information that is stored in the agent’s : class:Memory. It is either a numpy Array, a float, or an empty numpy array or None. In the latter case (empty array or None), no objective is stored and all other information from the current action of the agent is discarded.
Return type:: np.ndarray or float, Optional