palaestrai.agent#
Algorithms: Agent, Brain, and Muscle#
Agent#
An agent is a container for a Brain (trainer), Muscle(s) (workers), and an Objective (the objective/reward definition of an agent).
- class palaestrai.agent.Agent(uid: str, brain_classname: str, brain: Brain | None, brain_params: Dict[str, Any], muscle_classname: str, muscles: Dict[str, Muscle | None], muscle_params: Dict[str, Any], sensors: List[SensorInformation], actuators: List[ActuatorInformation])[source]#
Bases:
object
Stores information about an agent.
The agent class is used to store information about an agent. It is currently used by the simulation controller to have an internal representation of all agents.
- Parameters:
uid (str) – The user-defined ID (“name”) of an Agent.
brain_classname (str) – Name of the class implementing the Brain learner algorithm.
brain (palaestrai.agent.Brain, optional) – An instance of a palaestrAI Brain, dynamically instantiated from Agent.brain_classname.
brain_params (dict) – This dictionary contains all parameters needed by the Brain.
muscle_classname (str) – Name of the class implementing the Muscle inference algorithm.
muscles (dict of str, palaestrai.agent.Muscle) – Mapping of internal UIDs to actual Muscle instances. Since palaestrAI supports multi-worker setups, each inference worker has an internal UID. A Muscle defines what type of AI is used and is linked to the type of Brain.
muscle_params (dict of str, any) – Algorithm-specific parameters as they are passed to each Muscle instance.
sensors (list of SensorInformation) – The list of sensors the agent is allowed to access.
actuators (list of ActuatorInformation) – The list of actuators the agent is allowed to access.
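The following is a hedged sketch of how such a container might be populated. In practice, the simulation controller constructs Agent objects internally; the concrete class names and values below are illustrative only:

from palaestrai.agent import Agent

agent = Agent(
    uid="my_agent",                                   # user-defined name
    brain_classname="palaestrai.agent.DummyBrain",    # learner class
    brain=None,               # instantiated later from brain_classname
    brain_params={},          # parameters passed to the Brain
    muscle_classname="palaestrai.agent.DummyMuscle",  # inference class
    muscles={"worker-0": None},  # internal worker UIDs to Muscle instances
    muscle_params={},         # parameters passed to each Muscle
    sensors=[],               # SensorInformation the agent may access
    actuators=[],             # ActuatorInformation the agent may access
)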
agent.State#
Denominates the stages of an agent’s lifecycle.
Brain#
Each agent has a brain, which stores experiences gathered by its Muscles (workers) and learns from them.
- class palaestrai.agent.Brain[source]#
Base class for all Brain implementations
The brain is the central learning instance. It coordinates all muscles (if multiple muscles are available). The brain does all (deep) learning tasks and delivers a model to the muscles.
The brain has one abstract method, thinking(), that has to be implemented.
Brain objects store their state and can re-load previous states by using the infrastructure provided by the BrainDumper. For this, concrete Brain classes need to provide implementations of load() and store().
- property actuators: List[ActuatorInformation]#
All actuators a Muscle can act with.
- load()[source]#
Loads the current state of the model
This method is called whenever the current state of the brain should be restored. How a particular model is deserialized is up to the concrete implementation. Also, brains may be divided into sub-models (e.g., actor and critic), whose separate storage is realized via tags. Implementing this method accordingly allows for a versatile handling of such cases.
It is advisable to use the storage facilities of palaestrAI: they call all available dumpers to restore the serialized brain dump, optionally identified via a tag, and return a BinaryIO object that can then be used in the implementation. The attribute Brain._dumpers is initialized to the list of available dumpers/loaders.
- property memory: Memory#
The Brain’s memory
- property sensors: List[SensorInformation]#
All sensors the Brain (and its Muscles) know about
- setup()[source]#
Brain setup method
This method is called by the AgentConductor just before the main loop is entered (run()). In the base Brain class it is empty and does nothing; however, any derived class may implement it to do local setup before the main loop is entered. Potential tasks for this method are setting the size limit of the Memory via Memory.size_limit, or anything that needs to access Brain.seed, Brain.sensors, or Brain.actuators, as they are not yet available in the constructor. This method is guaranteed to be called in the same process space as the main loop method, Brain.run().
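For instance, a derived Brain could use setup() roughly as follows. This is a sketch; it assumes Memory.size_limit is a settable attribute, as described above:

from palaestrai.agent import Brain

class MyBrain(Brain):
    def setup(self):
        # Runs in the same process space as run(); seed, sensors, and
        # actuators are available here, unlike in the constructor.
        self.memory.size_limit = 10_000  # bound the Brain's Memory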
- store()[source]#
Stores the current state of the model
This method is called whenever the current state of the brain should be saved. How a particular model is serialized is up to the concrete implementation. Also, brains may be divided into sub-models (e.g., actor and critic), whose separate storage is realized via tags. Implementing this method accordingly allows for a versatile handling of such cases.
It is advisable to use the storage facilities of palaestrAI: they call all available dumpers to store the serialized brain dump provided in the parameter binary_io, optionally attaching a tag to it. The attribute Brain._dumpers is initialized to the list of available dumpers and can be used directly.
- abstract thinking(muscle_id: str, data_from_muscle: Any) Any [source]#
Think about a response using the provided information.
The thinking() method is the place for the implementation of the agent’s/brain’s logic. The brain can use the current sensor readings, review the actions of the previous thinking, and consider the reward (provided by the objective). Usually, this is the place where machine learning happens, but other solutions are possible as well (like a set of rules or even random results).
The method receives only the name of the Muscle that is sending data, along with whatever data this Muscle wants to send to the Brain. As this is completely implementation-specific, this method does not impose any restrictions. Any data that is available to palaestrAI, such as the actual sensor readings, setpoints a Muscle provided, rewards, the objective function’s value (goal/utility function), and whether the simulation is done or not, is available via the Brain’s Memory (cf. Brain.memory).
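A minimal, non-learning Brain sketch might look as follows; the class name and return values are illustrative, not part of palaestrAI:

from typing import Any

from palaestrai.agent import Brain

class NoOpBrain(Brain):
    """Sketch: a Brain that never learns and never updates its Muscles."""

    def thinking(self, muscle_id: str, data_from_muscle: Any) -> Any:
        # A real implementation would train here, drawing experiences
        # (readings, setpoints, rewards, objective values) from self.memory.
        return None  # nothing for the Muscles to incorporate

    def load(self):
        pass  # restore model state via the dumper infrastructure

    def store(self):
        pass  # persist model state via the dumper infrastructure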
Muscle#
Muscles are the worker objects of an agent. They act within an
Environment
, performing policy inference.
- class palaestrai.agent.Muscle(*args, **kwargs)[source]#
An acting entity in an environment.
Each Muscle is an acting entity in an environment: given a sensor input, it proposes actions. Thus, Muscles implement input-to-action mappings. A Muscle does, however, not learn by itself; for that, it needs a Brain. Every time a Muscle acts, it sends the following inputs to a Brain:
- Sensor inputs it received
- Actuator setpoints it provided
- The reward received for the proposed action
When implementing an algorithm, you have to derive from the Muscle ABC and provide the following methods (a sketch follows below):
- propose_actions(), which implements the input-to-action mapping
- update(), which handles how updates from the Brain are incorporated into the Muscle
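A hedged sketch of such a derived Muscle follows. It assumes that the space attached to each ActuatorInformation offers a gym-like sample() method; the class name is illustrative:

from typing import Any, List, Tuple

from palaestrai.agent import ActuatorInformation, Muscle, SensorInformation

class RandomMuscle(Muscle):
    """Sketch: proposes random setpoints and ignores Brain updates."""

    def propose_actions(
        self,
        sensors: List[SensorInformation],
        actuators_available: List[ActuatorInformation],
    ) -> Tuple[List[ActuatorInformation], Any]:
        # Reusing the passed-in actuator objects is allowed; deep-copying
        # is not necessary (see the propose_actions() documentation below).
        for actuator in actuators_available:
            actuator(actuator.space.sample())  # assumes Space.sample()
        return actuators_available, None  # no extra data for the Brain

    def update(self, update: Any):
        pass  # static policy: nothing to incorporate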
- add_statistics(key: str, value: Any, allow_overwrite=False)[source]#
Statistics dict
Each Muscle can have its own statistics metrics, which are calculated at each step, i.e., after each call of propose_actions(). The Brain can occasionally provide calculated statistics via an update to the Muscle; the Muscle can then choose to update its statistics for storing.
- property memory: Memory#
Muscle Memory.
Each Muscle can have its own, personal Memory. Internally, the memory stores sensor readings, actuator setpoints provided by the Muscle, as well as rewards from the environment and the result of the Muscle’s (i.e., Agent’s) objective function.
- Returns:
The Muscle Memory.
- Return type:
Memory
- property mode: Mode#
Internal mode of operations
Usually, an agent operates under the assumption of a certain modus operandi. This can be, for example, the distinction between training (Mode.TRAIN) and testing (Mode.TEST).
- Returns:
The agent’s operations mode
- Return type:
Mode
- pop_statistics() Dict[str, Any] [source]#
Returns the current statistics and resets them
This method returns the statistics dict and clears it afterwards.
Because the statistics dict should contain metrics that refer to one step, it is stored and cleared after each one.
- Returns:
The dict contains a mapping of metric keys to values. This dynamically allows various implementation-dependent statistics metrics.
- Return type:
Dict
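Together with add_statistics(), a hypothetical usage fragment inside a Muscle implementation could look like this (the metric names are illustrative):

# Once per step, inside a Muscle implementation:
self.add_statistics("epsilon", 0.05)
self.add_statistics("loss", 0.123, allow_overwrite=True)

# Later, the statistics are drained for storage:
stats = self.pop_statistics()  # {"epsilon": 0.05, "loss": 0.123}
# The internal statistics dict is now empty again.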
- prepare_model()[source]#
Loading a trained model for testing
This method loads dumped brain states from a given previous phase, or even experiment run. For details, see the documentation on experiment run files (the load key).
This method is called whenever the current state of a muscle model should be restored. How a particular model is deserialized is up to the concrete implementation. Also, brains may be divided into sub-models (e.g., actor and critic), whose separate storage is realized via tags. Implementing this method accordingly allows for a versatile handling of such cases.
It is advisable to use the storage facilities of palaestrAI. These are available through Muscle.load; the model location has then been pre-set from the experiment run file.
- abstract propose_actions(sensors: List[SensorInformation], actuators_available: List[ActuatorInformation]) Tuple[List[ActuatorInformation], Any] [source]#
Process new sensor information and produce actuator setpoints.
This method provides the essential inference task of the Muscle: it takes current sensor information and is expected to produce a list of actuator setpoints that can be applied in the Environment. How the actuator values are produced and how the sensor information is processed is up to the developer.
This is the essential abstract method that needs to be implemented by every Muscle.
Sensor readings and the list of available actuators are valid for the current time. Previous sensor readings, rewards, and objective values can be retrieved from the Muscle’s Memory, which is accessible through the Muscle.memory property.
- Parameters:
sensors (list of SensorInformation) – List of new SensorInformation for all available sensors
actuators_available (list of ActuatorInformation) – List of all actuators that are currently available to the agent
- Returns:
A tuple containing: (1) the actual setpoints (a list of ActuatorInformation objects; it is allowed to simply reuse the objects that are passed as parameters, deep-copying is not necessary); (2) any other data that should be sent to the Muscle’s Brain.
- Return type:
tuple of two elements
- reset()[source]#
Called in order to reset the Muscle.
There are a number of occasions in which the Muscle should stay active, but reset. For example, when a new episode of the same experiment run phase is started: the Muscle is then allowed (or better, encouraged) to keep its state, but should acknowledge that a reset has occurred and not expect the seamless continuation of an episode. Implementing this method is optional; if it is not implemented, nothing will happen on reset and the Muscle will be kept as-is.
- setup()[source]#
Generic setup method, called just before Muscle.run
This method is called just before the main loop in Muscle.run commences. It can be used for any setup tasks. The method is guaranteed to be called in the same process as the main loop. Also, the communications link to the Brain will already be established. However, no information about the environment is available yet.
There is no need to load the muscle’s inference model here; refer to Muscle.prepare_model for this.
- property uid#
Unique user-defined ID of this Muscle
This is the name of the agent, i.e., what has been defined by a user in an ExperimentRun file.
- Returns:
uid – The user-defined name of the Muscle
- Return type:
str
- update(update: Any)[source]#
Update the Muscle.
This method is called if the brain sends an update. What is to be updated is up to the specific implementation. However, this method should update all necessary components.
There might be implementations of Brain and Muscles where updates do not happen: simple, static bots never learn and therefore do not need a mechanism for updates. Hence, the default implementation of this method simply does nothing.
- Parameters:
update (any) – Any data that a Brain would send to its Muscles upon an update. Implementation-specific.
AgentConductor#
An agent conductor is the guardian of all agent objects’ lifecycle. Each
agent (one brain, at least one muscle) is governed by an
AgentConductor
. This conductor supervises subprocesses, establishes
communication channels, and performs watchdog duties. The
AgentConductor
is not part of the algorithmic definition of a
learning agent, but exists purely for software engineering reasons.
- class palaestrai.agent.AgentConductor(agent_config: dict, seed: int, uid=None)[source]#
This creates a new agent conductor (AC).
The AC receives an agent config, which contains all information for the brain and the muscle. Additional information, like the current run ID, are part of the AgentSetupRequest.
- Parameters:
agent_config (dict) – The agent configuration, containing all information for the Brain and the Muscle(s)
seed (int) – Seed for this agent’s random number generators
uid (optional) – Unique ID of this agent conductor
- async run()[source]#
Monitors agents and facilitates information interchange
This method is the main loop for the AgentConductor. It monitors the Brain object and Muscle instances of the agent (i.e., the processes) and transceives/routes messages.
- property uid#
Unique, opaque ID of the agent conductor object
- property worker#
Getter for the MajorDomoWorker object
This method returns (possibly lazily creating) the current MajorDomoWorker object. It creates this worker on demand. It is not safe to call this method between forks, as forks copy the context information for the worker, which is process-dependent.
- Return type:
MajorDomoWorker
State, Action, and Rewards#
SensorInformation#
Sensor data an agent receives. In simple cases, a list of
SensorInformation
objects describe the full state of the
environment. More complex, realistic cases include the agent not receiving the full state, or even receiving a modified state. Each SensorInformation object describes one reading (data point) of one sensor of an agent.
- class palaestrai.agent.SensorInformation(value: int | float | ndarray | None = None, space: Space | None = None, uid: str | None = None, value_ids=None, sensor_value=None, observation_space: Space | None = None, sensor_id=None)[source]#
Stores information about a single sensor.
Once created, a SensorInformation object can be called to retrieve its value, e.g.:
a = SensorInformation(42, some_space)
a() # => 42
a.value # => 42
- param value:
The value of the sensor’s last reading. The type of this value is described by
space
- type value:
int or float, as described in space
- param space:
An instance of a palaestrai space object defining the type of the value
- type space:
palaestrai.types.Space
- param uid:
A unique identifier for this sensor. The agents use the ID only for assignment of previous values with the same ID. The ID is not analyzed to gain domain knowledge (e.g., if the sensor is called “Powergrid.Bus1”, the agent will not use the ID to identify this sensor as part of a bus in a power grid).
- type uid:
str or int, optional
- param value_ids:
If the sensor has multiple values, the value_ids can be used to identify them, e.g., the ids can be the names of the values. This should be used if value is a list or a numpy array.
- type value_ids:
list of str or int or None (default: None)
- param sensor_value:
Deprecated in favor of value
- type sensor_value:
int or float, as described in space
- param observation_space:
Deprecated in favor of space
- type observation_space:
palaestrai.types.Space
- param sensor_id:
Deprecated in favor of uid
- type sensor_id:
str or int, optional
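A slightly fuller sketch than the example above, assuming palaestrai.types provides a gym-like Box space (the identifiers and constructor arguments are illustrative):

import numpy as np

from palaestrai.agent import SensorInformation
from palaestrai.types import Box  # assumed gym-like space class

reading = SensorInformation(
    value=np.array([0.98]),
    space=Box(low=0.0, high=1.5, shape=(1,)),  # assumed constructor
    uid="Powergrid.Bus1.vm_pu",
)
reading()      # => array([0.98])
reading.value  # => array([0.98])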
ActuatorInformation#
Stores a set point for one actuator of an agent.
- class palaestrai.agent.ActuatorInformation(value: int | float | ndarray | None = None, space: Space | None = None, uid: str | None = None, value_ids=None, setpoint=None, action_space: Space | None = None, actuator_id=None)[source]#
Stores information about a single actuator.
The actuator information class is used to transfer actuator information. It can be called to set a new value (value):
a = ActuatorInformation(space=some_space)
a(42)  # a.value is now 42
- Parameters:
value (any, optional) – The set value for this actuator. The type is defined by the space. Can be skipped and set afterwards.
space (palaestrai.types.Space) – An instance of a palaestrai space that defines the type of the value.
uid (int or str, optional) – A unique identifier for this actuator. The agents use this ID only to assign the value_ids to the correct actuator. The ID is not analyzed to gain domain knowledge.
value_ids (list of str or int or None (default: None)) – If the actuator has multiple values, the value_ids can be used to identify them, e.g., the ids can be the names of the values. This should be used if value is a list or a numpy array.
setpoint (int or str, optional) – Deprecated in favor of value
action_space (palaestrai.types.Space) – Deprecated in favor of space
actuator_id (int or str, optional) – Deprecated in favor of uid
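By analogy with the SensorInformation example above, a hedged sketch assuming palaestrai.types provides a gym-like Discrete space:

from palaestrai.agent import ActuatorInformation
from palaestrai.types import Discrete  # assumed gym-like space class

switch = ActuatorInformation(
    space=Discrete(3),  # assumed constructor: valid values are 0, 1, 2
    uid="Powergrid.Switch1",
)
switch(2)     # calling the object sets the value
switch.value  # => 2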
RewardInformation#
Environments issue rewards: a reward describes the current performance of an environment with regard to its current state.
- class palaestrai.agent.RewardInformation(value: int | float | np.ndarray | None = None, space: palaestrai.types.Space | None = None, uid: str | None = None, reward_value: int | float | np.ndarray | None = None, observation_space: palaestrai.types.Space | None = None, reward_id: str | None = None)[source]#
Bases:
object
Stores information about a single reward.
Once created, a RewardInformation object can be called to retrieve its value, e.g.,:
a = RewardInformation(42, some_space)
a()             # => 42
a.reward_value  # => 42
- Parameters:
value (Any) – The value of the reward’s last reading. The type of this value is described by
space
space (palaestrai.types.Space) – An instance of a palaestrai space object defining the type of the value
uid (Optional[str]) – A unique identifier for this reward. The agents use the ID only for assignment of previous values with the same ID. The ID is important, if multiple rewards are available and/or the reward is a delayed reward.
reward_value (Any) – Deprecated in favor of value
observation_space (palaestrai.types.Space) – Deprecated in favor of space
reward_id (Optional[str]) – Deprecated in favor of uid
- property observation_space#
- property reward_id#
- property reward_value#
- property space#
- property uid#
- property value#
Objective#
Describes the agent’s success at reaching its internal objective. The Objective object encapsulates a function that rates the agent’s current performance, given state data, actions, and rewards.
- class palaestrai.agent.Objective(params: dict)[source]#
Bases:
ABC
The base class for all objectives.
An objective defines the goal of an agent; changing the objective can, e.g., transform an attacker agent into a defender agent.
The objective can, e.g., be a wrapper for the reward of the environment: in the simplest case, the sign of the reward is flipped (or not) to define attacker or defender. However, the objective can just as well use a completely different formula.
- abstract internal_reward(memory: Memory, **kwargs) ndarray | float | None [source]#
Calculate the reward of this objective
- Parameters:
memory (Memory) – The Memory that can be accessed to calculate the objective. Memory.tail() is most often used to get the n latest sensor readings, setpoints, or rewards.
- Returns:
objective – The agent’s calculated objective value, i.e., the result of the agent’s utility or goal function. It is based on any information that is stored in the agent’s Memory. It is either a numpy array, a float, or an empty numpy array or None. In the latter case (empty array or None), no objective is stored and all other information from the current action of the agent is discarded.
- Return type:
np.ndarray or float, optional
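As a sketch of the attacker/defender idea above, an objective might simply negate the summed environment rewards of the latest step, mirroring the DummyObjective documented below. The class name is illustrative, and it is assumed that Memory is importable from palaestrai.agent, as the signatures above suggest:

from palaestrai.agent import Memory, Objective

class FlippedRewardObjective(Objective):
    """Sketch: flips the sign of the summed environment rewards."""

    def internal_reward(self, memory: Memory, **kwargs) -> float:
        # Memory.tail(1) yields the latest readings, setpoints, and rewards.
        return -sum(memory.tail(1).rewards)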
Example (Dummy) Implementations#
DummyBrain#
- class palaestrai.agent.DummyBrain[source]#
Bases:
Brain
- load()[source]#
Loads the current state of the model
This method is called whenever the current state of the brain should be restored. How a particular model is deserialized is up to the concrete implementation. Also, brains may be divided into sub-models (e.g., actor and critic), whose separate storage is realized via tags. Implementing this method accordingly allows for a versatile handling of such cases.
It is advisable to use the storage facilities of palaestrAI: they call all available dumpers to restore the serialized brain dump, optionally identified via a tag, and return a BinaryIO object that can then be used in the implementation. The attribute Brain._dumpers is initialized to the list of available dumpers/loaders.
- store()[source]#
Stores the current state of the model
This method is called whenever the current state of the brain should be saved. How a particular model is serialized is up to the concrete implementation. Also, brains may be divided into sub-models (e.g., actor and critic), whose separate storage is realized via tags. Implementing this method accordingly allows for a versatile handling of such cases.
It is advisable to use the storage facilities of palaestrAI: they call all available dumpers to store the serialized brain dump provided in the parameter binary_io, optionally attaching a tag to it. The attribute Brain._dumpers is initialized to the list of available dumpers and can be used directly.
- thinking(muscle_id, data_from_muscle)[source]#
Think about a response using the provided information.
The thinking() method is the place for the implementation of the agent’s/brain’s logic. The brain can use the current sensor readings, review the actions of the previous thinking, and consider the reward (provided by the objective). Usually, this is the place where machine learning happens, but other solutions are possible as well (like a set of rules or even random results).
The method receives only the name of the Muscle that is sending data, along with whatever data this Muscle wants to send to the Brain. As this is completely implementation-specific, this method does not impose any restrictions. Any data that is available to palaestrAI, such as the actual sensor readings, setpoints a Muscle provided, rewards, the objective function’s value (goal/utility function), and whether the simulation is done or not, is available via the Brain’s Memory (cf. Brain.memory).
DummyMuscle#
- class palaestrai.agent.DummyMuscle(count_upwards: bool = False)[source]#
Bases:
Muscle
Implements the simplest possible Muscle.
This Muscle implementation simply samples the action spaces of all actuators connected to it. If the additional mode count_upwards is set, then all Discrete action spaces receive upwards-counting values (modulo the space dimension). The latter mode exists as a convenience for testing purposes.
- Parameters:
count_upwards (bool, default: False) – Enables upward counting modulo the action space for Discrete actuators.
- propose_actions(sensors, actuators_available)[source]#
Process new sensor information and produce actuator setpoints.
This method provides the essential inference task of the Muscle: it takes current sensor information and is expected to produce a list of actuator setpoints that can be applied in the Environment. How the actuator values are produced and how the sensor information is processed is up to the developer.
This is the essential abstract method that needs to be implemented by every Muscle.
Sensor readings and the list of available actuators are valid for the current time. Previous sensor readings, rewards, and objective values can be retrieved from the Muscle’s Memory, which is accessible through the Muscle.memory property.
- Parameters:
sensors (list of SensorInformation) – List of new SensorInformation for all available sensors
actuators_available (list of ActuatorInformation) – List of all actuators that are currently available to the agent
- Returns:
A tuple containing: (1) the actual setpoints (a list of ActuatorInformation objects; it is allowed to simply reuse the objects that are passed as parameters, deep-copying is not necessary); (2) any other data that should be sent to the Muscle’s Brain.
- Return type:
tuple of two elements
- update(data)[source]#
Update the Muscle.
This method is called if the brain sends an update. What is to be updated is up to the specific implementation. However, this method should update all necessary components.
There might be implementations of Brain and Muscles where updates do not happen: simple, static bots never learn and therefore do not need a mechanism for updates. Hence, the default implementation of this method simply does nothing.
- Parameters:
update (any) – Any data that a Brain would send to its Muscles upon an update. Implementation-specific.
DummyObjective#
- class palaestrai.agent.DummyObjective(params=None)[source]#
Bases:
Objective
A simple objective that sums all environment rewards
This objective is a simple pass-through objective that sums up the RewardInformation from the latest addition to an agent’s Memory. I.e., it uses Memory.tail() to get the latest row and then sums all RewardInformation objects that were stored there, i.e., sum(memory.tail(1).rewards).
This can be used as a pass-through for the simpler reinforcement learning cases, in which the environment’s reward is also the agent’s objective function. It can also act as a simple dummy (i.e., placeholder) for a more meaningful objective function.
- internal_reward(memory: Memory, **kwargs) float [source]#
Calculate the reward of this objective
- Parameters:
memory (Memory) – The Memory that can be accessed to calculate the objective. Memory.tail() is most often used to get the n latest sensor readings, setpoints, or rewards.
- Returns:
objective – The agent’s calculated objective value, i.e., the result of the agent’s utility or goal function. It is based on any information that is stored in the agent’s Memory. It is either a numpy array, a float, or an empty numpy array or None. In the latter case (empty array or None), no objective is stored and all other information from the current action of the agent is discarded.
- Return type:
np.ndarray or float, optional