Brain/Muscle API#
Purpose of Brain and Muscles#
In palaestrAI, agents are logically divided into two parts:
An executing part, mapping sensor inputs to actions
A learning algorithm
Different algorithms have different names for this concept; for example, A3C calls the muscles workers. The split allows agents to learn from different environment instances (or configurations) simultaneously, in an asynchronous fashion, or even to have more sophisticated setups. palaestrAI calls the learner the brain (palaestrai.agent.Brain). An agent has exactly one brain. The executing part is known as a muscle (palaestrai.agent.Muscle). An agent can have many muscles, but needs at least one.
Agents are represented in an environment by their muscle(s). A muscle receives sensor readings and responds with actions. From every muscle, the brain receives the muscle’s input, its output (actions), and the reward it received for these actions. The brain can then train on the information provided by its muscle(s). When it decides that training is complete, it updates one, several, or all muscles.
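The data flow described above can be sketched with two plain Python classes. This is a toy illustration under stated assumptions, not palaestrAI code: ToyMuscle, ToyBrain, run_episode, and the one-line reward rule are hypothetical stand-ins, and the brain here sends an update after every step, whereas a real brain decides for itself when training is complete.

```python
from typing import Dict, Tuple


class ToyMuscle:
    """Acting part (hypothetical stand-in): maps a sensor reading to an action."""

    def __init__(self) -> None:
        self.preferred_action = 0
        self._calls = 0

    def propose_actions(self, sensor_reading: float) -> int:
        self._calls += 1
        # Deterministic exploration: every 5th call tries the other action.
        if self._calls % 5 == 0:
            return 1 - self.preferred_action
        return self.preferred_action

    def update(self, update: int) -> None:
        # The brain's update simply replaces the preferred action.
        self.preferred_action = update


class ToyBrain:
    """Learning part (hypothetical stand-in): tracks the mean reward per action."""

    def __init__(self) -> None:
        self._sum: Dict[int, float] = {}
        self._count: Dict[int, int] = {}

    def thinking(self, muscle_id: str, data: Tuple[float, int, float]) -> int:
        _sensor_reading, action, reward = data
        self._sum[action] = self._sum.get(action, 0.0) + reward
        self._count[action] = self._count.get(action, 0) + 1
        means = {a: self._sum[a] / self._count[a] for a in self._sum}
        # The update sent back is the action with the best mean reward so far.
        return max(means, key=lambda a: means[a])


def run_episode(steps: int = 20) -> ToyMuscle:
    brain, muscle = ToyBrain(), ToyMuscle()
    for step in range(steps):
        sensor_reading = float(step)
        action = muscle.propose_actions(sensor_reading)
        reward = 1.0 if action == 1 else 0.0  # toy environment: action 1 pays off
        update = brain.thinking("muscle-0", (sensor_reading, action, reward))
        muscle.update(update)
    return muscle


trained = run_episode()
```

The muscle never computes a reward statistic itself; it only acts and accepts updates, which is the point of the split.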
How to Add Algorithms to palaestrAI#
To add a new algorithm, you can implement the actor (muscle), the learner (brain), or both. Implementing both is probably the most common case, as Deep Reinforcement Learning algorithms usually define both an actor and a learner.
The new muscle must implement two methods. The propose_actions() method serves as an input-to-action mapper. For example, this is what the forward pass of the policy network does in a Deep Reinforcement Learning setting. The second method, update(), gets called whenever the brain updates a muscle’s configuration, e.g., by providing new weights to the policy network. What the update() method’s parameter consists of is up to the implementation.
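A minimal sketch of the two muscle methods, assuming a linear "policy" in place of a neural network. To stay runnable without palaestrAI installed, it does not derive from palaestrai.agent.Muscle: MuscleSketch, Reading, Setpoint, and LinearMuscle are hypothetical stand-ins, and the dict layout of the update is an assumption (the documentation leaves it to the implementation).

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, List, Tuple


@dataclass
class Reading:
    """Hypothetical stand-in for SensorInformation."""
    uid: str
    value: float


@dataclass
class Setpoint:
    """Hypothetical stand-in for ActuatorInformation."""
    uid: str
    value: float = 0.0


class MuscleSketch(ABC):
    """Stand-in for the Muscle ABC; mirrors only the two documented methods."""

    @abstractmethod
    def propose_actions(
        self, sensors: List[Reading], actuators_available: List[Setpoint]
    ) -> Tuple[List[Setpoint], Any]:
        ...

    def update(self, update: Any) -> None:
        """Default: do nothing (static bots never learn)."""


class LinearMuscle(MuscleSketch):
    def __init__(self, weights: List[float]) -> None:
        self.weights = list(weights)

    def propose_actions(
        self, sensors: List[Reading], actuators_available: List[Setpoint]
    ) -> Tuple[List[Setpoint], Any]:
        # "Forward pass": weighted sum of the sensor values.
        activation = sum(w * s.value for w, s in zip(self.weights, sensors))
        for actuator in actuators_available:
            actuator.value = activation  # reuse the objects passed in
        return actuators_available, {"activation": activation}

    def update(self, update: Any) -> None:
        # Assumed update format: a dict carrying the new weights.
        self.weights = list(update["weights"])


muscle = LinearMuscle([0.5, -1.0])
sensors = [Reading("s1", 2.0), Reading("s2", 3.0)]
setpoints, extra = muscle.propose_actions(sensors, [Setpoint("a1")])
muscle.update({"weights": [1.0, 1.0]})
setpoints_after, _ = muscle.propose_actions(sensors, [Setpoint("a1")])
```

Here the first call yields a setpoint of -2.0 and, after the update swaps in new weights, the second yields 5.0.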
The brain only needs to implement palaestrai.agent.Brain.thinking(). It receives every muscle’s sensor inputs, actions, and the reward gained from them. thinking() is the main entry point for all learner logic.
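As a rough sketch of a learner built around a single thinking() method, the class below collects a small batch from its muscles and only then produces an update. BatchBrainSketch is a hypothetical stand-in, not a subclass of palaestrai.agent.Brain; the (action, reward) tuple format and the use of None for "no update yet" are assumptions made for illustration.

```python
from typing import Any, Dict, List, Optional, Tuple


class BatchBrainSketch:
    """Hypothetical learner mirroring the documented thinking() signature."""

    def __init__(self, batch_size: int = 3) -> None:
        self.batch_size = batch_size
        self._batch: List[Tuple[float, float]] = []  # (action, reward) pairs
        self.best_action = 0.0

    def thinking(
        self, muscle_id: str, data_from_muscle: Any
    ) -> Optional[Dict[str, float]]:
        action, reward = data_from_muscle
        self._batch.append((action, reward))
        if len(self._batch) < self.batch_size:
            return None  # keep collecting; nothing to send to the muscle yet
        # "Training": remember the action that earned the highest reward.
        self.best_action = max(self._batch, key=lambda pair: pair[1])[0]
        self._batch.clear()
        return {"best_action": self.best_action}


brain = BatchBrainSketch(batch_size=3)
responses = [
    brain.thinking("muscle-0", (0.1, 0.0)),
    brain.thinking("muscle-1", (0.7, 2.0)),
    brain.thinking("muscle-0", (0.4, 1.0)),
]
```

Data from several muscles ends up in the same batch, which reflects the one-brain, many-muscles relationship.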
API Documentation#
Muscle#
- class palaestrai.agent.Muscle(*args, **kwargs)
An acting entity in an environment.
Each Muscle is an acting entity in an environment: Given a sensor input, it proposes actions. Thus, Muscles implement input-to-action mappings. A muscle does, however, not learn by itself; for that, it needs a Brain. Every time a Muscle acts, it sends the following inputs to a Brain:
Sensor inputs it received
Actuator setpoints it provided
Reward received from the proposed action
When implementing an algorithm, you have to derive from the Muscle ABC and provide the following methods:
propose_actions(), which implements the input-to-action mapping
update(), which handles how updates from the Brain are incorporated into the muscle.
- property memory: Memory
Muscle Memory.
Each Muscle can have its own, personal Memory. Internally, the memory stores sensor readings, actuator setpoints provided by the Muscle, as well as rewards from the environment and the result of the Muscle’s (i.e., Agent’s) objective function.
- Returns:
The Muscle Memory.
- Return type:
Memory
- property mode: Mode
Internal mode of operations
Usually, an agent operates under the assumption of a certain modus operandi. This can be, for example, the distinction between training (Mode.TRAIN) and testing (Mode.TEST).
- Returns:
The agent’s operations mode
- Return type:
Mode
- prepare_model()
Loading a trained model for testing
This method loads dumped brain states from a given previous phase, or even experiment run. For details, see the documentation on experiment run files (the load key).
This method is called whenever the current state of a muscle model should be restored. How a particular model is deserialized is up to the concrete implementation. Also, brains may be divided into sub-models (e.g., actor and critic), whose separate storage is realized via tags. Implementing this method allows for a versatile implementation of this.
It is advisable to use the storage facilities of palaestrAI. These are available through Muscle.load. The model location has then been pre-set from the experiment run file.
- abstract propose_actions(sensors: List[SensorInformation], actuators_available: List[ActuatorInformation]) -> Tuple[List[ActuatorInformation], Any]
Process new sensor information and produce actuator setpoints.
This method provides the essential inference task of the Muscle: It takes current sensor information and is expected to produce a list of actuator setpoints that can be applied in the Environment. How the actuator values are produced and how the sensor information is processed is up to the developer.
This is the essential abstract method that needs to be implemented by every Muscle.
Sensor readings and the list of available actuators are valid for the current time. Previous sensor readings, rewards, and objective value can be retrieved from the Muscle’s Memory, which is accessible through the Muscle.memory property.
- Parameters:
sensors (list of SensorInformation) – List of new SensorInformation for all available sensors
actuators_available (list of ActuatorInformation) – List of all actuators that are currently available to the agent
- Returns:
A tuple containing: (1) the actual setpoints (a list of ActuatorSetpoint objects); it is allowed to simply reuse the objects that are passed as parameters, deep-copying is not necessary; (2) any other data that should be sent to the Muscle’s Brain.
- Return type:
tuple of two elements
- reset()
Called in order to reset the Muscle.
There are a number of occasions in which the Muscle should stay active, but reset. For example, when a new episode of the same experiment run phase is started. Then, the Muscle is allowed (or better, encouraged) to keep its state, but acknowledge that a reset has occurred and the Muscle does not expect the seamless continuation of an episode. Implementing this method is optional; if it is not implemented, nothing will happen on reset and the Muscle will also be kept as-is.
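One way to read the reset contract is to separate learned state, which survives a reset, from per-episode state, which is cleared. The sketch below assumes exactly that split; EpisodeAwareMuscle and its propose_action helper are hypothetical stand-ins that only mirror the reset() hook, not the full Muscle interface.

```python
from typing import Optional


class EpisodeAwareMuscle:
    """Hypothetical stand-in; only reset() mirrors the documented hook."""

    def __init__(self, gain: float) -> None:
        self.gain = gain  # learned state: survives a reset
        self._previous_reading: Optional[float] = None  # episode state

    def propose_action(self, reading: float) -> float:
        # Act on the change since the previous reading of this episode.
        if self._previous_reading is None:
            delta = 0.0
        else:
            delta = reading - self._previous_reading
        self._previous_reading = reading
        return self.gain * delta

    def reset(self) -> None:
        # Acknowledge that the next reading starts a new episode.
        self._previous_reading = None


muscle = EpisodeAwareMuscle(gain=2.0)
first = muscle.propose_action(1.0)
second = muscle.propose_action(1.5)
muscle.reset()
after_reset = muscle.propose_action(10.0)
```

Without the reset, the jump from 1.5 to 10.0 would be treated as a change within one episode.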
- setup()
Generic setup method, called just before Muscle.run
This method is called just before the main loop in Muscle.run commences. It can be used for any setup tasks. The method is guaranteed to be called in the same process as the main loop. Also, the communications link to the brain will already be established. However, no information about the environment is available yet.
There is no need to load the muscle’s inference model here; refer to Muscle.prepare_model for this.
- update(update: Any)
Update the Muscle.
This method is called if the brain sends an update. What is to be updated is up to the specific implementation. However, this method should update all necessary components.
There might be implementations of Brain and Muscles where updates do not happen. Simple, static bots never learn and, therefore, do not need a mechanism for updates. Therefore, the default implementation of this method simply does nothing.
- Parameters:
update (any) – Any data that a Brain would send to its Muscles upon an update. Implementation-specific.
Brain#
- class palaestrai.agent.Brain
Base class for all brain implementations
The brain is the central learning instance. It coordinates all muscles (if multiple muscles are available). The brain does all (deep) learning tasks and delivers a model to the muscles.
The brain has one abstract method, thinking(), that has to be implemented.
Brain objects store their state and can re-load previous states by using the BrainDumper infrastructure. For this, concrete Brain classes need to provide implementations of load() and store().
- property actuators: List[ActuatorInformation]
All actuators a Muscle can act with.
- load()
Loads the current state of the model
This method is called whenever the current state of the brain should be restored. How a particular model is deserialized is up to the concrete implementation. Also, brains may be divided into sub-models (e.g., actor and critic), whose separate storage is realized via tags. Implementing this method allows for a versatile implementation of this.
It is advisable to use the storage facilities of palaestrAI. They are available through the method _dumpers, tag)(). This function calls all available dumpers to restore the serialized brain dump (optionally identified via a tag). It returns a BinaryIO object that can then be used in the implementation. The attribute Brain._dumpers is initialized to the list of available dumpers/loaders.
- property memory: Memory
The Brain’s memory
- property seed: int
Returns the random seed applicable for this brain instance.
- property sensors: List[SensorInformation]
All sensors the Brain (and its Muscles) know about
- setup()
Brain setup method
This method is called by the AgentConductor just before the main loop is entered (run()). In the base Brain class, it is empty and does nothing. However, any derived class may implement it to do local setup before the main loop is entered.
Potential tasks for this method are setting the size limit of the Memory via Memory.size_limit, or anything that needs to access Brain.seed, Brain.sensors, or Brain.actuators, as they’re not yet available in the constructor.
This method is guaranteed to be called in the same process space as the main loop method, Brain.run().
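The reason for deferring work to setup() can be sketched as follows: anything that depends on the seed or the sensor list has to wait until those values exist. SeededBrainSketch and make_brain are hypothetical stand-ins; how palaestrAI actually fills in these properties is not shown here, and plain attributes are used in place of the real properties.

```python
import random
from typing import List, Optional


class SeededBrainSketch:
    """Hypothetical stand-in showing why seed-dependent work belongs in setup()."""

    def __init__(self) -> None:
        # Not known at construction time; the framework provides them later.
        self.seed: Optional[int] = None
        self.sensors: List[str] = []
        self.weights: List[float] = []

    def setup(self) -> None:
        # Seed and sensors are available now: initialize one weight per sensor.
        rng = random.Random(self.seed)
        self.weights = [rng.uniform(-1.0, 1.0) for _ in self.sensors]


def make_brain(seed: int, sensors: List[str]) -> SeededBrainSketch:
    """Mimics the framework: fill in the attributes, then call setup()."""
    brain = SeededBrainSketch()
    brain.seed = seed
    brain.sensors = sensors
    brain.setup()
    return brain


first = make_brain(42, ["voltage", "load", "frequency"])
second = make_brain(42, ["voltage", "load", "frequency"])
```

Two brains set up with the same seed start from identical weights, which keeps runs reproducible.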
- store()
Stores the current state of the model
This method is called whenever the current state of the brain should be saved. How a particular model is serialized is up to the concrete implementation. Also, brains may be divided into sub-models (e.g., actor and critic), whose separate storage is realized via tags. Implementing this method allows for a versatile implementation of this.
It is advisable to use the storage facilities of palaestrAI. They are available through the method _dumpers, tag)(). This function calls all available dumpers to store the serialized brain dump provided in the parameter binary_io and optionally attaches a tag to it. The attribute Brain._dumpers is initialized to a list of available dumpers and can be used directly.
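The idea of storing sub-models under separate tags can be illustrated with a store/load round trip through in-memory binary streams. This sketch does not use palaestrAI's dumper infrastructure: InMemoryDumper, its save and load methods, and ActorCriticBrainSketch are hypothetical, and JSON is chosen only to keep the example dependency-free.

```python
import io
import json
from typing import BinaryIO, Dict, List, Optional


class InMemoryDumper:
    """Hypothetical dumper that keeps tagged dumps in a dict."""

    def __init__(self) -> None:
        self._dumps: Dict[Optional[str], bytes] = {}

    def save(self, binary_io: BinaryIO, tag: Optional[str] = None) -> None:
        self._dumps[tag] = binary_io.read()

    def load(self, tag: Optional[str] = None) -> BinaryIO:
        return io.BytesIO(self._dumps[tag])


class ActorCriticBrainSketch:
    """Hypothetical brain with two sub-models, stored under separate tags."""

    def __init__(self, dumper: InMemoryDumper) -> None:
        self._dumper = dumper
        self.actor: List[float] = [0.0, 0.0]
        self.critic: List[float] = [0.0]

    def store(self) -> None:
        for tag, weights in (("actor", self.actor), ("critic", self.critic)):
            buffer = io.BytesIO(json.dumps(weights).encode("utf-8"))
            self._dumper.save(buffer, tag)

    def load(self) -> None:
        self.actor = json.load(self._dumper.load("actor"))
        self.critic = json.load(self._dumper.load("critic"))


dumper = InMemoryDumper()
trained = ActorCriticBrainSketch(dumper)
trained.actor, trained.critic = [0.25, -0.5], [1.5]
trained.store()

restored = ActorCriticBrainSketch(dumper)
restored.load()
```

Because each sub-model has its own tag, one of them could be restored or replaced without touching the other.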
- abstract thinking(muscle_id: str, data_from_muscle: Any) -> Any
Think about a response using the provided information.
The thinking() method is the place for the implementation of the agent’s/brain’s logic. The brain can use the current sensor readings, review the actions of the previous thinking and consider the reward (provided by the objective).
Usually, this is the place where machine learning happens, but other solutions are possible as well (like a set of rules or even random-based results).
The method receives only the name of the Muscle that is sending data, along with whatever data this Muscle wants to send to the Brain. As this is completely implementation-specific, this method does not impose any restrictions.
Any data that is available to palaestrAI, such as the actual sensor readings, setpoints a Muscle provided, rewards, the objective function’s value (goal/utility function), and whether the simulation is done or not, is available via the Brain’s Memory (cf. Brain.memory).
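Since thinking() does not have to involve machine learning, a fixed rule set is enough to fill it. The sketch below assumes a muscle that sends a dict with a "temperature" key; ThermostatBrainSketch, the key name, and the command strings are hypothetical and not part of palaestrAI.

```python
from typing import Any, Dict


class ThermostatBrainSketch:
    """Hypothetical rule-based brain: no training, just fixed thresholds."""

    def __init__(self, target: float = 21.0, tolerance: float = 0.5) -> None:
        self.target = target
        self.tolerance = tolerance

    def thinking(self, muscle_id: str, data_from_muscle: Any) -> Dict[str, str]:
        temperature = float(data_from_muscle["temperature"])
        if temperature > self.target + self.tolerance:
            command = "cool"
        elif temperature < self.target - self.tolerance:
            command = "heat"
        else:
            command = "hold"
        # The response names the muscle it is meant for.
        return {"muscle": muscle_id, "command": command}


brain = ThermostatBrainSketch()
too_warm = brain.thinking("muscle-0", {"temperature": 23.0})
too_cold = brain.thinking("muscle-1", {"temperature": 19.0})
in_band = brain.thinking("muscle-0", {"temperature": 21.2})
```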