Brain/Muscle API

Purpose of Brain and Muscles

In palaestrAI, agents are logically divided into two parts:

An executing part, mapping sensor inputs to actions
A learning algorithm

Different algorithms have different names for this concept; for example, A3C calls the muscles workers. The split allows agents to learn from different environment instances (or configurations) simultaneously, to learn asynchronously, or to use even more sophisticated setups. palaestrAI calls the learner the brain (palaestrai.agent.Brain). An agent has exactly one brain. The executing part is known as a muscle (palaestrai.agent.Muscle). An agent has at least one muscle and can have many.

Agents are represented in an environment by their muscle(s). A muscle receives sensor readings and provides actions in response. From every muscle, the brain receives the muscle’s input, its output (actions), and the reward it received for these actions. The brain can then train on this information provided by its muscle(s). When it decides that training is complete, it updates one, several, or all muscles.
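The data flow described above can be sketched in a few lines of plain Python. This is a conceptual illustration only: ToyMuscle, ToyBrain, and the methods act() and receive() are hypothetical stand-ins invented for this sketch, not palaestrAI classes, and the "training" step is a toy rule.

```python
from typing import List, Optional, Tuple

# One experience record: (muscle uid, sensor readings, actions, reward).
Experience = Tuple[str, List[float], List[float], float]


class ToyMuscle:
    """Illustrative stand-in for a muscle; not the palaestrAI class."""

    def __init__(self, uid: str) -> None:
        self.uid = uid
        self.scale = 1.0  # the "model" that the brain may replace

    def act(self, readings: List[float]) -> List[float]:
        # Input-to-action mapping: scale every sensor reading.
        return [self.scale * r for r in readings]

    def update(self, scale: float) -> None:
        self.scale = scale


class ToyBrain:
    """Illustrative stand-in for a brain; not the palaestrAI class."""

    def __init__(self, batch_size: int) -> None:
        self.batch_size = batch_size
        self.experience: List[Experience] = []

    def receive(self, uid, readings, actions, reward) -> Optional[float]:
        self.experience.append((uid, readings, actions, reward))
        if len(self.experience) < self.batch_size:
            return None  # still collecting; no update yet
        # Toy "training": the mean reward becomes the new scale.
        new_scale = sum(e[3] for e in self.experience) / len(self.experience)
        self.experience.clear()
        return new_scale


brain = ToyBrain(batch_size=4)
muscles = [ToyMuscle("m0"), ToyMuscle("m1")]

for step in range(2):
    for muscle in muscles:
        readings = [1.0, 2.0]
        actions = muscle.act(readings)
        reward = 0.5  # would normally come from the environment
        new_scale = brain.receive(muscle.uid, readings, actions, reward)
        if new_scale is not None:
            for m in muscles:  # here the brain updates all muscles
                m.update(new_scale)
```

In this sketch, one brain collects experience from two muscles and pushes the same update to both once a batch is complete; a real brain is free to update only some of its muscles.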

How to Add Algorithms to palaestrAI

To add a new algorithm, you can implement the actor (muscle), the learner (brain), or both. Implementing both is probably the most common case, as Deep Reinforcement Learning algorithms usually comprise an actor as well as a learner.

The new muscle must implement two methods:

The propose_actions() method serves as an input-to-action mapper. For example, this is what the forward pass of the policy network does in a Deep Reinforcement Learning setting. The second method, update(), is called whenever the brain updates a muscle’s configuration, e.g., by providing new weights for the policy network. What the update() method’s parameter consists of is up to the implementation.

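The brain only needs to implement palaestrai.agent.Brain.thinking(), which is the main entry point for all learner logic. It is called with the ID of the muscle that is sending data, along with whatever data that muscle sends; each muscle’s sensor inputs, actions, and the rewards gained from them are available to the brain as well.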
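As a minimal sketch of this division of labour, the following example pairs a muscle with a brain. MuscleSketch and BrainSketch are hypothetical stand-in base classes that only mirror the method names and signatures documented on this page, so that the example runs without palaestrAI installed; a real implementation would derive from palaestrai.agent.Muscle and palaestrai.agent.Brain instead. Sensors and actuators are simplified to plain floats and strings.

```python
from abc import ABC, abstractmethod
from typing import Any, List, Tuple


class MuscleSketch(ABC):
    """Hypothetical stand-in mirroring the Muscle methods named above."""

    @abstractmethod
    def propose_actions(
        self, sensors: List[Any], actuators_available: List[Any]
    ) -> Tuple[List[Any], Any]:
        ...

    def update(self, update: Any) -> None:
        pass  # static bots need no update logic


class BrainSketch(ABC):
    """Hypothetical stand-in mirroring Brain.thinking()."""

    @abstractmethod
    def thinking(self, muscle_id: str, data_from_muscle: Any) -> Any:
        ...


class ThresholdMuscle(MuscleSketch):
    def __init__(self) -> None:
        self.threshold = 0.0  # the only "model parameter"

    def propose_actions(self, sensors, actuators_available):
        mean = sum(sensors) / len(sensors)
        # One setpoint per available actuator: on if above threshold.
        value = 1.0 if mean > self.threshold else 0.0
        setpoints = [value for _ in actuators_available]
        # Second tuple element: extra data for the brain.
        return setpoints, {"mean_reading": mean}

    def update(self, update):
        self.threshold = update["threshold"]


class RunningMeanBrain(BrainSketch):
    def __init__(self) -> None:
        self.readings: List[float] = []

    def thinking(self, muscle_id, data_from_muscle):
        self.readings.append(data_from_muscle["mean_reading"])
        # Toy "learning": the threshold tracks the mean of all readings.
        return {"threshold": sum(self.readings) / len(self.readings)}


muscle = ThresholdMuscle()
brain = RunningMeanBrain()

setpoints, extra = muscle.propose_actions([2.0, 4.0], ["valve", "pump"])
update = brain.thinking("muscle-0", extra)
if update:  # only truthy updates are passed on
    muscle.update(update)
```

The format of the update (here a dict with a single key) is a private contract between this brain and this muscle, which is why update() accepts Any.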

API Documentation

Muscle

class palaestrai.agent.Muscle(*args, **kwargs)

An acting entity in an environment.

Each Muscle is an acting entity in an environment: Given a sensor input, it proposes actions. Thus, Muscles implement input-to-action mappings. A muscle does, however, not learn by itself; for that, it needs a Brain. Every time a Muscle acts, it sends the following inputs to a Brain:

sensor inputs it received
actuator setpoints it provided
reward received for the proposed action

When implementing an algorithm, you have to derive from the Muscle ABC and provide the following methods:

propose_actions(), which implements the input-to-action mapping
update(), which handles how updates from the Brain are incorporated into the muscle.

property memory: Memory

Muscle Memory.

Each Muscle can have its own, personal Memory. Internally, the memory stores sensor readings, actuator setpoints provided by the Muscle, as well as rewards from the environment and the result of the Muscle’s (i.e., Agent’s) objective function.

Returns:: The Muscle Memory.
Return type:: Memory

property mode: Mode

Internal mode of operations

Usually, an agent operates under the assumption of a certain modus operandi. This can be, for example, the distinction between training (Mode.TRAIN) and testing (Mode.TEST).

Returns:: The agent’s operations mode
Return type:: Mode
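A typical use of the mode is to explore during training and act greedily during testing. The following is a minimal sketch under stated assumptions: the Mode enum below is a local stand-in that only reuses the member names TRAIN and TEST from this page, and choose_action() with its epsilon parameter is a hypothetical helper, not part of palaestrAI.

```python
import enum
import random
from typing import List


class Mode(enum.Enum):
    """Local stand-in for the Mode type; only the member names are taken from the docs."""

    TRAIN = "train"
    TEST = "test"


def choose_action(
    q_values: List[float], mode: Mode, rng: random.Random, epsilon: float = 0.1
) -> int:
    """Epsilon-greedy action selection during training, greedy otherwise."""
    if mode is Mode.TRAIN and rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore
    # Exploit: index of the largest value.
    return max(range(len(q_values)), key=q_values.__getitem__)


greedy = choose_action([0.1, 0.9, 0.3], Mode.TEST, random.Random(0))
```

Inside a real Muscle, the mode would be read from the mode property rather than passed in as an argument.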

prepare_model()

Loading a trained model for testing

This method loads dumped brain states from a given previous phase, or even experiment run. For details, see the documentation on experiment run files (the load key).

This method is called whenever the current state of a muscle model should be restored. How a particular model is deserialized is up to the concrete implementation. Also, brains may be divided into sub-models (e.g., actor and critic), whose separate storage is realized via tags. Implementing this method allows for a versatile implementation of this.

It is advisable to use the storage facilities of palaestrAI. These are available through Muscle.load. The model location has then been pre-set from the experiment run file.

abstract propose_actions(sensors: List[SensorInformation], actuators_available: List[ActuatorInformation]) → Tuple[List[ActuatorInformation], Any]

Process new sensor information and produce actuator setpoints.

This method provides the essential inference task of the Muscle: It takes current sensor information and is expected to produce a list of actuator setpoints that can be applied in the Environment. How the actuator values are produced and how the sensor information is processed is up to the developer.

This is the essential abstract method that needs to be implemented by every Muscle.

Sensor readings and the list of available actuators are valid for the current time. Previous sensor readings, rewards, and objective value can be retrieved from the Muscle’s Memory, which is accessible through the Muscle.memory property.

Parameters:

sensors (list of SensorInformation) – List of new SensorInformation for all available sensors
actuators_available (list of ActuatorInformation) – List of all actuators that are currently available to the agent

Returns:

A tuple containing: (1) the actual setpoints (a list of ActuatorSetpoint objects); it is allowed to simply reuse the objects that are passed as parameters, so deep-copying is not necessary; (2) any other data that should be sent to the Muscle’s Brain.

Return type:

tuple of two elements
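The shape of the return value can be illustrated with a random-action sketch. ActuatorStub is a hypothetical stand-in with invented fields (uid, low, high, value) and is not the real ActuatorInformation class; the extra rng parameter is likewise an assumption made so that the example is reproducible.

```python
import random
from dataclasses import dataclass
from typing import Any, List, Optional, Tuple


@dataclass
class ActuatorStub:
    """Hypothetical stand-in for ActuatorInformation (fields are invented)."""

    uid: str
    low: float
    high: float
    value: Optional[float] = None


def propose_actions(
    sensors: List[float],
    actuators_available: List[ActuatorStub],
    rng: random.Random,
) -> Tuple[List[ActuatorStub], Any]:
    # Reuse the objects passed in; no deep copy is made.
    for actuator in actuators_available:
        actuator.value = rng.uniform(actuator.low, actuator.high)
    # Second element: arbitrary data destined for the brain.
    extra = {"num_sensors": len(sensors)}
    return actuators_available, extra


actuators = [ActuatorStub("valve", 0.0, 1.0), ActuatorStub("pump", -5.0, 5.0)]
setpoints, extra = propose_actions([0.3, 0.7], actuators, random.Random(42))
```

The returned list is the same list that was passed in, now carrying setpoint values, which matches the note that deep-copying is not necessary.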

reset()

Called in order to reset the Muscle.

There are a number of occasions on which the Muscle should stay active, but reset, for example, when a new episode of the same experiment run phase is started. Then, the Muscle is allowed (or better, encouraged) to keep its state, but should acknowledge that a reset has occurred and should not expect the seamless continuation of an episode. Implementing this method is optional; if it is not implemented, nothing happens on reset and the Muscle is kept as-is.
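One way to read this is that learned state survives a reset while per-episode bookkeeping is cleared. The sketch below illustrates that split; EpisodeAwareMuscle and its attributes are hypothetical and do not derive from the real Muscle class.

```python
from typing import List


class EpisodeAwareMuscle:
    """Hypothetical muscle-like class showing what a reset might touch."""

    def __init__(self) -> None:
        self.weights = [0.5, 0.5]    # learned state: survives resets
        self.steps_in_episode = 0    # per-episode state: cleared on reset
        self.episodes_seen = 0

    def propose_actions(self, sensors: List[float]) -> List[float]:
        self.steps_in_episode += 1
        return [w * s for w, s in zip(self.weights, sensors)]

    def reset(self) -> None:
        # Acknowledge the episode boundary, but keep the model.
        self.episodes_seen += 1
        self.steps_in_episode = 0


muscle = EpisodeAwareMuscle()
muscle.propose_actions([1.0, 2.0])
muscle.propose_actions([3.0, 4.0])
muscle.reset()
```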

setup()

Generic setup method, called just before Muscle.run

This method is called just before the main loop in Muscle.run commences. It can be used for any setup tasks. The method is guaranteed to be called in the same process as the main loop. Also, the communications link to the brain will already be established. However, no information about the environment is available yet.

There is no need to load the muscle’s inference model here; refer to Muscle.prepare_model for this.

update(update: Any)

Update the Muscle.

This method is called if the brain sends an update. What is to be updated is up to the specific implementation. However, this method should update all necessary components.

There might be implementations of Brain and Muscles where updates do not happen. Simple, static bots never learn and therefore do not need a mechanism for updates. For this reason, the default implementation of this method does nothing.

Parameters:: update (any) – Any data that a Brain would send to its Muscles upon an update. Implementation-specific.

Brain

class palaestrai.agent.Brain

Base class for all brain implementations

The brain is the central learning instance. It coordinates all muscles (if multiple muscles are available). The brain does all (deep) learning tasks and delivers a model to the muscles.

The brain has one abstract method thinking() that has to be implemented.

Brain objects store their state and can re-load previous states by using the infrastructure provided by BrainDumper. For this, concrete Brain classes need to provide implementations of load() and store().

property actuators: List[ActuatorInformation]: All actuators a Muscle can act with.

add_statistics(key: str, value: Any, allow_overwrite=False)

Add statistics for later analysis

Each Brain can have its own statistics metrics, which are calculated for a step (i.e., a call to thinking()). This can be used to store the training error or any other value you’d later want to analyze.

load()

Loads the current state of the model

This method is called whenever the current state of the brain should be restored. How a particular model is deserialized is up to the concrete implementation. Also, brains may be divided into sub-models (e.g., actor and critic), whose separate storage is realized via tags. Implementing this method allows for a versatile implementation of this.

It is advisable to use the storage facilities of palaestrAI. They are available through a BrainDumper method that is called with the list of dumpers (_dumpers) and an optional tag. This function calls all available dumpers to restore the serialized brain dump (optionally identified via a tag). It returns a BinaryIO object that can then be used in the implementation. The attribute Brain._dumpers is initialized to the list of available dumpers/loaders.

property memory: Memory: The Brain’s memory

property name: str: Name of this agent

pop_statistics() → Dict[str, Any]

Returns the current statistics and resets them

This method returns the statistics dict and clears it afterwards.

Because the statistics dict should contain metrics that refer to one step, it is stored and cleared after each one.

Returns:: The dict contains a mapping of metric keys to values. This dynamically allows various implementation-dependent statistics metrics.
Return type:: Dict
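The interplay of add_statistics() and pop_statistics() can be pictured as a per-step dictionary that is handed out and then emptied. The class below is a sketch of that idea, not the palaestrAI implementation; in particular, raising KeyError for a duplicate key when allow_overwrite is False is an assumption about how the flag behaves.

```python
from typing import Any, Dict


class StatisticsSketch:
    """Hypothetical per-step statistics store, mirroring the two method names."""

    def __init__(self) -> None:
        self._statistics: Dict[str, Any] = {}

    def add_statistics(self, key: str, value: Any, allow_overwrite: bool = False) -> None:
        # Assumed behavior: refuse to silently replace an existing metric.
        if key in self._statistics and not allow_overwrite:
            raise KeyError(f"Statistics key {key!r} is already set for this step")
        self._statistics[key] = value

    def pop_statistics(self) -> Dict[str, Any]:
        # Hand out the current metrics and start the next step with an empty dict.
        current = self._statistics
        self._statistics = {}
        return current


stats = StatisticsSketch()
stats.add_statistics("loss", 0.42)
stats.add_statistics("loss", 0.40, allow_overwrite=True)
step_metrics = stats.pop_statistics()
```

After the call to pop_statistics(), the store is empty again, so metrics from one step do not leak into the next.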

pretrain()

Pretrains the Brain before an experiment begins (Offline Learning)

The Learner will have filled the Brain’s Memory with previous trajectories it can access. These trajectories are loaded according to the replay key in the experiment run file. How the pretraining (in DRL parlance, known as Offline Learning) is actually conducted is up to the algorithm; i.e., concrete algorithm implementations may override this method if they wish to implement pretraining/offline training.

property seed: int: Returns the random seed applicable for this brain instance.

property sensors: List[SensorInformation]: All sensors the Brain (and its Muscles) know about

setup()

Brain setup method

This method is called by the AgentConductor just before the main loop is entered (run()). In the base Brain class, it is empty and does nothing. However, any derived class may implement it to do local setup before the main loop is entered.

Potential tasks for this method are setting the size limit of the Memory via Memory.size_limit, or anything that needs to access Brain.seed, Brain.sensors, or Brain.actuators, as they’re not yet available in the constructor.

This method is guaranteed to be called in the same process space as the main loop method, Brain.run().

store()

Stores the current state of the model

This method is called whenever the current state of the brain should be saved. How a particular model is serialized is up to the concrete implementation. Also, brains may be divided into sub-models (e.g., actor and critic), whose separate storage is realized via tags. Implementing this method allows for a versatile implementation of this.

It is advisable to use the storage facilities of palaestrAI. They are available through a BrainDumper method that is called with the binary data (binary_io), the list of dumpers (_dumpers), and an optional tag. This function calls all available dumpers to store the serialized brain dump provided in the parameter binary_io and optionally attaches a tag to it. The attribute Brain._dumpers is initialized to a list of available dumpers and can be used directly.
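The tag-based storage of sub-models described for load() and store() might look roughly as follows. This is a sketch under stated assumptions: InMemoryDumper and its save()/load() methods are hypothetical stand-ins for the palaestrAI dumper infrastructure, and pickle is used only as a convenient serializer for the toy parameters.

```python
import io
import pickle
from typing import Any, BinaryIO, Dict


class InMemoryDumper:
    """Hypothetical stand-in for a brain dumper: keeps one blob per tag."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def save(self, binary_io: BinaryIO, tag: str) -> None:
        self._blobs[tag] = binary_io.read()

    def load(self, tag: str) -> BinaryIO:
        return io.BytesIO(self._blobs[tag])


class ActorCriticState:
    """Toy brain state with two sub-models, stored under separate tags."""

    TAGS = ("actor", "critic")

    def __init__(self, dumper: InMemoryDumper) -> None:
        self._dumper = dumper
        self.actor: Dict[str, Any] = {"w": [0.1, 0.2]}
        self.critic: Dict[str, Any] = {"w": [0.9]}

    def store(self) -> None:
        for tag in self.TAGS:
            buffer = io.BytesIO()
            pickle.dump(getattr(self, tag), buffer)
            buffer.seek(0)  # rewind so the dumper reads from the start
            self._dumper.save(buffer, tag)

    def load(self) -> None:
        for tag in self.TAGS:
            setattr(self, tag, pickle.load(self._dumper.load(tag)))


dumper = InMemoryDumper()
trained = ActorCriticState(dumper)
trained.actor = {"w": [1.0, 2.0]}  # pretend training changed the actor
trained.store()

restored = ActorCriticState(dumper)
restored.load()
```

Storing each sub-model under its own tag keeps the actor and the critic independently loadable, for instance when only the actor is needed.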

abstract thinking(muscle_id: str, data_from_muscle: Any) → Any

Think about a response using the provided information.

The thinking() method is the place for the implementation of the agent’s/brain’s logic. The brain can use the current sensor readings, review the actions of the previous thinking and consider the reward (provided by the objective).

Usually, this is the place where machine learning happens, but other solutions are possible as well (like a set of rules or even random results).

The method receives only the name of the Muscle that is sending data, along with whatever data this Muscle wants to send to the Brain. As this is completely implementation-specific, this method does not impose any restrictions.

Any data that is available to palaestrAI, such as the actual sensor readings, setpoints a Muscle provided, rewards, the objective function’s value (goal/utility function), and whether the simulation is done or not, is available via the Brain’s Memory (cf. Brain.memory).

Parameters:

muscle_id (str) – This is the ID of the muscle which requested the update
data_from_muscle (Any) – Any data the Muscle sends to the Brain

Returns:

Any update that the Muscle should receive. If this value does not evaluate to True (i.e., bool(update) == False), then the Muscle will not be updated.

Return type:

Any
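The truthiness rule for the return value has a practical consequence: None, an empty dict, an empty list, or 0 all mean "no update". The sketch below illustrates this with a brain-like class that only emits an update once a batch of data has arrived. BatchingBrain and the helper deliver() are hypothetical; deliver() merely mirrors the documented bool(update) check and is not a palaestrAI function.

```python
from typing import Any, List, Tuple


class BatchingBrain:
    """Hypothetical brain-like class using the thinking() signature from above."""

    def __init__(self, batch_size: int = 2) -> None:
        self.batch_size = batch_size
        self.pending: List[Tuple[str, Any]] = []
        self.version = 0

    def thinking(self, muscle_id: str, data_from_muscle: Any) -> Any:
        self.pending.append((muscle_id, data_from_muscle))
        if len(self.pending) < self.batch_size:
            return None  # falsy: the muscle is not updated
        self.pending.clear()
        self.version += 1
        # Non-empty dict: truthy, so the muscle receives it.
        return {"version": self.version}


def deliver(update: Any) -> bool:
    """Mirror of the documented rule: only truthy values reach the muscle."""
    return bool(update)


brain = BatchingBrain(batch_size=2)
first = brain.thinking("muscle-0", {"loss_hint": 0.3})
second = brain.thinking("muscle-0", {"loss_hint": 0.2})
```

Wrapping an update in a non-empty container such as a dict avoids accidental suppression of values like 0, and also sidesteps the fact that bool() raises ValueError for a NumPy array with more than one element.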