3. How to use trained agents externally

palaestrAI can train agents and store them. This tutorial shows

1.) how to load agents that the FileBrainDumper has stored as files, and

2.) how to use them for inference.

3.1. Installation

This notebook requires the modules palaestrai and harl (package palaestrai-agents) to be installed.

Ideally, install them from the development branch of their Git repositories:

pip install palaestrai@git+https://gitlab.com/arl2/palaestrai.git@development#egg=palaestrai
pip install palaestrai-agents@git+https://gitlab.com/arl2/harl.git@development#egg=harl
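
To confirm that both modules can be found before continuing, a check along the following lines may help. It is a minimal sketch that uses only the standard library; the helper name missing_modules is made up for this example and is not part of palaestrAI.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Both modules are required by this notebook; an empty list means
# that they can be imported in the current environment.
print("Missing modules:", missing_modules(["palaestrai", "harl"]))
```

If the list is not empty, rerun the pip commands above in the same environment that runs the notebook kernel.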

3.2. Agent loading

The FileBrainDumper stores and loads models below its base path, which is given by RuntimeConfig().data_path and is interpreted relative to the current working directory. The path defaults to ./_outputs unless you override it, as done below.

[1]:

from palaestrai.core import RuntimeConfig

RuntimeConfig().reset()
RuntimeConfig().load(
    {"data_path": "test_models"}
)

In palaestrAI, models are referenced by BrainLocations, which use the agent name, the experiment run UID, and the experiment run phase to determine the location of the model learned by the brain. Given a BrainLocation, the FileBrainDumper looks for the brain's models under $CWD/$DATA_PATH/brains/$EXPERIMENT_RUN_UID/$EXPERIMENT_RUN_PHASE/${AGENT_NAME}-${TAG}.bin.
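
The path template above can be reproduced with a few lines of pathlib, which is handy for checking where a file is expected before loading it. This is only a sketch: expected_brain_file is a hypothetical helper, not a palaestrAI function, and the working directory /tmp/example is a placeholder. The remaining values are the ones used in this tutorial.

```python
from pathlib import Path


def expected_brain_file(data_path, experiment_run_uid, experiment_run_phase,
                        agent_name, tag, cwd=None):
    """Build the file path from the template above (hypothetical helper)."""
    base = Path(cwd) if cwd is not None else Path.cwd()
    return (
        base / data_path / "brains" / experiment_run_uid
        / str(experiment_run_phase) / f"{agent_name}-{tag}.bin"
    )


path = expected_brain_file(
    data_path="test_models",
    experiment_run_uid="test-experiment-run",
    experiment_run_phase=0,
    agent_name="TD3-Gandalf",
    tag="td3_actor",
    cwd="/tmp/example",  # placeholder working directory, for illustration only
)
print(path.as_posix())
```

Leaving out cwd makes the helper fall back to the actual current working directory, which matches the $CWD part of the template.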

In the following, we load the models for a TD3 agent from files. The harl:TD3Brain, or rather the harl:OffpolicyBrain, the base class for off-policy deep reinforcement learning, serializes the model components (actor, critic, and their two targets) as individual binary blobs and stores each one under its own TAG. The actor, for example, is stored with the tag td3_actor, as the file name TD3-Gandalf-td3_actor.bin used below shows. The actor model is the only one that is actually relevant for action inference in deployment, but we still require all components (and therefore all files), because loading uses the same interface as training.

[2]:

from palaestrai.agent.file_brain_dumper import FileBrainDumper
from palaestrai.agent.brain_dumper import BrainLocation
import os

print(f"CWD: {os.getcwd()}")
model_dir = "./test_models/brains/test-experiment-run/0"
expected_file = "TD3-Gandalf-td3_actor.bin"

full_path = os.path.join(model_dir, expected_file)

if not os.path.exists(full_path):
    raise FileNotFoundError(
        f"Expected model file not found at {full_path}. "
        f"Available files in directory: {os.listdir(model_dir) if os.path.exists(model_dir) else 'Directory does not exist'}"
    )

print(f"Found model file at: {full_path}")


locator = BrainLocation(
    agent_name="TD3-Gandalf",
    experiment_run_uid="test-experiment-run",
    experiment_run_phase=0,
    repeat=1,
)

dumper = FileBrainDumper(dump_to=locator, load_from=locator)

CWD: /home/jonas/ARL/palaestrai/doc/tutorials
Found model file at: ./test_models/brains/test-experiment-run/0/TD3-Gandalf-td3_actor.bin
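
The check above covers only the actor file, whereas loading needs every component file. One way to see which components are present is to collect all files that follow the ${AGENT_NAME}-${TAG}.bin pattern. The sketch below assumes nothing beyond that naming pattern; component_files is a hypothetical helper, and the file names in the demonstration (other than the actor file) are placeholders rather than confirmed harl tags.

```python
import tempfile
from pathlib import Path


def component_files(brain_dir, agent_name):
    """Map TAG -> file for every '<agent_name>-<TAG>.bin' in brain_dir."""
    prefix = f"{agent_name}-"
    return {
        f.stem[len(prefix):]: f
        for f in sorted(Path(brain_dir).glob("*.bin"))
        if f.name.startswith(prefix)
    }


# Demonstration on a throwaway directory with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for name in (
        "TD3-Gandalf-td3_actor.bin",
        "TD3-Gandalf-td3_critic.bin",  # placeholder tag for illustration
        "Other-td3_actor.bin",         # belongs to a different agent
    ):
        (Path(tmp) / name).touch()
    found = component_files(tmp, "TD3-Gandalf")
    print(sorted(found))
```

In this tutorial, the call would be component_files(model_dir, "TD3-Gandalf") with the model_dir defined in the cell above.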

In palaestrAI, an ‘agent’ is composed of one brain and one muscle (or several identical muscles). While the muscle is the object that actually performs inference, we need the brain to load the model, which the AgentFactory then sets in the muscle. The AgentFactory also needs to know which agent to create. This is why an agent_definition must be provided: it contains the paths to the brain and muscle classes that should be instantiated and into which the model should be loaded.

[3]:

from palaestrai.agent.agent_factory import AgentFactory

definition = {
    "name": "TD3-Gandalf",
    "brain": {"name": "harl.td3.brain:TD3Brain", "params": {}},
    "muscle": {
        "name": "harl.off_policy.muscle:OffPolicyMuscle",
        "params": {
            "start_steps": 10000,
            "noise": "harl:GaussianNoise",
            "noise_mu": 0.0,
            "noise_std": 0.1,
            "noise_theta": 0.15,
            "is_squashed": True,
        },
    },
    "objective": {
        "name": "palaestrai.agent.dummy_objective:DummyObjective",
        "params": {},
    },
}

factory = AgentFactory(agent_definition=definition, brain_dumpers=[dumper])
brain = factory.make_brain()
muscle = factory.make_muscle(brain=brain)

/home/jonas/ARL/harl/src/harl/sac/brain.py:62: SyntaxWarning: invalid escape sequence '\l'
  $$\theta_{\text{targ}} \leftarrow \rho \theta_{\text{targ}} + (
/home/jonas/ARL/harl/src/harl/td3/brain.py:40: SyntaxWarning: invalid escape sequence '\i'
  TDR regularization factor. In (0,1) and values of $\rho \in [0.3, 0.5, 0.7]$ were
/home/jonas/ARL/harl/src/harl/off_policy/brain.py:50: SyntaxWarning: invalid escape sequence '\l'
  $$\theta_{\text{targ}} \leftarrow (1-\tau) \theta_{\text{targ}} +

Bio: <_io.BytesIO object at 0x7f7f0c278540>
Bio: <_io.BytesIO object at 0x7f7f3ff62a70>
Bio: <_io.BytesIO object at 0x7f7f0c278590>
Bio: <_io.BytesIO object at 0x7f7f0c0f63e0>
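
The brain, muscle, and objective entries in the definition name their classes as "package.module:ClassName" strings. How the AgentFactory resolves these strings internally is not shown in this tutorial; the following sketch only illustrates the naming convention with the standard library. The helper resolve_class is hypothetical, and a standard-library class stands in for the harl classes so that the example runs without harl.

```python
import importlib


def resolve_class(spec):
    """Resolve a 'package.module:ClassName' string to the class object."""
    module_name, _, class_name = spec.partition(":")
    if not module_name or not class_name:
        raise ValueError(f"Expected 'module:Class', got {spec!r}")
    return getattr(importlib.import_module(module_name), class_name)


# Standard-library stand-in for a string such as "harl.td3.brain:TD3Brain".
cls = resolve_class("collections:OrderedDict")
print(cls.__name__)
```

Resolving the strings from the definition in this way before calling the factory can surface a misspelled module or class name early.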

3.3. Agent inference

Now that the agent is loaded, we can use it for inference or further training. For example, we can ask the muscle for an action, given an observation. An example is shown below.

Since the muscle has been loaded with the model parameters, it should be able to propose actions based on the sensor readings and the actuator information. We will prepare some dummy sensor readings and actuator information to test the muscle’s action proposal. If the muscle proposes actions successfully, it means that the loading process worked correctly and that the model is ready for inference or further training.

[4]:

import numpy as np

from palaestrai.agent.sensor_information import SensorInformation
from palaestrai.agent.actuator_information import ActuatorInformation
from palaestrai.types import Box

# Prepare dummy sensor readings, each with an unbounded one-dimensional space
sensor_readings = [1.0, 1.3, 2.0]
sensor_spaces = [
    Box(low=np.array([-np.inf]), high=np.array([np.inf])),  # s1
    Box(low=np.array([-np.inf]), high=np.array([np.inf])),  # s2
    Box(low=np.array([-np.inf]), high=np.array([np.inf])),  # s3
]
sensor_values = [np.array([val]) for val in sensor_readings]

sensors = [
    SensorInformation(uid="s1", value=sensor_values[0], space=sensor_spaces[0]),
    SensorInformation(uid="s2", value=sensor_values[1], space=sensor_spaces[1]),
    SensorInformation(uid="s3", value=sensor_values[2], space=sensor_spaces[2]),
]

# Create a single actuator whose setpoints are bounded to [-1, 1]
actuator = ActuatorInformation(
    uid="main_actuator",
    space=Box(low=np.array([-1.0]), high=np.array([1.0]))
)

# Ask the muscle to propose actions for the dummy observation
try:
    actions, action_data = muscle.propose_actions(
        sensors=sensors,
        actuators_available=[actuator]
    )
    print("Actions proposed successfully:")
    print(f"Actions: {actions}")
    print(f"Action data: {action_data}")
except Exception as e:
    print(f"❌ Error proposing actions: {e}")
    print(f"Muscle model state: {muscle._model}")
    raise

Actions proposed successfully:
Actions: [ActuatorInformation(value=[-0.98351785], space=Box(-1.0, 1.0, (1,), float64), uid=main_actuator)]
Action data: True
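
A simple plausibility check on the result is to verify that each proposed value lies within the bounds of its actuator's space. The sketch below does this with plain Python numbers copied from the output above; values_within_bounds is a hypothetical helper, and reading the value and bounds from the returned ActuatorInformation objects is left out here.

```python
def values_within_bounds(values, low, high):
    """Return True if every value lies in the closed interval [low, high]."""
    return all(low <= v <= high for v in values)


# Value and bounds copied from the output above.
print(values_within_bounds([-0.98351785], low=-1.0, high=1.0))
```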

Finally, it can be useful to check whether the loaded model parameters are identical to the original model parameters. This is a good way to ensure that the loading process worked correctly and that the model is ready for inference or further training.

[5]:

from harl.common.network import Actor
import torch as T

model_path = "./test_models/brains/test-experiment-run/0/TD3-Gandalf-td3_actor.bin"

# Allow-list the Actor class for torch's restricted unpickler. This only
# matters when loading with weights_only=True.
T.serialization.add_safe_globals([Actor])

# weights_only=False unpickles arbitrary objects, so only load trusted files.
loaded_model = T.load(model_path, map_location="cpu", weights_only=False)

# Move both models to the CPU so the comparison does not fail if the
# tensors were saved on CUDA in the training environment.
loaded_model = loaded_model.to("cpu")
muscle._model = muscle._model.to("cpu")

loaded_state = loaded_model.state_dict()
muscle_state = muscle._model.state_dict()

# Compare parameter names first, then the values key by key.
if loaded_state.keys() != muscle_state.keys():
    print("Mismatch in parameter names")
else:
    for key, value in loaded_state.items():
        if not T.allclose(value, muscle_state[key]):
            print(f"Mismatch in {key}")
            break
    else:
        print("Models are identical")

Models are identical
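
Note that torch.allclose compares within a tolerance rather than bit for bit. With its default settings (rtol=1e-05, atol=1e-08), two values a and b count as close if |a - b| <= atol + rtol * |b|. The plain-Python sketch below restates this criterion for scalars to show what "identical" means in the check above; is_close is a name chosen for this example only.

```python
def is_close(a, b, rtol=1e-05, atol=1e-08):
    """Scalar form of the allclose criterion: |a - b| <= atol + rtol * |b|."""
    return abs(a - b) <= atol + rtol * abs(b)


print(is_close(1.0, 1.0 + 1e-9))  # difference far below the tolerance
print(is_close(1.0, 1.001))       # difference above the tolerance
```

If an exact, element-wise comparison is wanted instead, torch.equal can be used in place of torch.allclose.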