{ "cells": [ { "metadata": {}, "cell_type": "markdown", "source": [ "# How to use trained agents externally\n", "\n", "palaestrAI can train agents and store them. Here, we will show\n", "\n", "1.) how to load agents stored as files via the `FileBrainDumper` and\n", "\n", "2.) how to use them for inference.\n" ], "id": "6169f97b87add5b3" }, { "metadata": {}, "cell_type": "markdown", "source": [ "## Installation\n", "This notebook requires the modules `palaestrai` and `harl` (package `palaestrai-agents`) to be installed.\n", "\n", "Ideally, install them from the git `development` branch:\n", "\n", "```sh\n", "pip install palaestrai@git+https://gitlab.com/arl2/palaestrai.git@development#egg=palaestrai\n", "pip install palaestrai-agents@git+https://gitlab.com/arl2/harl.git@development#egg=harl\n", "```" ], "id": "414b9eb26db63683" }, { "metadata": {}, "cell_type": "markdown", "source": [ "## Agent loading\n", "The `FileBrainDumper` stores and loads models under its base path, which is specified by `RuntimeConfig().data_path` and resolved relative to the current working directory. It defaults to `./_outputs` unless you override it, as done below." 
], "id": "ce948a288a8d47b5" }, { "metadata": { "ExecuteTime": { "end_time": "2026-05-27T11:12:47.423511429Z", "start_time": "2026-05-27T11:12:47.226912507Z" } }, "cell_type": "code", "source": [ "from palaestrai.core import RuntimeConfig\n", "\n", "RuntimeConfig().reset()\n", "RuntimeConfig().load(\n", "    {\"data_path\": \"test_models\"}\n", ")" ], "id": "d29f7137691024f7", "outputs": [], "execution_count": 1 }, { "metadata": {}, "cell_type": "markdown", "source": [ "In palaestrAI, models are referenced by `BrainLocation` objects, which use the agent name, the experiment run UID, and the experiment run phase to determine the location of the model learned by the brain.\n", "The `FileBrainDumper` uses the `BrainLocation` to look for the brain's models under `$CWD/$DATA_PATH/brains/$EXPERIMENT_RUN_UID/$EXPERIMENT_RUN_PHASE/${AGENT_NAME}-${TAG}.bin`.\n", "\n", "In the following, we load the models for a TD3 agent from files.\n", "The `harl:TD3Brain`, or rather the `harl:OffpolicyBrain`, the base class for off-policy Deep Reinforcement Learning, serializes the model components (actor, critic, and their targets) as individual binary blobs and stores each one under its own `TAG`, one of `td3-{actor,critic,actor-target,actor-critic}`. 
The `actor` model is the only model that is actually relevant for action inference in deployment, but we still require all components (and therefore all files here), because loading goes through the same interface that is used for training.\n" ], "id": "b0303ca242a9c50c" }, { "metadata": { "ExecuteTime": { "end_time": "2026-05-27T11:12:47.450282800Z", "start_time": "2026-05-27T11:12:47.428010150Z" } }, "cell_type": "code", "source": [ "from palaestrai.agent.file_brain_dumper import FileBrainDumper\n", "from palaestrai.agent.brain_dumper import BrainLocation\n", "import os\n", "\n", "print(f\"CWD: {os.getcwd()}\")\n", "model_dir = \"./test_models/brains/test-experiment-run/0\"\n", "expected_file = \"TD3-Gandalf-td3_actor.bin\"\n", "\n", "full_path = os.path.join(model_dir, expected_file)\n", "\n", "# Fail early with a helpful message if the actor file is missing\n", "if not os.path.exists(full_path):\n", "    raise FileNotFoundError(\n", "        f\"Expected model file not found at {full_path}. \"\n", "        f\"Available files in directory: {os.listdir(model_dir) if os.path.exists(model_dir) else 'Directory does not exist'}\"\n", "    )\n", "\n", "print(f\"Found model file at: {full_path}\")\n", "\n", "\n", "locator = BrainLocation(\n", "    agent_name=\"TD3-Gandalf\",\n", "    experiment_run_uid=\"test-experiment-run\",\n", "    experiment_run_phase=0,\n", "    repeat=1,\n", ")\n", "\n", "dumper = FileBrainDumper(dump_to=locator, load_from=locator)\n" ], "id": "c4d0f370b04e7a63", "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CWD: /home/jonas/ARL/palaestrai/doc/tutorials\n", "Found model file at: ./test_models/brains/test-experiment-run/0/TD3-Gandalf-td3_actor.bin\n" ] } ], "execution_count": 2 }, { "metadata": {}, "cell_type": "markdown", "source": [ "An 'agent' is composed of one brain and one (or many, identical) muscle(s) in palaestrAI. 
While the muscle is the relevant object for inference, we need the brain to load the model, which the `AgentFactory` then sets in the muscle.\n", "Further, the `AgentFactory` needs to know which agent to create. This is why an `agent_definition` needs to be provided: it contains the paths to the brain and muscle classes that should be instantiated and into which the model should be loaded.\n" ], "id": "b5f0d821ca14f2dd" }, { "metadata": { "ExecuteTime": { "end_time": "2026-05-27T11:12:48.912987029Z", "start_time": "2026-05-27T11:12:47.458794905Z" } }, "cell_type": "code", "source": [ "from palaestrai.agent.agent_factory import AgentFactory\n", "\n", "definition = {\n", "    \"name\": \"TD3-Gandalf\",\n", "    \"brain\": {\"name\": \"harl.td3.brain:TD3Brain\", \"params\": {}},\n", "    \"muscle\": {\n", "        \"name\": \"harl.off_policy.muscle:OffPolicyMuscle\",\n", "        \"params\": {\n", "            \"start_steps\": 10000,\n", "            \"noise\": \"harl:GaussianNoise\",\n", "            \"noise_mu\": 0.0,\n", "            \"noise_std\": 0.1,\n", "            \"noise_theta\": 0.15,\n", "            \"is_squashed\": True,\n", "        },\n", "    },\n", "    \"objective\": {\n", "        \"name\": \"palaestrai.agent.dummy_objective:DummyObjective\",\n", "        \"params\": {},\n", "    },\n", "}\n", "\n", "# The brain loads the stored models through the dumper; the factory then sets them in the muscle\n", "factory = AgentFactory(agent_definition=definition, brain_dumpers=[dumper])\n", "brain = factory.make_brain()\n", "muscle = factory.make_muscle(brain=brain)\n", "\n" ], "id": "a17ff3e446412b8", "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/home/jonas/ARL/harl/src/harl/sac/brain.py:62: SyntaxWarning: invalid escape sequence '\\l'\n", " $$\\theta_{\\text{targ}} \\leftarrow \\rho \\theta_{\\text{targ}} + (\n", "/home/jonas/ARL/harl/src/harl/td3/brain.py:40: SyntaxWarning: invalid escape sequence '\\i'\n", " TDR regularization factor. 
In (0,1) and values of $\\rho \\in [0.3, 0.5, 0.7]$ were\n", "/home/jonas/ARL/harl/src/harl/off_policy/brain.py:50: SyntaxWarning: invalid escape sequence '\\l'\n", " $$\\theta_{\\text{targ}} \\leftarrow (1-\\tau) \\theta_{\\text{targ}} +\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Bio: <_io.BytesIO object at 0x7f7f0c278540>\n", "Bio: <_io.BytesIO object at 0x7f7f3ff62a70>\n", "Bio: <_io.BytesIO object at 0x7f7f0c278590>\n", "Bio: <_io.BytesIO object at 0x7f7f0c0f63e0>\n" ] } ], "execution_count": 3 }, { "metadata": {}, "cell_type": "markdown", "source": [ "## Agent inference\n", "\n", "Now that the agent is loaded, we can use it for inference or further training. For example, we can ask the muscle for an action given an observation, as shown below.\n", "\n", "Since the muscle has been loaded with the model parameters, it should be able to propose actions based on sensor readings and actuator information. We prepare some dummy sensor readings and actuator information to test the muscle's action proposal. If the muscle proposes actions successfully, this indicates that the loading process worked and that the model is ready for inference or further training." ], "id": "1a54728b38a5c8e6" }, { "metadata": { "ExecuteTime": { "end_time": "2026-05-27T11:12:48.997592166Z", "start_time": "2026-05-27T11:12:48.940063456Z" } }, "cell_type": "code", "source": [ "from palaestrai.agent.sensor_information import SensorInformation\n", "from palaestrai.agent.actuator_information import ActuatorInformation\n", "import numpy as np\n", "from palaestrai.types import Box\n", "# 7. 
Prepare sensors\n", "sensor_readings = [1.0, 1.3, 2.0]\n", "sensor_spaces = [\n", "    Box(low=np.array([-np.inf]), high=np.array([np.inf])), # s1\n", "    Box(low=np.array([-np.inf]), high=np.array([np.inf])), # s2\n", "    Box(low=np.array([-np.inf]), high=np.array([np.inf])), # s3\n", "]\n", "sensor_values = [np.array([val]) for val in sensor_readings]\n", "\n", "sensors = [\n", "    SensorInformation(uid=\"s1\", value=sensor_values[0], space=sensor_spaces[0]),\n", "    SensorInformation(uid=\"s2\", value=sensor_values[1], space=sensor_spaces[1]),\n", "    SensorInformation(uid=\"s3\", value=sensor_values[2], space=sensor_spaces[2]),\n", "]\n", "\n", "# 8. Create actuator\n", "actuator = ActuatorInformation(\n", "    uid=\"main_actuator\",\n", "    space=Box(low=np.array([-1.0]), high=np.array([1.0]))\n", ")\n", "\n", "# 9. Get actions\n", "try:\n", "    actions, action_data = muscle.propose_actions(\n", "        sensors=sensors,\n", "        actuators_available=[actuator]\n", "    )\n", "    print(\"Actions proposed successfully:\")\n", "    print(f\"Actions: {actions}\")\n", "    print(f\"Action data: {action_data}\")\n", "except Exception as e:\n", "    print(f\"❌ Error proposing actions: {e}\")\n", "    print(f\"Muscle model state: {muscle._model}\")\n", "    raise\n" ], "id": "63e991f401a0e855", "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Actions proposed successfully:\n", "Actions: [ActuatorInformation(value=[-0.98351785], space=Box(-1.0, 1.0, (1,), float64), uid=main_actuator)]\n", "Action data: True\n" ] } ], "execution_count": 4 }, { "metadata": {}, "cell_type": "markdown", "source": "Finally, it might be useful to check whether the loaded model parameters are identical to the original model parameters. 
This is a good way to ensure that the loading process worked correctly and that the model is ready for inference or further training.", "id": "4fe91adac803a4f7" }, { "metadata": { "ExecuteTime": { "end_time": "2026-05-27T11:12:49.030807428Z", "start_time": "2026-05-27T11:12:49.006023001Z" } }, "cell_type": "code", "source": [ "from harl.common.network import Actor\n", "import torch.serialization\n", "import torch as T\n", "\n", "model_path = \"./test_models/brains/test-experiment-run/0/TD3-Gandalf-td3_actor.bin\"\n", "\n", "torch.serialization.add_safe_globals([Actor])\n", "\n", "loaded_model = T.load(model_path, map_location=\"cpu\", weights_only=False)\n", "# Ensure the loaded model is on CPU so comparisons don't fail if the\n", "# saved tensors were on CUDA in the training environment.\n", "try:\n", "    loaded_model = loaded_model.to(\"cpu\")\n", "except Exception:\n", "    # loaded_model is not an nn.Module; keep it as loaded\n", "    pass\n", "\n", "try:\n", "    muscle._model = muscle._model.to(\"cpu\")\n", "except Exception:\n", "    pass\n", "\n", "# Compare the parameters pairwise; the else branch runs only if no mismatch was found\n", "for (k1, v1), (k2, v2) in zip(\n", "    loaded_model.state_dict().items(),\n", "    muscle._model.state_dict().items()\n", "):\n", "    if not T.allclose(v1, v2):\n", "        print(f\"Mismatch in {k1}\")\n", "        break\n", "else:\n", "    print(\"Models are identical\")\n", "\n" ], "id": "2b14f741c0684b0b", "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Models are identical\n" ] } ], "execution_count": 5 } ], "metadata": { "kernelspec": { "name": "python3", "language": "python", "display_name": "Python 3 (ipykernel)" } }, "nbformat": 4, "nbformat_minor": 5 }