*******************************
Running palaestrAI as a Service
*******************************

About
-----

Usually, experiments are executed by issuing ``palaestrai experiment-start``
on the machine that runs palaestrAI. ``palaestrai serve`` adds a second way
to drive palaestrAI: it launches a long-running HTTP service that exposes a
REST API. External tools (and, later, a web frontend) can then create and
launch experiment runs, poll their status, and retrieve their logs over HTTP,
without having to run palaestrAI locally or copy experiment files around.

The service stays online until it receives ``SIGINT`` or ``SIGTERM`` (e.g.,
``Ctrl+C``); there is no separate daemon management.

Quickstart
----------

Start the server (here on the default port ``4247``, listening on all
interfaces)::

    palaestrai serve

The following ``curl`` session walks through a full lifecycle. It assumes an
experiment run document ``my_run.yml`` (the same document format accepted by
``palaestrai experiment-start``).

Create the experiment run in the database (this does **not** run it). The
identifier is read from the document's ``uid`` field::

    curl -X PUT http://localhost:4247/experiment_runs \
         -H "Content-Type: application/x-yaml" \
         --data-binary @my_run.yml

Launch a new *instance* of that run. The request body is ignored; the
response contains the new instance UID::

    curl -X PUT http://localhost:4247/experiment_runs/{uid}/instances
    {"instance_uid": "...", "experiment_run_uid": "..."}

Poll the instance status. The HTTP status code mirrors the lifecycle state
(``202`` scheduled, ``200`` running/finished, ``500`` error, ``404``
unknown)::

    curl -i http://localhost:4247/experiment_run_instances/{uid}
    {"uid": "...", "status": "RUNNING"}

Retrieve the persisted log entries for that instance (see
:ref:`serve-logging` below for the available filters)::

    curl "http://localhost:4247/experiment_run_instances/{uid}/logs?level=INFO"

Shut the service down by sending ``SIGINT``/``SIGTERM`` to the
``palaestrai serve`` process (e.g.
``Ctrl+C``). The parent asks the executor child to shut down gracefully,
waits, and then exits.

Parameters
----------

CLI options
^^^^^^^^^^^

``palaestrai serve`` accepts the following options:

``-l``, ``--listen``
    IP address to bind to. May be given multiple times to request several
    addresses. Default: all available interfaces (``0.0.0.0``). Because
    uvicorn binds a single host, requesting more than one explicit address
    falls back to binding all interfaces.

``-p``, ``--port``
    TCP port to listen on. Default: ``4247``.

Example::

    palaestrai serve --listen 127.0.0.1 --port 8080

Runtime-config keys
^^^^^^^^^^^^^^^^^^^

Beyond the CLI options, ``palaestrai serve`` reads the usual
:doc:`runtime configuration <runtime-config>`. The keys most relevant to the
service are:

``store_uri``
    SQLAlchemy-style URI of the results store database that every route
    reads from and writes to. The service ensures the schema exists on
    startup. Default: ``sqlite:///palaestrai.db``.

``log_store_uri``
    SQLAlchemy-style URI of the *separate* SQLite database that holds the
    log entries served by ``GET /experiment_run_instances/{uid}/logs`` (see
    :ref:`serve-logging`). It is kept separate from ``store_uri`` so logs do
    not bloat the results store. The executor child writes it and the API
    parent reads it, so both sides must agree on this value (which they do,
    as the parent snapshots its runtime config for the child). Default:
    ``sqlite:///palaestrai-log.db``.

``broker_uri`` / ``executor_bus_port`` / ``public_bind``
    Control the in-palaestrAI ZeroMQ message broker the executor child uses
    internally. They behave exactly as for ``palaestrai experiment-start``;
    see :doc:`runtime-config`. Defaults: ``broker_uri`` derived from the
    other two, ``executor_bus_port: 4242``, ``public_bind: False``.

``fork_method``
    Multiprocessing start method. The supervisor pins the executor subtree
    to this method so its synchronization primitives stay consistent.
    Default: ``spawn``.

..

.. note::

    The HTTP listen address/port are **not** runtime-config keys; they are
    set only via the ``--listen``/``--port`` CLI options above.

.. _serve-logging:

Logging
^^^^^^^

Under ``palaestrai serve`` the separate log store is **enabled
automatically**: the executor child attaches a
:class:`~palaestrai.util.sqlite_log_handler.SQLiteLogHandler` to the root
logger, writing to ``log_store_uri``. A few properties follow from how the
handler works:

* The log store is **optional** in general (it is specific to ``serve``) but
  is switched on by ``serve`` without any extra configuration.
* Only records that are associated with an **experiment run instance** are
  stored; every record is keyed by its ``experiment_run_instance_uid``.
  Records not tied to an instance (executor/broker/pre-run records) are
  dropped from the log store and only reach stdout.
* ``DEBUG`` records are **dropped by default** (the handler stores ``INFO``
  and above). Lower the relevant logger levels in the runtime config if you
  need finer-grained logs.

REST API
--------

All routes negotiate content: a response is returned as YAML when the client
sends an ``Accept`` header asking for YAML (``application/x-yaml``,
``application/yaml``, ``text/yaml``, ``text/x-yaml``) and as JSON otherwise.
The ``PUT`` routes default to parsing the request body as YAML; because JSON
is a subset of YAML, a JSON body is accepted as well.
``GET /experiments/{name}`` always returns the stored YAML document.

Error responses follow a thin error→HTTP translation: not-found lookups
become ``404``, integrity/uniqueness violations become ``409``,
schema/syntax validation failures become ``422``, immutability violations
become ``405`` (with an explanatory message), and any other unhandled error
becomes ``500``.

Experiments
^^^^^^^^^^^

.. code-block:: text

    GET /experiments

List all experiments, each including its experiment runs under the
``experiment_runs`` key. Optional query parameter ``name`` filters by name
(SQL ``LIKE`` syntax).
Returns ``200`` with a JSON (or YAML) list.

.. code-block:: text

    GET /experiments/{name}

Return the stored experiment document (and nothing else) as YAML. ``200`` on
success, ``404`` if no such experiment exists.

.. code-block:: text

    PUT /experiments

Store a new experiment from its (arsenAI) document body. The experiment name
is taken from the document's top-level ``uid`` field; a missing ``uid`` is a
``405``. Returns ``201`` with ``{"name": ..., "id": ...}``. A duplicate name
surfaces as ``409``.

.. code-block:: text

    DELETE /experiments/{name}

Cascading delete of an experiment and everything below it (runs, instances,
phases, ...). ``200`` with ``{"deleted": name}``; ``404`` if unknown.

.. code-block:: text

    POST /experiments/{name}

Experiments are immutable, so this **always** returns ``405`` with the
message "Experiments are immutable. ...".

Experiment runs
^^^^^^^^^^^^^^^

.. code-block:: text

    GET /experiment_runs

List all experiment runs; each entry adds an ``experiment`` key with the
parent experiment's name. Optional query parameter ``uid`` filters by run UID
(SQL ``LIKE`` syntax). Returns ``200``.

.. code-block:: text

    GET /experiment_runs/{uid}

Return full data on an experiment run, travelling down the hierarchy to its
instances and their phases (environments and agents), but **not** down to
muscle actions, environment/world states, or brain dumps. ``200`` on success,
``404`` if unknown.

.. code-block:: text

    PUT /experiment_runs

Create a new experiment run in the database from its YAML body (does not run
it). The run UID and the parent experiment association (``experiment_uid``)
are read from the document; the body is schema-validated. Returns ``201``
with ``{"uid": ..., "experiment_uid": ...}``. A schema/syntax failure is
``422``.

.. code-block:: text

    POST /experiment_runs/{uid}

Update an experiment run, but **only** if it has never been executed (no
instance exists yet). If it has been executed at least once, this returns
``405``; create a new run instead.
``200`` with ``{"uid": ...}`` on success, ``404`` if unknown.

.. code-block:: text

    DELETE /experiment_runs/{uid}

Cascading delete of the run and all data associated with it. ``200`` with
``{"deleted": uid}``; ``404`` if unknown.

.. code-block:: text

    PUT /experiment_runs/{uid}/instances

Schedule a new instance of the experiment run. The request body is ignored.
The run is reconstructed (its instance UID is generated at construction) and
handed to the executor child for scheduling; the instance row itself is
created asynchronously by the store under the same UID. Returns ``202`` with
``{"instance_uid": ..., "experiment_run_uid": uid}`` immediately, ``404`` if
the run is unknown.

Experiment run instances
^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: text

    GET /experiment_run_instances/{uid}

Report the status of an instance. The body is
``{"uid": ..., "status": ...}`` and the HTTP status code mirrors the
lifecycle state:

.. list-table::
   :header-rows: 1

   * - Status
     - HTTP code
   * - ``SCHEDULED``
     - ``202``
   * - ``RUNNING``
     - ``200``
   * - ``FINISHED``
     - ``200``
   * - ``ERROR``
     - ``500``
   * - ``UNKNOWN``
     - ``404``

``UNKNOWN`` is reported (with ``404``) when no instance row exists for the
UID.

.. code-block:: text

    GET /experiment_run_instances/{uid}/logs

Return the persisted log entries for an instance, read from the separate log
store (``log_store_uri``). Existence of the instance is checked against the
results store: an unknown instance is a ``404`` (body
``{"uid": ..., "status": "UNKNOWN"}``); a known instance with no logs yields
a ``200`` with an empty list. The response body is::

    {"instance_uid": ..., "count": ..., "logs": [ ... ]}

where each log entry is
``{"created_at", "level", "levelno", "logger", "message"}``. The following
query parameters filter and page the result:

``level``
    Minimum log level by name (e.g. ``INFO``, ``WARNING``). Entries with a
    numeric level **at or above** this level are returned. An unknown level
    name is a ``422``.


``since``
    ISO-8601 timestamp; only entries at or after this time are returned. A
    value that is not a valid ISO-8601 timestamp is a ``422``.

    .. warning::

        URL-encode the timestamp. A ``+`` in a raw query string decodes to a
        space, so an offset like ``+00:00`` must be encoded (``%2B00:00``);
        otherwise the value fails validation and the request is rejected
        with ``422``.

``logger``
    Filter by logger name (SQL ``LIKE`` syntax).

``limit``
    Maximum number of entries to return. Default ``1000``; clamped to a
    maximum of ``10000``.

``offset``
    Number of entries to skip (for paging). Default ``0``.

Entries are ordered by ``created_at`` ascending, then by insertion order.

Process Model
-------------

A single ``palaestrai serve`` invocation runs **two processes**:

1. **API process (parent).** Runs uvicorn together with a FastAPI
   application. It owns the HTTP socket(s) and the user-facing signal
   handlers. Apart from instance scheduling (see below), all read routes
   (``GET``) and all create/update/delete routes
   (``PUT``/``POST``/``DELETE``) talk to the results store database
   directly; these are pure store operations that do not need a running
   executor.

2. **Executor process (child).** Runs ``Executor(is_service=True).execute()``
   in its own event loop. In service mode the executor does not shut down
   when no run is scheduled; it idles instead, waiting for work. It owns the
   experiment-run-instance status lifecycle (``SCHEDULED`` → ``RUNNING`` →
   ``FINISHED``/``ERROR``), which it writes authoritatively to the database.

The parent supervises the child over a small command channel (a
``multiprocessing`` pipe). Scheduling a new instance
(``PUT /experiment_runs/{uid}/instances``) sends a ``SCHEDULE`` command to
the child; everything else is served from the database. If the executor
child dies unexpectedly, the parent marks any still-``RUNNING`` instance as
``ERROR`` (crash safety), cleans up orphaned processes, and respawns the
child.
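
From a client's point of view, the lifecycle that the executor child writes
to the database is only visible through
``GET /experiment_run_instances/{uid}``. The following is a minimal
client-side sketch and not part of palaestrAI: the helper names
(``logs_url``, ``fetch_status``, ``wait_for_instance``), the polling interval,
and the choice of terminal states are assumptions made for illustration, and
only the Python standard library is used. Because the instance row is created
asynchronously after scheduling, ``UNKNOWN`` is deliberately not treated as
terminal here; a poll limit bounds the loop instead.

.. code-block:: python

    import json
    import time
    import urllib.error
    import urllib.request
    from urllib.parse import quote, urlencode

    # States after which the status will not change again (an assumption of
    # this sketch, derived from the lifecycle described above).
    TERMINAL = {"FINISHED", "ERROR"}


    def logs_url(base, instance_uid, **filters):
        """Build the logs URL; urlencode() escapes '+' in ``since`` as %2B."""
        uid = quote(instance_uid, safe="")
        url = f"{base}/experiment_run_instances/{uid}/logs"
        query = urlencode({k: v for k, v in filters.items() if v is not None})
        return f"{url}?{query}" if query else url


    def fetch_status(base, instance_uid):
        """GET the instance status and return the ``status`` field."""
        uid = quote(instance_uid, safe="")
        url = f"{base}/experiment_run_instances/{uid}"
        try:
            with urllib.request.urlopen(url) as response:
                return json.load(response)["status"]
        except urllib.error.HTTPError as error:
            # urllib raises for 404 (UNKNOWN) and 500 (ERROR); this assumes
            # the error response still carries the JSON status body.
            return json.load(error)["status"]


    def wait_for_instance(base, instance_uid, fetch=fetch_status,
                          interval=2.0, max_polls=300):
        """Poll until a terminal state is seen or max_polls is exhausted."""
        status = "UNKNOWN"
        for _ in range(max_polls):
            status = fetch(base, instance_uid)
            if status in TERMINAL:
                break
            time.sleep(interval)
        return status

``wait_for_instance`` returns the last status it observed, so a caller can
tell a finished run from one that was still ``RUNNING`` when the poll limit
was reached. ``logs_url(base, uid, since="2024-01-01T00:00:00+00:00")``
produces a query string in which the ``+`` of the offset is percent-encoded.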