Running palaestrAI as a Service
About
Usually, experiments are executed by issuing palaestrai experiment-start
on the machine that runs palaestrAI. palaestrai serve adds a second way to
drive palaestrAI: it launches a long-running HTTP service that exposes a REST
API. External tools (and, later, a web frontend) can then create and launch
experiment runs, poll their status, and retrieve their logs over HTTP, without
having to run palaestrAI locally or copy experiment files around.
The service stays online until it receives SIGINT or SIGTERM (e.g.,
Ctrl+C); there is no separate daemon management.
Quickstart
Start the server (here on the default port 4247, listening on all
interfaces):
palaestrai serve
The following curl session walks through a full lifecycle. It assumes an
experiment run document my_run.yml (the same document format accepted by
palaestrai experiment-start).
Create the experiment run in the database (this does not run it). The
identifier is read from the document’s uid field:
curl -X PUT http://localhost:4247/experiment_runs \
-H "Content-Type: application/x-yaml" \
--data-binary @my_run.yml
Launch a new instance of that run. The request body is ignored; the response contains the new instance UID:
curl -X PUT http://localhost:4247/experiment_runs/<run-uid>/instances
{"instance_uid": "<instance-uid>", "experiment_run_uid": "<run-uid>"}
Poll the instance status. The HTTP status code mirrors the lifecycle state
(202 scheduled, 200 running/finished, 500 error, 404 unknown):
curl -i http://localhost:4247/experiment_run_instances/<instance-uid>
{"uid": "<instance-uid>", "status": "RUNNING"}
Retrieve the persisted log entries for that instance (see Logging below for the available filters):
curl "http://localhost:4247/experiment_run_instances/<instance-uid>/logs?level=INFO"
Shut the service down by sending SIGINT/SIGTERM to the
palaestrai serve process (e.g. Ctrl+C). The parent asks the executor
child to shut down gracefully, waits, and then exits.
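For scripted use, the same lifecycle can be driven from Python. The sketch below only builds the requests with the standard library's urllib.request and does not send them. The helper names, the BASE_URL constant (the Quickstart's local server on the default port), and the example UIDs are illustrative assumptions, not part of palaestrAI.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

# Assumption: palaestrai serve runs locally on the default port.
BASE_URL = "http://localhost:4247"


def create_run_request(run_document: bytes) -> Request:
    """PUT /experiment_runs with the YAML run document as the body."""
    return Request(
        f"{BASE_URL}/experiment_runs",
        data=run_document,
        headers={"Content-Type": "application/x-yaml"},
        method="PUT",
    )


def launch_instance_request(run_uid: str) -> Request:
    """PUT /experiment_runs/{uid}/instances; the server ignores the body."""
    return Request(
        f"{BASE_URL}/experiment_runs/{quote(run_uid, safe='')}/instances",
        data=b"",
        method="PUT",
    )


def instance_status_request(instance_uid: str) -> Request:
    """GET /experiment_run_instances/{uid} (no body, so urllib uses GET)."""
    return Request(
        f"{BASE_URL}/experiment_run_instances/{quote(instance_uid, safe='')}"
    )


def instance_logs_request(instance_uid: str, level: str = "INFO") -> Request:
    """GET /experiment_run_instances/{uid}/logs, filtered by minimum level."""
    query = urlencode({"level": level})
    return Request(
        f"{BASE_URL}/experiment_run_instances/"
        f"{quote(instance_uid, safe='')}/logs?{query}"
    )
```

Each request can be sent with urllib.request.urlopen(request). Note that urlopen raises urllib.error.HTTPError for 4xx and 5xx replies (for example a 404, 409, or 422, or the 500 that reports an ERROR instance), so a caller should catch it and inspect its code attribute.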
Parameters
CLI options
palaestrai serve accepts the following options:
-l, --listen
  IP address to bind to. May be given multiple times to request several addresses. Default: all available interfaces (0.0.0.0). Because uvicorn binds a single host, requesting more than one explicit address falls back to binding all interfaces.
-p, --port
  TCP port to listen on. Default: 4247.
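The --listen fallback rule can be written as a small function. This is a hypothetical sketch of the documented behaviour, not palaestrAI's actual code; the name resolve_bind_host is an assumption.

```python
from typing import Optional, Sequence

ALL_INTERFACES = "0.0.0.0"


def resolve_bind_host(listen: Optional[Sequence[str]]) -> str:
    """Map the --listen values to the single host uvicorn can bind.

    Hypothetical sketch of the documented rule: no address means all
    interfaces; exactly one address is used as-is; several explicit
    addresses fall back to all interfaces, since uvicorn binds one host.
    """
    if not listen:
        return ALL_INTERFACES
    if len(listen) == 1:
        return listen[0]
    return ALL_INTERFACES
```

Falling back to 0.0.0.0 instead of picking one of the requested addresses keeps every requested address reachable, at the cost of also exposing the service on interfaces that were not asked for.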
Example:
palaestrai serve --listen 127.0.0.1 --port 8080
Runtime-config keys
Beyond the CLI options, palaestrai serve reads the usual
runtime configuration. The keys most relevant to the
service are:
store_uri
  SQLAlchemy-style URI of the results store database that every route reads from and writes to. The service ensures the schema exists on startup. Default: sqlite:///palaestrai.db.
log_store_uri
  SQLAlchemy-style URI of the separate SQLite database that holds the log entries served by GET /experiment_run_instances/{uid}/logs (see Logging). It is kept separate from store_uri so logs do not bloat the results store. The executor child writes it and the API parent reads it, so both sides must agree on this value (which they do, as the parent snapshots its runtime config for the child). Default: sqlite:///palaestrai-log.db.
broker_uri, executor_bus_port, public_bind
  Control the in-palaestrAI ZeroMQ message broker that the executor child uses internally. They behave exactly as for palaestrai experiment-start; see Runtime Configuration. Defaults: broker_uri is derived from the other two, executor_bus_port: 4242, public_bind: False.
fork_method
  Multiprocessing start method. The supervisor pins the executor subtree to this method so its synchronization primitives stay consistent. Default: spawn.
Note
The HTTP listen address/port are not runtime-config keys; they are set
only via the --listen/--port CLI options above.
Logging
Under palaestrai serve the separate log store is enabled
automatically: the executor child attaches a
SQLiteLogHandler to the root
logger, writing to log_store_uri. A few properties follow from how the
handler works:
- The log store is optional in general (it is specific to serve) but is switched on by serve without any extra configuration.
- Only records that are associated with an experiment run instance are stored; every record is keyed by its experiment_run_instance_uid. Records not tied to an instance (executor, broker, and pre-run records) are dropped from the log store and only reach stdout.
- DEBUG records are dropped by default (the handler stores INFO and above). Lower the relevant logger levels in the runtime config if you need finer-grained logs.
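To make these filtering rules concrete, here is a simplified stand-in for such a handler, built on the standard logging and sqlite3 modules. It is not palaestrAI's SQLiteLogHandler. The table name logs, its columns (chosen to match the fields the logs route returns), and reading the instance UID from a record attribute named experiment_run_instance_uid (set here via extra=) are all assumptions.

```python
import logging
import sqlite3
from datetime import datetime, timezone


class InstanceLogHandler(logging.Handler):
    """Hypothetical, simplified SQLite log handler (not palaestrAI's own).

    Stores only records tagged with an experiment run instance UID and at
    or above the handler level; other records are left to other handlers.
    """

    def __init__(self, connection: sqlite3.Connection, level: int = logging.INFO):
        # Loggers skip this handler for records below `level` (INFO by default).
        super().__init__(level=level)
        self.connection = connection
        self.connection.execute(
            "CREATE TABLE IF NOT EXISTS logs ("
            " experiment_run_instance_uid TEXT NOT NULL,"
            " created_at TEXT NOT NULL, level TEXT, levelno INTEGER,"
            " logger TEXT, message TEXT)"
        )

    def emit(self, record: logging.LogRecord) -> None:
        uid = getattr(record, "experiment_run_instance_uid", None)
        if uid is None:
            return  # not tied to an instance: dropped from the log store
        try:
            created = datetime.fromtimestamp(record.created, tz=timezone.utc)
            self.connection.execute(
                "INSERT INTO logs VALUES (?, ?, ?, ?, ?, ?)",
                (uid, created.isoformat(), record.levelname, record.levelno,
                 record.name, record.getMessage()),
            )
            self.connection.commit()
        except Exception:
            self.handleError(record)
```

Attaching a handler like this to the root logger, as the service does with its own handler, amounts to logging.getLogger().addHandler(handler).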
REST API
All routes negotiate content: a response is returned as YAML when the client
sends an Accept header asking for YAML (application/x-yaml,
application/yaml, text/yaml, text/x-yaml) and as JSON otherwise.
The PUT routes default to parsing the request body as YAML; because JSON is
a subset of YAML, a JSON body is accepted as well. GET /experiments/{name}
always returns the stored YAML document.
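The negotiation rule can be sketched as a function that inspects the Accept header. This illustrates the documented behaviour under assumptions: the header is split on commas, media-type parameters such as ;q=0.9 are ignored rather than weighed, and the function name is hypothetical.

```python
from typing import Optional

YAML_MEDIA_TYPES = {
    "application/x-yaml",
    "application/yaml",
    "text/yaml",
    "text/x-yaml",
}


def response_format(accept: Optional[str]) -> str:
    """Return "yaml" if the Accept header asks for any YAML media type, else "json"."""
    if not accept:
        return "json"
    for part in accept.split(","):
        # Strip parameters such as ";q=0.5" and compare case-insensitively.
        media_type = part.split(";", 1)[0].strip().lower()
        if media_type in YAML_MEDIA_TYPES:
            return "yaml"
    return "json"
```

From the command line, adding -H "Accept: application/x-yaml" to any of the Quickstart curl requests asks for a YAML response.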
Errors are translated into HTTP status codes by a thin mapping layer: not-found lookups become
404, integrity/uniqueness violations become 409, schema/syntax
validation failures become 422, immutability violations become 405 (with
an explanatory message), and any other unhandled error becomes 500.
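A translation of this kind can be expressed as a lookup from exception type to status code with a 500 fallback. The sketch below uses hypothetical exception classes as stand-ins for whatever palaestrAI raises internally; only the status codes come from the description above.

```python
class NotFoundError(Exception):
    """Hypothetical: a lookup found no matching row."""


class IntegrityViolation(Exception):
    """Hypothetical: an integrity or uniqueness constraint failed."""


class ValidationFailure(Exception):
    """Hypothetical: schema or syntax validation failed."""


class ImmutabilityViolation(Exception):
    """Hypothetical: an attempt to change something immutable."""


STATUS_FOR_ERROR = [
    (NotFoundError, 404),
    (IntegrityViolation, 409),
    (ValidationFailure, 422),
    (ImmutabilityViolation, 405),
]


def status_for(error: Exception) -> int:
    """Map an error to its HTTP status code; anything unrecognised is a 500."""
    for error_type, status in STATUS_FOR_ERROR:
        if isinstance(error, error_type):
            return status
    return 500
```

Checking isinstance over an ordered list, rather than looking up the exact type in a dict, also maps subclasses of each error type to the same status code.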
Experiments
GET /experiments
List all experiments, each including its experiment runs under the
experiment_runs key. Optional query parameter name filters by name
(SQL LIKE syntax). Returns 200 with a JSON (or YAML) list.
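Because the name filter uses SQL LIKE syntax, its wildcards (% for any run of characters, _ for a single character) travel in the query string, and % in particular must be percent-encoded as %25. A minimal sketch with urllib.parse, assuming the Quickstart's local server; the helper name is hypothetical. The same consideration applies to the uid filter on GET /experiment_runs and the logger filter on the logs route.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:4247"  # assumption: local server, default port


def experiments_url(name_pattern: str = "") -> str:
    """Build GET /experiments, optionally filtered by a SQL LIKE pattern on the name."""
    if not name_pattern:
        return f"{BASE_URL}/experiments"
    # urlencode percent-encodes the LIKE wildcard "%" as "%25".
    return f"{BASE_URL}/experiments?{urlencode({'name': name_pattern})}"
```

With curl, the equivalent request is curl "http://localhost:4247/experiments?name=grid%25". Keep in mind that _ is a wildcard too, so a pattern such as run_1 also matches runX1.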
GET /experiments/{name}
Return the stored experiment document (and nothing else) as YAML. 200 on
success, 404 if no such experiment exists.
PUT /experiments
Store a new experiment from its (arsenAI) document body. The experiment name
is taken from the document’s top-level uid field; a missing uid is a
405. Returns 201 with {"name": ..., "id": ...}. A duplicate name
surfaces as 409.
DELETE /experiments/{name}
Cascading delete of an experiment and everything below it (runs, instances,
phases, …). 200 with {"deleted": name}; 404 if unknown.
POST /experiments/{name}
Experiments are immutable, so this always returns 405 with the message
“Experiments are immutable. …”.
Experiment runs
GET /experiment_runs
List all experiment runs; each entry adds an experiment key with the parent
experiment’s name. Optional query parameter uid filters by run UID (SQL
LIKE syntax). Returns 200.
GET /experiment_runs/{uid}
Return full data on an experiment run, travelling down the hierarchy to its
instances and their phases (environments and agents), but not down to
muscle actions, environment/world states, or brain dumps. 200 on success,
404 if unknown.
PUT /experiment_runs
Create a new experiment run in the database from its YAML body (does not run
it). The run UID and the parent experiment association (experiment_uid) are
read from the document; the body is schema-validated. Returns 201 with
{"uid": ..., "experiment_uid": ...}. A schema/syntax failure is 422.
POST /experiment_runs/{uid}
Update an experiment run, but only if it has never been executed (no
instance exists yet). If it has been executed at least once, this returns
405; create a new run instead. 200 with {"uid": ...} on success,
404 if unknown.
DELETE /experiment_runs/{uid}
Cascading delete of the run and all data associated with it. 200 with
{"deleted": uid}; 404 if unknown.
PUT /experiment_runs/{uid}/instances
Schedule a new instance of the experiment run. The request body is ignored.
The run is reconstructed (its instance UID is generated at construction) and
handed to the executor child for scheduling; the instance row itself is created
asynchronously by the store under the same UID. Returns 202 with
{"instance_uid": ..., "experiment_run_uid": uid} immediately, 404 if
the run is unknown.
Experiment run instances
GET /experiment_run_instances/{uid}
Report the status of an instance. The body is
{"uid": ..., "status": ...} and the HTTP status code mirrors the lifecycle
state:
| Status | HTTP code |
|---|---|
| SCHEDULED | 202 |
| RUNNING | 200 |
| FINISHED | 200 |
| ERROR | 500 |
| UNKNOWN | 404 |
UNKNOWN is reported (with 404) when no instance row exists for the UID.
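Scheduling returns 202 before the store has necessarily written the instance row, so a client that polls right away may briefly see 404/UNKNOWN for a UID it was just given. This is an inference from the asynchronous row creation described under PUT /experiment_runs/{uid}/instances. The polling sketch below therefore tolerates 404 for a short grace period and stops at FINISHED or ERROR. The fetch callable (returning the HTTP code and the decoded body) and all parameter names are assumptions; injecting fetch, sleep, and clock keeps the loop independent of any HTTP library.

```python
import time
from typing import Callable, Dict, Tuple

# Hypothetical: fetch(uid) -> (http_code, {"uid": ..., "status": ...})
Fetch = Callable[[str], Tuple[int, Dict[str, str]]]
TERMINAL = {"FINISHED", "ERROR"}


def wait_for_instance(
    fetch: Fetch,
    instance_uid: str,
    interval: float = 2.0,
    timeout: float = 3600.0,
    unknown_grace: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll an instance until it is FINISHED or ERROR and return that status.

    A 404 is tolerated for `unknown_grace` seconds after the first poll;
    after that it raises LookupError. Raises TimeoutError after `timeout`.
    """
    start = clock()
    while True:
        code, body = fetch(instance_uid)
        status = body.get("status", "UNKNOWN")
        elapsed = clock() - start
        if status in TERMINAL:
            return status
        if code == 404 and elapsed > unknown_grace:
            raise LookupError(f"instance {instance_uid} is unknown")
        if elapsed > timeout:
            raise TimeoutError(f"instance {instance_uid} still {status}")
        sleep(interval)
```

A real fetch built on urllib.request.urlopen would need to catch urllib.error.HTTPError, because urlopen raises it for the 404 and 500 replies; the exception's code attribute and its readable body carry the same information.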
GET /experiment_run_instances/{uid}/logs
Return the persisted log entries for an instance, read from the separate log
store (log_store_uri). Existence of the instance is checked against the
results store: an unknown instance is a 404 (body
{"uid": ..., "status": "UNKNOWN"}); a known instance with no logs yields a
200 with an empty list. The response body is:
{"instance_uid": ..., "count": <n>, "logs": [ ... ]}
where each log entry is an object with the keys created_at, level,
levelno, logger, and message.
The following query parameters filter and page the result:
level
  Minimum log level by name (e.g. INFO, WARNING). Entries with a numeric level at or above this level are returned. An unknown level name is a 422.
since
  ISO-8601 timestamp; only entries at or after this time are returned. A value that is not a valid ISO-8601 timestamp is a 422.
  Warning
  URL-encode the timestamp. A + in a raw query string decodes to a space, so an offset like +00:00 must be encoded (%2B00:00); otherwise the value fails validation and the request is rejected with 422.
logger
  Filter by logger name (SQL LIKE syntax).
limit
  Maximum number of entries to return. Default 1000; clamped to a maximum of 10000.
offset
  Number of entries to skip (for paging). Default 0.
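Building the query string with urllib.parse.urlencode avoids the + pitfall in the warning above: it percent-encodes + as %2B (and : as %3A, which decodes back to :). A sketch assuming the Quickstart's local server; the helper name is hypothetical, and its keyword arguments simply mirror the query parameters.

```python
from datetime import datetime
from typing import Optional
from urllib.parse import quote, urlencode

BASE_URL = "http://localhost:4247"  # assumption: local server, default port


def logs_url(
    instance_uid: str,
    level: Optional[str] = None,
    since: Optional[datetime] = None,
    logger: Optional[str] = None,
    limit: Optional[int] = None,
    offset: Optional[int] = None,
) -> str:
    """Build GET /experiment_run_instances/{uid}/logs with optional filters."""
    params = {
        "level": level,
        "since": since.isoformat() if since is not None else None,
        "logger": logger,
        "limit": limit,
        "offset": offset,
    }
    # Drop unset filters so the server applies its own defaults.
    query = urlencode({k: v for k, v in params.items() if v is not None})
    url = f"{BASE_URL}/experiment_run_instances/{quote(instance_uid, safe='')}/logs"
    return f"{url}?{query}" if query else url
```

Passing a timezone-aware datetime makes the offset explicit in the encoded value; the server then only has to parse a valid ISO-8601 string.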
Entries are ordered by created_at ascending, then by insertion order.
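Because limit is clamped to 10000, fetching every entry of a long run means paging with offset. With a stable ordering, a generator can advance offset by the number of entries received until a short page signals the end. The fetch_page callable (returning the decoded response body for a given offset and limit) is a hypothetical stand-in for the actual HTTP call.

```python
from typing import Any, Callable, Dict, Iterator

# Hypothetical: fetch_page(offset, limit) -> {"instance_uid": ..., "count": n, "logs": [...]}
FetchPage = Callable[[int, int], Dict[str, Any]]
MAX_LIMIT = 10000  # the server clamps limit to this value


def iter_logs(fetch_page: FetchPage, page_size: int = 1000) -> Iterator[Dict[str, Any]]:
    """Yield every log entry of an instance, one page at a time."""
    # A larger page size would be clamped by the server and end paging early.
    page_size = min(page_size, MAX_LIMIT)
    offset = 0
    while True:
        entries = fetch_page(offset, page_size)["logs"]
        yield from entries
        if len(entries) < page_size:
            return  # a short (or empty) page means there is nothing left
        offset += len(entries)
```

Counting the entries actually received, rather than relying on the count field, keeps the loop correct whatever that field reports.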
Process Model
A single palaestrai serve invocation runs two processes:
- API process (parent). Runs uvicorn together with a FastAPI application. It owns the HTTP socket(s) and the user-facing signal handlers. All read routes (GET) and all create, update, and delete routes (PUT/POST/DELETE), except scheduling a new instance, talk to the results store database directly: these are pure store operations that do not need a running executor.
- Executor process (child). Runs Executor(is_service=True).execute() in its own event loop. In service mode the executor does not shut down when no run is scheduled; it idles instead, waiting for work. It owns the experiment-run-instance status lifecycle (SCHEDULED→RUNNING→FINISHED/ERROR), which it writes authoritatively to the database.
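The lifecycle owned by the executor can be written down as a table of allowed transitions. The sketch below is an illustration, not palaestrAI's implementation: the enum and function names are hypothetical, and treating FINISHED and ERROR as final is an assumption read from the arrows above.

```python
from enum import Enum


class InstanceStatus(Enum):
    SCHEDULED = "SCHEDULED"
    RUNNING = "RUNNING"
    FINISHED = "FINISHED"
    ERROR = "ERROR"


# Transitions read literally from SCHEDULED -> RUNNING -> FINISHED/ERROR;
# FINISHED and ERROR have no successors (assumed final).
ALLOWED = {
    InstanceStatus.SCHEDULED: {InstanceStatus.RUNNING},
    InstanceStatus.RUNNING: {InstanceStatus.FINISHED, InstanceStatus.ERROR},
    InstanceStatus.FINISHED: set(),
    InstanceStatus.ERROR: set(),
}


def advance(current: InstanceStatus, new: InstanceStatus) -> InstanceStatus:
    """Return `new` if the transition is allowed, else raise ValueError."""
    if new not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.value} -> {new.value}")
    return new
```

Keeping the transitions in one table makes it easy to check that a single writer (here, the executor) never moves an instance backwards.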
The parent supervises the child over a small command channel (a
multiprocessing pipe). Scheduling a new instance
(PUT /experiment_runs/{uid}/instances) sends a SCHEDULE command to the
child; everything else is served from the database. If the executor child dies
unexpectedly, the parent marks any still-RUNNING instance as ERROR
(crash safety), cleans up orphaned processes, and respawns the child.
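The supervision logic can be sketched apart from actual process management. In the illustration below, a dictionary stands in for the instance rows in the results store, a spawn_child callable stands in for starting a fresh executor child, and a multiprocessing pipe carries the commands. All names and the ("SCHEDULE", uid) message shape are hypothetical, and orphan cleanup is reduced to a comment because the description above gives no detail on it.

```python
from multiprocessing import Pipe
from multiprocessing.connection import Connection
from typing import Callable, Dict, Tuple


def make_command_channel() -> Tuple[Connection, Connection]:
    """Create the (parent end, child end) of a duplex command pipe."""
    return Pipe()


class Supervisor:
    """Hypothetical sketch of the parent's side of the supervision.

    `statuses` maps instance UID -> status (stand-in for the store);
    `spawn_child` starts a new executor child and returns the parent's
    end of its command pipe.
    """

    def __init__(self, statuses: Dict[str, str], spawn_child: Callable[[], Connection]):
        self.statuses = statuses
        self.spawn_child = spawn_child
        self.commands = spawn_child()
        self.respawns = 0

    def schedule(self, instance_uid: str) -> None:
        # Only scheduling goes through the child; other routes use the store.
        self.commands.send(("SCHEDULE", instance_uid))

    def on_child_died(self) -> None:
        # Crash safety: no live executor can finish a RUNNING instance now.
        for uid, status in list(self.statuses.items()):
            if status == "RUNNING":
                self.statuses[uid] = "ERROR"
        # Cleanup of orphaned processes would happen here.
        self.commands.close()
        self.commands = self.spawn_child()
        self.respawns += 1
```

In a real supervisor, spawn_child would create the channel with make_command_channel(), start a multiprocessing process for the executor with the child end, and return the parent end.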