Quickstart Guide

What better way to get into learning agents than letting them play Tic-Tac-Toe? After all, it’s a pop-culture classic!

Setting up a Game of Tic-Tac-Toe

In palaestrAI, everything is controlled through an experiment run file, so we first need to set one up. Experiment run files define the agents, the environment, when episodes terminate, and so on. You can read more about this in the documentation; for now, let’s just accept the following YAML code as-is:

[1]:

ttt_run = """
uid: A Training Match of Tic-Tac-Toe
experiment_uid: Tic-Tac-Toe
seed: 234247
version: "3.5"
schedule:
  - Training:
      environments:
        - environment:
            name: palaestrai_environments.tictactoe:TicTacToeEnvironment
            uid: tttenv
            params:
              twoplayer: true
      agents:
        - &player
          name: Player 1
          brain:
            name: harl:PPOBrain
            params:
              fc_dims: [2, 1]
          muscle:
            name: harl:PPOMuscle
            params: {}
          objective:
            name: palaestrai.agent.dummy_objective:DummyObjective
            params: {}
          sensors:
            - tttenv.Tile 1-1
            - tttenv.Tile 1-2
            - tttenv.Tile 1-3
            - tttenv.Tile 2-1
            - tttenv.Tile 2-2
            - tttenv.Tile 2-3
            - tttenv.Tile 3-1
            - tttenv.Tile 3-2
            - tttenv.Tile 3-3
          actuators:
            - tttenv.Field selector
        - <<: *player
          name: Player 2
      simulation:
        name: palaestrai.simulation:TakingTurns
        conditions:
          - name: palaestrai.experiment:AgentObjectiveTerminationCondition
            params:
              "Player 1":
                brain_avg1: 10.0
              "Player 2":
                brain_avg1: 10.0
      phase_config:
        mode: train
        worker: 3
run_config:  # Not a runTIME config
  condition:
    name: palaestrai.experiment:AgentObjectiveTerminationCondition
    params:
      "Player 1":
        phase_avg10: 5.0
      "Player 2":
        phase_avg10: 5.0
"""

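The run file defines Player 2 through a YAML anchor (`&player`) and a merge key (`<<: *player`): Player 2 takes over every key of Player 1 and overrides only `name`. As a rough analogy in plain Python, the merge behaves like a shallow dictionary copy with overrides. The dictionaries below are trimmed, illustrative stand-ins, not the structure palaestrAI builds internally.

```python
# Illustrative stand-in for the mapping anchored as `&player` (trimmed).
player_1 = {
    "name": "Player 1",
    "brain": {"name": "harl:PPOBrain", "params": {"fc_dims": [2, 1]}},
    "actuators": ["tttenv.Field selector"],
}

# `<<: *player` followed by `name: Player 2` acts like a shallow merge:
# all keys come from the anchored mapping, explicitly given keys win.
player_2 = {**player_1, "name": "Player 2"}

print(player_2["name"])           # Player 2
print(player_2["brain"]["name"])  # harl:PPOBrain
```

Because the merge is shallow, overriding a nested key such as `brain` would replace that whole sub-mapping rather than merge into it.
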
Creating a Results Storage Database

To see the results later on, we need to tell palaestrAI where it can store all results data. A convenient option is a local SQLite database, which we will enable and create here.

palaestrAI will probably tell you that using SQLite is not ideal in terms of performance. For local experiments, this doesn’t concern us, though.

[2]:

import palaestrai
rconf = palaestrai.core.RuntimeConfig()
rconf.reset()
rconf.load({"store_uri": "sqlite:///palaestrai.db"})
palaestrai.store.setup_database()

Importing from 'midas.tools.palaestrai' is deprecated! Use 'midas_palaestrai' instead!
Importing from 'midas.tools.palaestrai' is deprecated! Use 'midas_palaestrai' instead!
Could not create extension timescaledb and create hypertables: (sqlite3.OperationalError) near "EXTENSION": syntax error
[SQL: CREATE EXTENSION IF NOT EXISTS timescaledb CASCADE;]
(Background on this error at: https://sqlalche.me/e/14/e3q8). Your database setup might lead to noticeable slowdowns with larger experiment runs. Please upgrade to PostgreSQL with TimescaleDB for the best performance.
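
To confirm that the database file was created, the standard library’s sqlite3 module can list its tables. This is only a minimal sketch: `list_tables` is a hypothetical helper, not part of palaestrAI; the file name is assumed to match the `store_uri` above; and which tables exist depends on the palaestrAI version.

```python
import os
import sqlite3

def list_tables(db_path):
    """Return the names of all tables in an SQLite database file."""
    # Hypothetical helper for illustration, not part of palaestrAI.
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]

# Assumes the file name from the store_uri "sqlite:///palaestrai.db".
# The existence check avoids creating an empty file by accident.
if os.path.exists("palaestrai.db"):
    print(list_tables("palaestrai.db"))
```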

Learning the Game

Now we can execute the training run defined above. The palaestrai.execute() command accepts a list of strings as well as an io object. Strings are assumed to be paths to YAML files, whereas io objects are assumed to provide access to the contents directly. Therefore, we use io.StringIO to pass in the YAML document defined above.
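
The distinction between paths and io objects can be illustrated with a small standalone sketch. `describe_run_source` is a hypothetical helper written for this example; palaestrai.execute() does its own dispatching internally, and its actual logic may differ.

```python
import io

def describe_run_source(source):
    """Classify a run source the way the text describes it."""
    # Hypothetical helper for illustration only.
    if isinstance(source, str):
        return "path"    # strings are treated as paths to YAML files
    if isinstance(source, io.IOBase):
        return "stream"  # io objects hand over the contents directly
    raise TypeError(f"unsupported run source: {type(source).__name__}")

stream = io.StringIO("uid: example run\n")
print(describe_run_source("my_run.yml"))  # path
print(describe_run_source(stream))        # stream
print(stream.read())                      # the YAML text held in memory
```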

palaestrai.execute() will now commence the training. It might take a while to run; after all, we want to train until one of the agents achieves a consistently high average reward.

[3]:

import io
palaestrai.execute(io.StringIO(ttt_run))

Could not dump to <palaestrai.agent.store_brain_dumper.StoreBrainDumper object at 0x7fd0a19a9610>: (sqlite3.OperationalError) database is locked
[SQL: INSERT INTO brain_states (walltime, state, tag, simtime_ticks, simtime_timestamp, agent_id) VALUES (CURRENT_TIMESTAMP, ?, ?, ?, ?, ?)]
[parameters: (<memory at 0x7fd09ebb4dc0>, 'ppo_actor', None, None, 2)]
(Background on this error at: https://sqlalche.me/e/14/e3q8)
Could not dump to <palaestrai.agent.store_brain_dumper.StoreBrainDumper object at 0x7f7ff7e79670>: (sqlite3.OperationalError) database is locked
[SQL: INSERT INTO brain_states (walltime, state, tag, simtime_ticks, simtime_timestamp, agent_id) VALUES (CURRENT_TIMESTAMP, ?, ?, ?, ?, ?)]
[parameters: (<memory at 0x7f7ff5088e80>, 'ppo_actor', None, None, 1)]
(Background on this error at: https://sqlalche.me/e/14/e3q8)
Could not dump to <palaestrai.agent.store_brain_dumper.StoreBrainDumper object at 0x7fd0a19a9610>: (sqlite3.OperationalError) database is locked
[SQL: INSERT INTO brain_states (walltime, state, tag, simtime_ticks, simtime_timestamp, agent_id) VALUES (CURRENT_TIMESTAMP, ?, ?, ?, ?, ?)]
[parameters: (<memory at 0x7fd09ebb56c0>, 'ppo_critic', None, None, 2)]
(Background on this error at: https://sqlalche.me/e/14/e3q8)
Could not dump to <palaestrai.agent.store_brain_dumper.StoreBrainDumper object at 0x7f7ff7e79670>: (sqlite3.OperationalError) database is locked
[SQL: INSERT INTO brain_states (walltime, state, tag, simtime_ticks, simtime_timestamp, agent_id) VALUES (CURRENT_TIMESTAMP, ?, ?, ?, ?, ?)]
[parameters: (<memory at 0x7f7ff5089240>, 'ppo_critic', None, None, 1)]
(Background on this error at: https://sqlalche.me/e/14/e3q8)
Could not dump to <palaestrai.agent.store_brain_dumper.StoreBrainDumper object at 0x7fd0a19a9610>: (sqlite3.OperationalError) database is locked
[SQL: INSERT INTO brain_states (walltime, state, tag, simtime_ticks, simtime_timestamp, agent_id) VALUES (CURRENT_TIMESTAMP, ?, ?, ?, ?, ?)]
[parameters: (<memory at 0x7fd09ebb59c0>, 'ppo_actor', None, None, 2)]
(Background on this error at: https://sqlalche.me/e/14/e3q8)
Could not dump to <palaestrai.agent.store_brain_dumper.StoreBrainDumper object at 0x7f7ff7e79670>: (sqlite3.OperationalError) database is locked
[SQL: INSERT INTO brain_states (walltime, state, tag, simtime_ticks, simtime_timestamp, agent_id) VALUES (CURRENT_TIMESTAMP, ?, ?, ?, ?, ?)]
[parameters: (<memory at 0x7f7ff5089240>, 'ppo_actor', None, None, 1)]
(Background on this error at: https://sqlalche.me/e/14/e3q8)
Could not dump to <palaestrai.agent.store_brain_dumper.StoreBrainDumper object at 0x7fd0a19a9610>: (sqlite3.OperationalError) database is locked
[SQL: INSERT INTO brain_states (walltime, state, tag, simtime_ticks, simtime_timestamp, agent_id) VALUES (CURRENT_TIMESTAMP, ?, ?, ?, ?, ?)]
[parameters: (<memory at 0x7fd09ebb5a80>, 'ppo_critic', None, None, 2)]
(Background on this error at: https://sqlalche.me/e/14/e3q8)
Could not dump to <palaestrai.agent.store_brain_dumper.StoreBrainDumper object at 0x7f7ff7e79670>: (sqlite3.OperationalError) database is locked
[SQL: INSERT INTO brain_states (walltime, state, tag, simtime_ticks, simtime_timestamp, agent_id) VALUES (CURRENT_TIMESTAMP, ?, ?, ?, ?, ?)]
[parameters: (<memory at 0x7f7ff5089a80>, 'ppo_critic', None, None, 1)]
(Background on this error at: https://sqlalche.me/e/14/e3q8)
Could not dump to <palaestrai.agent.store_brain_dumper.StoreBrainDumper object at 0x7fd0a19a9610>: (sqlite3.OperationalError) database is locked
[SQL: INSERT INTO brain_states (walltime, state, tag, simtime_ticks, simtime_timestamp, agent_id) VALUES (CURRENT_TIMESTAMP, ?, ?, ?, ?, ?)]
[parameters: (<memory at 0x7fd09ebb4dc0>, 'ppo_actor', None, None, 2)]
(Background on this error at: https://sqlalche.me/e/14/e3q8)
Could not dump to <palaestrai.agent.store_brain_dumper.StoreBrainDumper object at 0x7f7ff7e79670>: (sqlite3.OperationalError) database is locked
[SQL: INSERT INTO brain_states (walltime, state, tag, simtime_ticks, simtime_timestamp, agent_id) VALUES (CURRENT_TIMESTAMP, ?, ?, ?, ?, ?)]
[parameters: (<memory at 0x7f7ff5089cc0>, 'ppo_actor', None, None, 1)]
(Background on this error at: https://sqlalche.me/e/14/e3q8)
Could not dump to <palaestrai.agent.store_brain_dumper.StoreBrainDumper object at 0x7fd0a19a9610>: (sqlite3.OperationalError) database is locked
[SQL: INSERT INTO brain_states (walltime, state, tag, simtime_ticks, simtime_timestamp, agent_id) VALUES (CURRENT_TIMESTAMP, ?, ?, ?, ?, ?)]
[parameters: (<memory at 0x7fd09ebb5d80>, 'ppo_critic', None, None, 2)]
(Background on this error at: https://sqlalche.me/e/14/e3q8)
Could not dump to <palaestrai.agent.store_brain_dumper.StoreBrainDumper object at 0x7f7ff7e79670>: (sqlite3.OperationalError) database is locked
[SQL: INSERT INTO brain_states (walltime, state, tag, simtime_ticks, simtime_timestamp, agent_id) VALUES (CURRENT_TIMESTAMP, ?, ?, ?, ?, ?)]
[parameters: (<memory at 0x7f7ff5089240>, 'ppo_critic', None, None, 1)]
(Background on this error at: https://sqlalche.me/e/14/e3q8)

[3]:

(['A Training Match of Tic-Tac-Toe'], <ExecutorState.EXITED: 4>)

Accessing Results

palaestrAI offers a small convenience interface to access the data most people will want to see. The functions are part of the palaestrai.store.query package:

[4]:

import palaestrai.store.query as palq

All these functions return pandas or dask dataframes, which also makes them useful in Jupyter notebooks. Let’s first see what experiments our database has logged so far (there should be only one):

[5]:

exps = palq.experiments_and_runs_configurations()
exps

[5]:

	experiment_id	experiment_name	experiment_document	experiment_run_id	experiment_run_uid	experiment_run_document	experiment_run_instance_id	experiment_run_instance_uid	experiment_run_phase_id	experiment_run_phase_uid	experiment_run_phase_mode
0	1	Tic-Tac-Toe	None	1	A Training Match of Tic-Tac-Toe	{'uid': 'A Training Match of Tic-Tac-Toe', 'ex...	1	d64ed154-ce8f-46fa-8563-a226f4d2ff94	1	Training	train

Our two agents have competed for a number of episodes. Let’s see their cumulative reward. For this, we have a query function called muscles_cumulative_objective(). Two things are of note here.

First, palaestrAI names its agents “Muscles.” This naming is in contrast to the learning part, which is named “Brain.” You’re probably asking yourself now, “Why the funny names?” The reason lies in palaestrAI’s architecture, which tries to hide as many of the technical details of the actual execution as possible; so an agent’s “Brain” and its “Muscles” are metaphors for the pure algorithm implementations.

Second, you might notice the function parameter like_dataframe. Almost every query function has it. It allows you to pass a dataframe for filtering; palaestrAI then constructs the underlying SQL query according to the dataframe’s columns. Here we demonstrate one convenient application. In the previous cell, we retrieved the list of all experiments, runs, and phases. Now we use pandas’ filtering to reduce the rows to the experiment runs we’re interested in. This reduced dataframe can then be passed to our next query function, so that we retrieve only the cumulative values for the phases we really want to analyze.

[6]:

cumobj = palq.muscles_cumulative_objective(like_dataframe=exps.iloc[[-1]])
cumobj[
    ["agent_name", "muscle_actions_episode", "muscle_cumulative_objective"]
].set_index(
    "muscle_actions_episode"
).pivot_table(
    columns=["agent_name"],
    values=["muscle_cumulative_objective"],
    index=["muscle_actions_episode"]
).plot()

[6]:

<Axes: xlabel='muscle_actions_episode'>
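
The idea behind like_dataframe, turning the columns of a filter dataframe into conditions of an SQL query, can be sketched conceptually with the standard library. This is not palaestrAI’s actual query builder: `build_where`, the `phases` table, and its columns are invented for this illustration, and a plain dictionary stands in for the dataframe.

```python
import sqlite3

def build_where(filters):
    """Build a WHERE clause and its parameters from {column: [values]}."""
    # Conceptual sketch only. Column names are interpolated into the SQL
    # text, so they must come from trusted code, never from user input.
    clauses, params = [], []
    for column, values in filters.items():
        placeholders = ", ".join("?" for _ in values)
        clauses.append(f"{column} IN ({placeholders})")
        params.extend(values)
    return " AND ".join(clauses), params

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE phases (run_uid TEXT, mode TEXT)")
conn.executemany(
    "INSERT INTO phases VALUES (?, ?)",
    [
        ("A Training Match of Tic-Tac-Toe", "train"),
        ("Game of Tic-Tac-Toe", "test"),
    ],
)

# Stand-in for a one-row filter dataframe: column name -> values.
where, params = build_where({"mode": ["test"]})
rows = conn.execute(
    f"SELECT run_uid FROM phases WHERE {where}", params
).fetchall()
print(rows)  # [('Game of Tic-Tac-Toe',)]
```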

The Real Match

Training time is over; let’s have a real competition! We will now schedule a separate run, this time for testing rather than training. Our next experiment run definition instructs palaestrAI to load the previously trained agents. To make things interesting, we let the better player compete against itself:

[7]:

best_player = cumobj[
    ["agent_name", "muscle_cumulative_objective"]
].groupby(
    by=["agent_name"]
).sum().sort_values(
    by=["muscle_cumulative_objective"]
).index[-1]
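
The pandas chain above sums each agent’s cumulative objective over all episodes and picks the agent with the largest total. For readers less familiar with groupby, the same logic in plain Python looks roughly as follows. The records are made-up numbers for illustration, not results from the run.

```python
from collections import defaultdict

# Made-up (agent_name, cumulative objective of one episode) records.
records = [
    ("Player 1", 4.0),
    ("Player 2", 7.0),
    ("Player 1", 6.0),
    ("Player 2", 1.0),
]

totals = defaultdict(float)
for agent_name, objective in records:
    totals[agent_name] += objective  # corresponds to groupby(...).sum()

# sort_values(...) sorts ascending, so index[-1] is the largest total.
best_player = max(totals, key=totals.get)
print(best_player)  # Player 1
```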

[8]:

ttt_test = """
uid: Game of Tic-Tac-Toe
experiment_uid: Tic-Tac-Toe
seed: 234247
version: "3.5"
schedule:
  - Test:
      environments:
        - environment:
            name: palaestrai_environments.tictactoe:TicTacToeEnvironment
            uid: tttenv
            params:
              twoplayer: true
              invalid_turn_limit: -1
      agents:
        - &player
          name: Player 1
          brain:
            name: harl:PPOBrain
            params:
              fc_dims: [2, 1]
          muscle:
            name: harl:PPOMuscle
            params: {}
          objective:
            name: palaestrai.agent.dummy_objective:DummyObjective
            params: {}
          load:
            experiment_run: A Training Match of Tic-Tac-Toe
            agent: %(best_player)s
            phase: 0
          sensors:
            - tttenv.Tile 1-1
            - tttenv.Tile 1-2
            - tttenv.Tile 1-3
            - tttenv.Tile 2-1
            - tttenv.Tile 2-2
            - tttenv.Tile 2-3
            - tttenv.Tile 3-1
            - tttenv.Tile 3-2
            - tttenv.Tile 3-3
          actuators:
            - tttenv.Field selector
        - <<: *player
          name: Player 2
          load:
            experiment_run: A Training Match of Tic-Tac-Toe
            agent: %(best_player)s
            phase: 0
      simulation:
        name: palaestrai.simulation:TakingTurns
        conditions:
          - name: palaestrai.experiment:EnvironmentTerminationCondition
            params: {}
      phase_config:
        mode: test
        worker: 1
        episodes: 1
run_config:  # Not a runTIME config
  condition:
    name: palaestrai.experiment:VanillaRunGovernorTerminationCondition
    params: {}

""" % {"best_player": best_player}

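The run definition above is filled in with Python’s printf-style string formatting: `%(best_player)s` is replaced by the value stored under the key `best_player`. A small standalone sketch of the mechanism, using a shortened stand-in template and a fixed agent name:

```python
# Shortened stand-in for the run definition; only the placeholder matters.
template = """
load:
  agent: %(best_player)s
  phase: 0
"""

filled = template % {"best_player": "Player 1"}
print(filled)

# A literal percent sign in such a template must be written as "%%".
print("win rate: 50%% for %(name)s" % {"name": "Player 1"})
```

The same placeholder may appear several times in one template, which is how both players end up loading the same trained agent.
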
[9]:

palaestrai.execute(io.StringIO(ttt_test))

[9]:

(['Game of Tic-Tac-Toe'], <ExecutorState.EXITED: 4>)

After we’re done, let’s examine the list of experiment runs again. We can now filter for test runs:

[10]:

exps = palq.experiments_and_runs_configurations()
exps[(exps.experiment_name == "Tic-Tac-Toe") & (exps.experiment_run_phase_mode == "test")]

[10]:

	experiment_id	experiment_name	experiment_document	experiment_run_id	experiment_run_uid	experiment_run_document	experiment_run_instance_id	experiment_run_instance_uid	experiment_run_phase_id	experiment_run_phase_uid	experiment_run_phase_mode
1	1	Tic-Tac-Toe	None	2	Game of Tic-Tac-Toe	{'uid': 'Game of Tic-Tac-Toe', 'experiment_uid...	2	d11b4a34-6507-4e4f-94fb-dcca264a7976	2	Test	test

This time, we’re not after cumulative scores, as we’ve instructed palaestrAI to play one match only. Let’s get the individual moves of each player. We’re using the muscle_actions() query function now. As with any other query function, we can pass it a dataframe with the values we want to query (via the like_dataframe parameter). In this case, we’re interested only in the actions of the test run, so let’s use pandas’ filtering capabilities:

[11]:

ma = palq.muscle_actions(
    like_dataframe=exps[(exps.experiment_name == "Tic-Tac-Toe") & (exps.experiment_run_phase_mode == "test")].iloc[[-1]])
ma

[11]:

	muscle_action_walltime	muscle_action_simtimes	rollout_worker_uid	muscle_sensor_readings	muscle_actuator_setpoints	muscle_action_rewards	muscle_action_objective	muscle_action_done	agent_id	agent_uid	agent_name	experiment_run_phase_id	experiment_run_phase_uid	experiment_run_phase_configuration	experiment_run_instance_uid	experiment_run_id	experiment_run_uid	experiment_id	experiment_name
muscle_action_id
1052	2025-04-26 21:39:51.236292	{'tttenv': {'simtime_ticks': 0, 'simtime_times...	AgentConductor-d4d27e.Player 1-fbba4f	[SensorInformation(value=0, space=Discrete(3),...	[ActuatorInformation(value=2, space=Discrete(9...	[RewardInformation(value=[1.], space=Box(low=[...	1.0	False	3	AgentConductor-d4d27e	Player 1	2	Test	{'mode': 'test', 'worker': 1, 'episodes': 1}	d11b4a34-6507-4e4f-94fb-dcca264a7976	2	Game of Tic-Tac-Toe	1	Tic-Tac-Toe
1053	2025-04-26 21:39:51.313104	{'tttenv': {'simtime_ticks': 1, 'simtime_times...	AgentConductor-251a3b.Player 2-a9c664	[SensorInformation(value=0, space=Discrete(3),...	[ActuatorInformation(value=0, space=Discrete(9...	[RewardInformation(value=[1.], space=Box(low=[...	1.0	False	4	AgentConductor-251a3b	Player 2	2	Test	{'mode': 'test', 'worker': 1, 'episodes': 1}	d11b4a34-6507-4e4f-94fb-dcca264a7976	2	Game of Tic-Tac-Toe	1	Tic-Tac-Toe
1054	2025-04-26 21:39:51.377685	{'tttenv': {'simtime_ticks': 2, 'simtime_times...	AgentConductor-d4d27e.Player 1-fbba4f	[SensorInformation(value=1, space=Discrete(3),...	[ActuatorInformation(value=6, space=Discrete(9...	[RewardInformation(value=[1.], space=Box(low=[...	1.0	False	3	AgentConductor-d4d27e	Player 1	2	Test	{'mode': 'test', 'worker': 1, 'episodes': 1}	d11b4a34-6507-4e4f-94fb-dcca264a7976	2	Game of Tic-Tac-Toe	1	Tic-Tac-Toe
1055	2025-04-26 21:39:51.431586	{'tttenv': {'simtime_ticks': 3, 'simtime_times...	AgentConductor-251a3b.Player 2-a9c664	[SensorInformation(value=1, space=Discrete(3),...	[ActuatorInformation(value=1, space=Discrete(9...	[RewardInformation(value=[1.], space=Box(low=[...	1.0	False	4	AgentConductor-251a3b	Player 2	2	Test	{'mode': 'test', 'worker': 1, 'episodes': 1}	d11b4a34-6507-4e4f-94fb-dcca264a7976	2	Game of Tic-Tac-Toe	1	Tic-Tac-Toe
1056	2025-04-26 21:39:51.478556	{'tttenv': {'simtime_ticks': 4, 'simtime_times...	AgentConductor-d4d27e.Player 1-fbba4f	[SensorInformation(value=1, space=Discrete(3),...	[ActuatorInformation(value=8, space=Discrete(9...	[RewardInformation(value=[1.], space=Box(low=[...	1.0	False	3	AgentConductor-d4d27e	Player 1	2	Test	{'mode': 'test', 'worker': 1, 'episodes': 1}	d11b4a34-6507-4e4f-94fb-dcca264a7976	2	Game of Tic-Tac-Toe	1	Tic-Tac-Toe
1057	2025-04-26 21:39:51.572816	{'tttenv': {'simtime_ticks': 5, 'simtime_times...	AgentConductor-251a3b.Player 2-a9c664	[SensorInformation(value=1, space=Discrete(3),...	[ActuatorInformation(value=7, space=Discrete(9...	[RewardInformation(value=[1.], space=Box(low=[...	1.0	False	4	AgentConductor-251a3b	Player 2	2	Test	{'mode': 'test', 'worker': 1, 'episodes': 1}	d11b4a34-6507-4e4f-94fb-dcca264a7976	2	Game of Tic-Tac-Toe	1	Tic-Tac-Toe
1058	2025-04-26 21:39:51.635500	{'tttenv': {'simtime_ticks': 6, 'simtime_times...	AgentConductor-d4d27e.Player 1-fbba4f	[SensorInformation(value=1, space=Discrete(3),...	[ActuatorInformation(value=6, space=Discrete(9...	[RewardInformation(value=[-100.], space=Box(lo...	-100.0	False	3	AgentConductor-d4d27e	Player 1	2	Test	{'mode': 'test', 'worker': 1, 'episodes': 1}	d11b4a34-6507-4e4f-94fb-dcca264a7976	2	Game of Tic-Tac-Toe	1	Tic-Tac-Toe
1059	2025-04-26 21:39:51.682877	{'tttenv': {'simtime_ticks': 7, 'simtime_times...	AgentConductor-251a3b.Player 2-a9c664	[SensorInformation(value=1, space=Discrete(3),...	[ActuatorInformation(value=0, space=Discrete(9...	[RewardInformation(value=[-100.], space=Box(lo...	-100.0	False	4	AgentConductor-251a3b	Player 2	2	Test	{'mode': 'test', 'worker': 1, 'episodes': 1}	d11b4a34-6507-4e4f-94fb-dcca264a7976	2	Game of Tic-Tac-Toe	1	Tic-Tac-Toe
1060	2025-04-26 21:39:51.726250	{'tttenv': {'simtime_ticks': 8, 'simtime_times...	AgentConductor-d4d27e.Player 1-fbba4f	[SensorInformation(value=1, space=Discrete(3),...	[ActuatorInformation(value=1, space=Discrete(9...	[RewardInformation(value=[-100.], space=Box(lo...	-100.0	False	3	AgentConductor-d4d27e	Player 1	2	Test	{'mode': 'test', 'worker': 1, 'episodes': 1}	d11b4a34-6507-4e4f-94fb-dcca264a7976	2	Game of Tic-Tac-Toe	1	Tic-Tac-Toe
1061	2025-04-26 21:39:51.763554	{'tttenv': {'simtime_ticks': 9, 'simtime_times...	AgentConductor-251a3b.Player 2-a9c664	[SensorInformation(value=1, space=Discrete(3),...	[ActuatorInformation(value=5, space=Discrete(9...	[RewardInformation(value=[1.], space=Box(low=[...	1.0	False	4	AgentConductor-251a3b	Player 2	2	Test	{'mode': 'test', 'worker': 1, 'episodes': 1}	d11b4a34-6507-4e4f-94fb-dcca264a7976	2	Game of Tic-Tac-Toe	1	Tic-Tac-Toe
1062	2025-04-26 21:39:51.803152	{'tttenv': {'simtime_ticks': 10, 'simtime_time...	AgentConductor-d4d27e.Player 1-fbba4f	[SensorInformation(value=1, space=Discrete(3),...	[ActuatorInformation(value=2, space=Discrete(9...	[RewardInformation(value=[-100.], space=Box(lo...	-100.0	False	3	AgentConductor-d4d27e	Player 1	2	Test	{'mode': 'test', 'worker': 1, 'episodes': 1}	d11b4a34-6507-4e4f-94fb-dcca264a7976	2	Game of Tic-Tac-Toe	1	Tic-Tac-Toe
1063	2025-04-26 21:39:51.843948	{'tttenv': {'simtime_ticks': 11, 'simtime_time...	AgentConductor-251a3b.Player 2-a9c664	[SensorInformation(value=1, space=Discrete(3),...	[ActuatorInformation(value=7, space=Discrete(9...	[RewardInformation(value=[-100.], space=Box(lo...	-100.0	False	4	AgentConductor-251a3b	Player 2	2	Test	{'mode': 'test', 'worker': 1, 'episodes': 1}	d11b4a34-6507-4e4f-94fb-dcca264a7976	2	Game of Tic-Tac-Toe	1	Tic-Tac-Toe
1064	2025-04-26 21:39:51.881714	{'tttenv': {'simtime_ticks': 12, 'simtime_time...	AgentConductor-d4d27e.Player 1-fbba4f	[SensorInformation(value=1, space=Discrete(3),...	[ActuatorInformation(value=2, space=Discrete(9...	[RewardInformation(value=[-100.], space=Box(lo...	-100.0	False	3	AgentConductor-d4d27e	Player 1	2	Test	{'mode': 'test', 'worker': 1, 'episodes': 1}	d11b4a34-6507-4e4f-94fb-dcca264a7976	2	Game of Tic-Tac-Toe	1	Tic-Tac-Toe
1065	2025-04-26 21:39:51.913860	{'tttenv': {'simtime_ticks': 13, 'simtime_time...	AgentConductor-251a3b.Player 2-a9c664	[SensorInformation(value=1, space=Discrete(3),...	[ActuatorInformation(value=0, space=Discrete(9...	[RewardInformation(value=[-100.], space=Box(lo...	-100.0	False	4	AgentConductor-251a3b	Player 2	2	Test	{'mode': 'test', 'worker': 1, 'episodes': 1}	d11b4a34-6507-4e4f-94fb-dcca264a7976	2	Game of Tic-Tac-Toe	1	Tic-Tac-Toe
1066	2025-04-26 21:39:51.948376	{'tttenv': {'simtime_ticks': 14, 'simtime_time...	AgentConductor-d4d27e.Player 1-fbba4f	[SensorInformation(value=1, space=Discrete(3),...	[ActuatorInformation(value=2, space=Discrete(9...	[RewardInformation(value=[-100.], space=Box(lo...	-100.0	False	3	AgentConductor-d4d27e	Player 1	2	Test	{'mode': 'test', 'worker': 1, 'episodes': 1}	d11b4a34-6507-4e4f-94fb-dcca264a7976	2	Game of Tic-Tac-Toe	1	Tic-Tac-Toe
1067	2025-04-26 21:39:51.980944	{'tttenv': {'simtime_ticks': 15, 'simtime_time...	AgentConductor-251a3b.Player 2-a9c664	[SensorInformation(value=1, space=Discrete(3),...	[ActuatorInformation(value=2, space=Discrete(9...	[RewardInformation(value=[-100.], space=Box(lo...	-100.0	False	4	AgentConductor-251a3b	Player 2	2	Test	{'mode': 'test', 'worker': 1, 'episodes': 1}	d11b4a34-6507-4e4f-94fb-dcca264a7976	2	Game of Tic-Tac-Toe	1	Tic-Tac-Toe
1068	2025-04-26 21:39:52.045097	{'tttenv': {'simtime_ticks': 16, 'simtime_time...	AgentConductor-d4d27e.Player 1-fbba4f	[SensorInformation(value=1, space=Discrete(3),...	[ActuatorInformation(value=5, space=Discrete(9...	[RewardInformation(value=[-100.], space=Box(lo...	-100.0	False	3	AgentConductor-d4d27e	Player 1	2	Test	{'mode': 'test', 'worker': 1, 'episodes': 1}	d11b4a34-6507-4e4f-94fb-dcca264a7976	2	Game of Tic-Tac-Toe	1	Tic-Tac-Toe
1069	2025-04-26 21:39:52.078511	{'tttenv': {'simtime_ticks': 17, 'simtime_time...	AgentConductor-251a3b.Player 2-a9c664	[SensorInformation(value=1, space=Discrete(3),...	[ActuatorInformation(value=2, space=Discrete(9...	[RewardInformation(value=[-100.], space=Box(lo...	-100.0	False	4	AgentConductor-251a3b	Player 2	2	Test	{'mode': 'test', 'worker': 1, 'episodes': 1}	d11b4a34-6507-4e4f-94fb-dcca264a7976	2	Game of Tic-Tac-Toe	1	Tic-Tac-Toe
1070	2025-04-26 21:39:52.228608	{'tttenv': {'simtime_ticks': 18, 'simtime_time...	AgentConductor-d4d27e.Player 1-fbba4f	[SensorInformation(value=1, space=Discrete(3),...	[ActuatorInformation(value=6, space=Discrete(9...	[RewardInformation(value=[-100.], space=Box(lo...	-100.0	False	3	AgentConductor-d4d27e	Player 1	2	Test	{'mode': 'test', 'worker': 1, 'episodes': 1}	d11b4a34-6507-4e4f-94fb-dcca264a7976	2	Game of Tic-Tac-Toe	1	Tic-Tac-Toe
1071	2025-04-26 21:39:52.260869	{'tttenv': {'simtime_ticks': 19, 'simtime_time...	AgentConductor-251a3b.Player 2-a9c664	[SensorInformation(value=1, space=Discrete(3),...	[ActuatorInformation(value=1, space=Discrete(9...	[RewardInformation(value=[-100.], space=Box(lo...	-100.0	False	4	AgentConductor-251a3b	Player 2	2	Test	{'mode': 'test', 'worker': 1, 'episodes': 1}	d11b4a34-6507-4e4f-94fb-dcca264a7976	2	Game of Tic-Tac-Toe	1	Tic-Tac-Toe
1072	2025-04-26 21:39:52.295799	{'tttenv': {'simtime_ticks': 20, 'simtime_time...	AgentConductor-d4d27e.Player 1-fbba4f	[SensorInformation(value=1, space=Discrete(3),...	[ActuatorInformation(value=0, space=Discrete(9...	[RewardInformation(value=[-100.], space=Box(lo...	-100.0	False	3	AgentConductor-d4d27e	Player 1	2	Test	{'mode': 'test', 'worker': 1, 'episodes': 1}	d11b4a34-6507-4e4f-94fb-dcca264a7976	2	Game of Tic-Tac-Toe	1	Tic-Tac-Toe
1073	2025-04-26 21:39:52.333874	{'tttenv': {'simtime_ticks': 21, 'simtime_time...	AgentConductor-251a3b.Player 2-a9c664	[SensorInformation(value=1, space=Discrete(3),...	[ActuatorInformation(value=6, space=Discrete(9...	[RewardInformation(value=[-100.], space=Box(lo...	-100.0	False	4	AgentConductor-251a3b	Player 2	2	Test	{'mode': 'test', 'worker': 1, 'episodes': 1}	d11b4a34-6507-4e4f-94fb-dcca264a7976	2	Game of Tic-Tac-Toe	1	Tic-Tac-Toe
1074	2025-04-26 21:39:52.442603	{'tttenv': {'simtime_ticks': 22, 'simtime_time...	AgentConductor-d4d27e.Player 1-fbba4f	[SensorInformation(value=1, space=Discrete(3),...	[ActuatorInformation(value=1, space=Discrete(9...	[RewardInformation(value=[-100.], space=Box(lo...	-100.0	False	3	AgentConductor-d4d27e	Player 1	2	Test	{'mode': 'test', 'worker': 1, 'episodes': 1}	d11b4a34-6507-4e4f-94fb-dcca264a7976	2	Game of Tic-Tac-Toe	1	Tic-Tac-Toe
1075	2025-04-26 21:39:52.482419	{'tttenv': {'simtime_ticks': 23, 'simtime_time...	AgentConductor-251a3b.Player 2-a9c664	[SensorInformation(value=1, space=Discrete(3),...	[ActuatorInformation(value=5, space=Discrete(9...	[RewardInformation(value=[-100.], space=Box(lo...	-100.0	False	4	AgentConductor-251a3b	Player 2	2	Test	{'mode': 'test', 'worker': 1, 'episodes': 1}	d11b4a34-6507-4e4f-94fb-dcca264a7976	2	Game of Tic-Tac-Toe	1	Tic-Tac-Toe
1076	2025-04-26 21:39:52.536087	{'tttenv': {'simtime_ticks': 24, 'simtime_time...	AgentConductor-d4d27e.Player 1-fbba4f	[SensorInformation(value=1, space=Discrete(3),...	[ActuatorInformation(value=8, space=Discrete(9...	[RewardInformation(value=[-100.], space=Box(lo...	-100.0	False	3	AgentConductor-d4d27e	Player 1	2	Test	{'mode': 'test', 'worker': 1, 'episodes': 1}	d11b4a34-6507-4e4f-94fb-dcca264a7976	2	Game of Tic-Tac-Toe	1	Tic-Tac-Toe
1077	2025-04-26 21:39:52.587534	{'tttenv': {'simtime_ticks': 25, 'simtime_time...	AgentConductor-251a3b.Player 2-a9c664	[SensorInformation(value=1, space=Discrete(3),...	[ActuatorInformation(value=2, space=Discrete(9...	[RewardInformation(value=[-100.], space=Box(lo...	-100.0	False	4	AgentConductor-251a3b	Player 2	2	Test	{'mode': 'test', 'worker': 1, 'episodes': 1}	d11b4a34-6507-4e4f-94fb-dcca264a7976	2	Game of Tic-Tac-Toe	1	Tic-Tac-Toe
1078	2025-04-26 21:39:52.618585	{'tttenv': {'simtime_ticks': 26, 'simtime_time...	AgentConductor-d4d27e.Player 1-fbba4f	[SensorInformation(value=1, space=Discrete(3),...	[ActuatorInformation(value=8, space=Discrete(9...	[RewardInformation(value=[-100.], space=Box(lo...	-100.0	False	3	AgentConductor-d4d27e	Player 1	2	Test	{'mode': 'test', 'worker': 1, 'episodes': 1}	d11b4a34-6507-4e4f-94fb-dcca264a7976	2	Game of Tic-Tac-Toe	1	Tic-Tac-Toe
1079	2025-04-26 21:39:52.660326	{'tttenv': {'simtime_ticks': 27, 'simtime_time...	AgentConductor-251a3b.Player 2-a9c664	[SensorInformation(value=1, space=Discrete(3),...	[ActuatorInformation(value=0, space=Discrete(9...	[RewardInformation(value=[-100.], space=Box(lo...	-100.0	False	4	AgentConductor-251a3b	Player 2	2	Test	{'mode': 'test', 'worker': 1, 'episodes': 1}	d11b4a34-6507-4e4f-94fb-dcca264a7976	2	Game of Tic-Tac-Toe	1	Tic-Tac-Toe
1080	2025-04-26 21:39:52.692892	{'tttenv': {'simtime_ticks': 28, 'simtime_time...	AgentConductor-d4d27e.Player 1-fbba4f	[SensorInformation(value=1, space=Discrete(3),...	[ActuatorInformation(value=5, space=Discrete(9...	[RewardInformation(value=[-100.], space=Box(lo...	-100.0	False	3	AgentConductor-d4d27e	Player 1	2	Test	{'mode': 'test', 'worker': 1, 'episodes': 1}	d11b4a34-6507-4e4f-94fb-dcca264a7976	2	Game of Tic-Tac-Toe	1	Tic-Tac-Toe
1081	2025-04-26 21:39:52.723946	{'tttenv': {'simtime_ticks': 29, 'simtime_time...	AgentConductor-251a3b.Player 2-a9c664	[SensorInformation(value=1, space=Discrete(3),...	[ActuatorInformation(value=0, space=Discrete(9...	[RewardInformation(value=[-100.], space=Box(lo...	-100.0	False	4	AgentConductor-251a3b	Player 2	2	Test	{'mode': 'test', 'worker': 1, 'episodes': 1}	d11b4a34-6507-4e4f-94fb-dcca264a7976	2	Game of Tic-Tac-Toe	1	Tic-Tac-Toe
1082	2025-04-26 21:39:52.754750	{'tttenv': {'simtime_ticks': 30, 'simtime_time...	AgentConductor-d4d27e.Player 1-fbba4f	[SensorInformation(value=1, space=Discrete(3),...	[ActuatorInformation(value=5, space=Discrete(9...	[RewardInformation(value=[-100.], space=Box(lo...	-100.0	False	3	AgentConductor-d4d27e	Player 1	2	Test	{'mode': 'test', 'worker': 1, 'episodes': 1}	d11b4a34-6507-4e4f-94fb-dcca264a7976	2	Game of Tic-Tac-Toe	1	Tic-Tac-Toe
1083	2025-04-26 21:39:52.785736	{'tttenv': {'simtime_ticks': 32, 'simtime_time...	AgentConductor-d4d27e.Player 1-fbba4f	[SensorInformation(value=1, space=Discrete(3),...	[ActuatorInformation(value=4, space=Discrete(9...	[RewardInformation(value=[10.], space=Box(low=...	10.0	True	3	AgentConductor-d4d27e	Player 1	2	Test	{'mode': 'test', 'worker': 1, 'episodes': 1}	d11b4a34-6507-4e4f-94fb-dcca264a7976	2	Game of Tic-Tac-Toe	1	Tic-Tac-Toe
1084	2025-04-26 21:39:52.789901	{'tttenv': {'simtime_ticks': 31, 'simtime_time...	AgentConductor-251a3b.Player 2-a9c664	[SensorInformation(value=1, space=Discrete(3),...	[ActuatorInformation(value=6, space=Discrete(9...	[RewardInformation(value=[-100.], space=Box(lo...	-100.0	True	4	AgentConductor-251a3b	Player 2	2	Test	{'mode': 'test', 'worker': 1, 'episodes': 1}	d11b4a34-6507-4e4f-94fb-dcca264a7976	2	Game of Tic-Tac-Toe	1	Tic-Tac-Toe
1085	2025-04-26 21:39:52.827342	{'tttenv': {'simtime_ticks': 33, 'simtime_time...	AgentConductor-d4d27e.Player 1-fbba4f	[SensorInformation(value=1, space=Discrete(3),...	[ActuatorInformation(value=0, space=Discrete(9...	[RewardInformation(value=[10.], space=Box(low=...	10.0	True	3	AgentConductor-d4d27e	Player 1	2	Test	{'mode': 'test', 'worker': 1, 'episodes': 1}	d11b4a34-6507-4e4f-94fb-dcca264a7976	2	Game of Tic-Tac-Toe	1	Tic-Tac-Toe
1086	2025-04-26 21:39:52.828217	{'tttenv': {'simtime_ticks': 33, 'simtime_time...	AgentConductor-251a3b.Player 2-a9c664	[SensorInformation(value=1, space=Discrete(3),...	[ActuatorInformation(value=1, space=Discrete(9...	[RewardInformation(value=[-100.], space=Box(lo...	-100.0	True	4	AgentConductor-251a3b	Player 2	2	Test	{'mode': 'test', 'worker': 1, 'episodes': 1}	d11b4a34-6507-4e4f-94fb-dcca264a7976	2	Game of Tic-Tac-Toe	1	Tic-Tac-Toe

With the help of matplotlib, we can even visualize what the agents have done. The following function turns the state of the board into a matplotlib plot.

Note that the players still occasionally make invalid moves; for this quickstart tutorial, the number of training episodes is simply not large enough. We therefore filter out those moves and present the condensed version. As a result, it sometimes appears as if a player was skipped. It wasn’t: it simply tried to place its mark at a position where another mark already existed.

[12]:

import matplotlib.pyplot as plt

def render_tic_tac_toe(board):
    fig, ax = plt.subplots(figsize=(4,4))
    # Draw grid lines
    for i in range(1, 3):
        ax.plot([i, i], [0, 3], color='black', linewidth=2)
        ax.plot([0, 3], [i, i], color='black', linewidth=2)
    # Place X and O
    for idx, val in enumerate(board):
        row, col = divmod(idx, 3)
        x = col + 0.5
        y = 2.5 - row
        if val == 1:
            ax.text(x, y, 'O', fontsize=36, ha='center', va='center', color='blue')
        elif val == 2:
            ax.text(x, y, 'X', fontsize=36, ha='center', va='center', color='red')
    ax.set_xlim(0, 3)
    ax.set_ylim(0, 3)
    ax.set_xticks([])
    ax.set_yticks([])
    ax.set_aspect('equal')
    plt.show()

Each agent’s sensor readings contain the complete board state. Since the sensor readings are logged as part of the muscle_actions() query (which basically gives us trajectories), we can now iterate over all moves and render the Tic-Tac-Toe board. Enjoy!

[13]:

board_states = list(
    ma[ma.muscle_action_objective > 0]
    .muscle_sensor_readings
    .apply(lambda x: [s.value for s in x])
)
for state in board_states:
    render_tic_tac_toe(state)
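
In an environment without a display, the same board states could also be printed as text. The following is a minimal sketch under the same assumptions as render_tic_tac_toe (a flat list of nine tile values, 1 drawn as O and 2 as X). `board_to_text` is a helper invented for this example, and the sample state is hand-written rather than taken from the run.

```python
def board_to_text(board):
    """Render a flat list of 9 tile values as a 3x3 text grid."""
    # Same symbol mapping as render_tic_tac_toe: 1 -> O, 2 -> X, else empty.
    symbols = {1: "O", 2: "X"}
    rows = []
    for r in range(3):
        cells = [symbols.get(board[3 * r + c], " ") for c in range(3)]
        rows.append(" " + " | ".join(cells))
    return "\n---+---+---\n".join(rows)

# Hand-written example state, not taken from the run above.
print(board_to_text([1, 2, 0, 0, 1, 0, 2, 0, 1]))
```

With the board_states list from the previous cell, each state could be passed to board_to_text in place of render_tic_tac_toe.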