Quickstart Guide

What is better than getting into learning agents by letting them play Tic-Tac-Toe? After all, its a popcultural classic!

Setting up a Game of Tic-Tac-Toe

In palaestrAI, everything is controlled through an experiment run file, so we first need to set up one. Experiment run files define agents, environment, when episodes terminate, and so on. You can read about this further in the documentation; for now, let’s just accept the following YAML code as-is:

[1]:
ttt_run = """
uid: A Training Match of Tic-Tac-Toe
experiment_uid: Tic-Tac-Toe
seed: 234247
version: "3.5"
schedule:
  - Training:
      environments:
        - environment:
            name: palaestrai_environments.tictactoe:TicTacToeEnvironment
            uid: tttenv
            params:
              twoplayer: true
      agents:
        - &player
          name: Player 1
          brain:
            name: harl:PPOBrain
            params:
              fc_dims: [2, 1]
          muscle:
            name: harl:PPOMuscle
            params: {}
          objective:
            name: palaestrai.agent.dummy_objective:DummyObjective
            params: {}
          sensors:
            - tttenv.Tile 1-1
            - tttenv.Tile 1-2
            - tttenv.Tile 1-3
            - tttenv.Tile 2-1
            - tttenv.Tile 2-2
            - tttenv.Tile 2-3
            - tttenv.Tile 3-1
            - tttenv.Tile 3-2
            - tttenv.Tile 3-3
          actuators:
            - tttenv.Field selector
        - <<: *player
          name: Player 2
      simulation:
        name: palaestrai.simulation:TakingTurns
        conditions:
          - name: palaestrai.experiment:AgentObjectiveTerminationCondition
            params:
              "Player 1":
                brain_avg1: 10.0
              "Player 2":
                brain_avg1: 10.0
      phase_config:
        mode: train
        worker: 3
run_config:  # Not a runTIME config
  condition:
    name: palaestrai.experiment:AgentObjectiveTerminationCondition
    params:
      "Player 1":
        phase_avg10: 5.0
      "Player 2":
        phase_avg10: 5.0
"""

Creating a Results Storage Database

Later on see the results, we need to tell palaestrAI where it can store all results data. A convenient option is to use a local SQLite database, which we will enable and create here.

palaestrAI will probably tell you that using SQLite is not ideal in terms of performance. For local experiments, this doesn’t concern us, though.

[2]:
import palaestrai
rconf = palaestrai.core.RuntimeConfig()
rconf.reset()
rconf.load({"store_uri": "sqlite:///palaestrai.db"})
palaestrai.store.setup_database()
Importing from 'midas.tools.palaestrai' is deprecated! Use 'midas_palaestrai' instead!
Importing from 'midas.tools.palaestrai' is deprecated! Use 'midas_palaestrai' instead!
Could not create extension timescaledb and create hypertables: (sqlite3.OperationalError) near "EXTENSION": syntax error
[SQL: CREATE EXTENSION IF NOT EXISTS timescaledb CASCADE;]
(Background on this error at: https://sqlalche.me/e/14/e3q8). Your database setup might lead to noticeable slowdowns with larger experiment runs. Please upgrade to PostgreSQL with TimescaleDB for the best performance.

Learning the Game

Now we can execute the training run defined above. The palaestrai.execute() command accepts a list of strings as well as an io object. If strings are given, it is assumed they are paths to YAML files; io objects are considered to provide access to the contests directly. Therefore, we use io.StringIO to access our YAML document defined above.

palaestrai.execute() will now commence the training. It might take a while to run; after all, we want to traing until one of the agent avieves a consistently high average reward.

[3]:
import io
palaestrai.execute(io.StringIO(ttt_run))
Could not dump to <palaestrai.agent.store_brain_dumper.StoreBrainDumper object at 0x7fd0a19a9610>: (sqlite3.OperationalError) database is locked
[SQL: INSERT INTO brain_states (walltime, state, tag, simtime_ticks, simtime_timestamp, agent_id) VALUES (CURRENT_TIMESTAMP, ?, ?, ?, ?, ?)]
[parameters: (<memory at 0x7fd09ebb4dc0>, 'ppo_actor', None, None, 2)]
(Background on this error at: https://sqlalche.me/e/14/e3q8)
Could not dump to <palaestrai.agent.store_brain_dumper.StoreBrainDumper object at 0x7f7ff7e79670>: (sqlite3.OperationalError) database is locked
[SQL: INSERT INTO brain_states (walltime, state, tag, simtime_ticks, simtime_timestamp, agent_id) VALUES (CURRENT_TIMESTAMP, ?, ?, ?, ?, ?)]
[parameters: (<memory at 0x7f7ff5088e80>, 'ppo_actor', None, None, 1)]
(Background on this error at: https://sqlalche.me/e/14/e3q8)
Could not dump to <palaestrai.agent.store_brain_dumper.StoreBrainDumper object at 0x7fd0a19a9610>: (sqlite3.OperationalError) database is locked
[SQL: INSERT INTO brain_states (walltime, state, tag, simtime_ticks, simtime_timestamp, agent_id) VALUES (CURRENT_TIMESTAMP, ?, ?, ?, ?, ?)]
[parameters: (<memory at 0x7fd09ebb56c0>, 'ppo_critic', None, None, 2)]
(Background on this error at: https://sqlalche.me/e/14/e3q8)
Could not dump to <palaestrai.agent.store_brain_dumper.StoreBrainDumper object at 0x7f7ff7e79670>: (sqlite3.OperationalError) database is locked
[SQL: INSERT INTO brain_states (walltime, state, tag, simtime_ticks, simtime_timestamp, agent_id) VALUES (CURRENT_TIMESTAMP, ?, ?, ?, ?, ?)]
[parameters: (<memory at 0x7f7ff5089240>, 'ppo_critic', None, None, 1)]
(Background on this error at: https://sqlalche.me/e/14/e3q8)
Could not dump to <palaestrai.agent.store_brain_dumper.StoreBrainDumper object at 0x7fd0a19a9610>: (sqlite3.OperationalError) database is locked
[SQL: INSERT INTO brain_states (walltime, state, tag, simtime_ticks, simtime_timestamp, agent_id) VALUES (CURRENT_TIMESTAMP, ?, ?, ?, ?, ?)]
[parameters: (<memory at 0x7fd09ebb59c0>, 'ppo_actor', None, None, 2)]
(Background on this error at: https://sqlalche.me/e/14/e3q8)
Could not dump to <palaestrai.agent.store_brain_dumper.StoreBrainDumper object at 0x7f7ff7e79670>: (sqlite3.OperationalError) database is locked
[SQL: INSERT INTO brain_states (walltime, state, tag, simtime_ticks, simtime_timestamp, agent_id) VALUES (CURRENT_TIMESTAMP, ?, ?, ?, ?, ?)]
[parameters: (<memory at 0x7f7ff5089240>, 'ppo_actor', None, None, 1)]
(Background on this error at: https://sqlalche.me/e/14/e3q8)
Could not dump to <palaestrai.agent.store_brain_dumper.StoreBrainDumper object at 0x7fd0a19a9610>: (sqlite3.OperationalError) database is locked
[SQL: INSERT INTO brain_states (walltime, state, tag, simtime_ticks, simtime_timestamp, agent_id) VALUES (CURRENT_TIMESTAMP, ?, ?, ?, ?, ?)]
[parameters: (<memory at 0x7fd09ebb5a80>, 'ppo_critic', None, None, 2)]
(Background on this error at: https://sqlalche.me/e/14/e3q8)
Could not dump to <palaestrai.agent.store_brain_dumper.StoreBrainDumper object at 0x7f7ff7e79670>: (sqlite3.OperationalError) database is locked
[SQL: INSERT INTO brain_states (walltime, state, tag, simtime_ticks, simtime_timestamp, agent_id) VALUES (CURRENT_TIMESTAMP, ?, ?, ?, ?, ?)]
[parameters: (<memory at 0x7f7ff5089a80>, 'ppo_critic', None, None, 1)]
(Background on this error at: https://sqlalche.me/e/14/e3q8)
Could not dump to <palaestrai.agent.store_brain_dumper.StoreBrainDumper object at 0x7fd0a19a9610>: (sqlite3.OperationalError) database is locked
[SQL: INSERT INTO brain_states (walltime, state, tag, simtime_ticks, simtime_timestamp, agent_id) VALUES (CURRENT_TIMESTAMP, ?, ?, ?, ?, ?)]
[parameters: (<memory at 0x7fd09ebb4dc0>, 'ppo_actor', None, None, 2)]
(Background on this error at: https://sqlalche.me/e/14/e3q8)
Could not dump to <palaestrai.agent.store_brain_dumper.StoreBrainDumper object at 0x7f7ff7e79670>: (sqlite3.OperationalError) database is locked
[SQL: INSERT INTO brain_states (walltime, state, tag, simtime_ticks, simtime_timestamp, agent_id) VALUES (CURRENT_TIMESTAMP, ?, ?, ?, ?, ?)]
[parameters: (<memory at 0x7f7ff5089cc0>, 'ppo_actor', None, None, 1)]
(Background on this error at: https://sqlalche.me/e/14/e3q8)
Could not dump to <palaestrai.agent.store_brain_dumper.StoreBrainDumper object at 0x7fd0a19a9610>: (sqlite3.OperationalError) database is locked
[SQL: INSERT INTO brain_states (walltime, state, tag, simtime_ticks, simtime_timestamp, agent_id) VALUES (CURRENT_TIMESTAMP, ?, ?, ?, ?, ?)]
[parameters: (<memory at 0x7fd09ebb5d80>, 'ppo_critic', None, None, 2)]
(Background on this error at: https://sqlalche.me/e/14/e3q8)
Could not dump to <palaestrai.agent.store_brain_dumper.StoreBrainDumper object at 0x7f7ff7e79670>: (sqlite3.OperationalError) database is locked
[SQL: INSERT INTO brain_states (walltime, state, tag, simtime_ticks, simtime_timestamp, agent_id) VALUES (CURRENT_TIMESTAMP, ?, ?, ?, ?, ?)]
[parameters: (<memory at 0x7f7ff5089240>, 'ppo_critic', None, None, 1)]
(Background on this error at: https://sqlalche.me/e/14/e3q8)
[3]:
(['A Training Match of Tic-Tac-Toe'], <ExecutorState.EXITED: 4>)

Accessing Results

palaestrAI offers a small convenience interface to access the data most people will want to see. The functions are part of he palaestrai.store.query package:

[4]:
import palaestrai.store.query as palq

All these functions return pandas or dask dataframes, which makes them also useful in Jupyter notebooks. Let’s first see what experiments our database has logged so far (there should be only one):

[5]:
exps = palq.experiments_and_runs_configurations()
exps
[5]:
experiment_id experiment_name experiment_document experiment_run_id experiment_run_uid experiment_run_document experiment_run_instance_id experiment_run_instance_uid experiment_run_phase_id experiment_run_phase_uid experiment_run_phase_mode
0 1 Tic-Tac-Toe None 1 A Training Match of Tic-Tac-Toe {'uid': 'A Training Match of Tic-Tac-Toe', 'ex... 1 d64ed154-ce8f-46fa-8563-a226f4d2ff94 1 Training train

Our two agents have competed for a number of episodes. Let’s see their cumulative reward. For this, we have a query function called muscles_cumulative_objective(). Two things are of note here.

First, palaestrAI names its agents “Muscles.” This naming is in contrast to the learning part, which is named “Brain.” You’re probably asking yourself now “why the funny names?” The reason lies in palaestrAI’s architecture, which tries to hide as much of the technical details of the actual execution as possible; so an agent’s “Brain” and its “Muscles” are metaphores for the pure algorithm implementations.

Second, you might notice the function parameter like_dataframe. Almost every query function has this. It allows you to pass a dataframe for filtering; palaestrAI then constructs the underlying SQL query according to the dataframe’s columns. We’re demonstrating one possible convenient application of this here. First, we got the list of all experiments, runs, and phases in the previous cell. Now, we use pandas’ filtering to reduce the rows to those experiment runs we’re interested in. This reduced dataframe can the be passed to our next query function: This way, we retrieve only the cumulative values for the phases we really want to analyize.

[6]:
cumobj = palq.muscles_cumulative_objective(like_dataframe=exps.iloc[[-1]])
cumobj[
    ["agent_name", "muscle_actions_episode", "muscle_cumulative_objective"]
].set_index(
    "muscle_actions_episode"
).pivot_table(
    columns=["agent_name"],
    values=["muscle_cumulative_objective"],
    index=["muscle_actions_episode"]
).plot()
[6]:
<Axes: xlabel='muscle_actions_episode'>
_images/quickstart_11_1.png

The Real Match

Training time is over, let’s have a real competition! We will now schedule a separate run, but no longer as training episode, but for testing. Our next experiment run definition instructs palaestrAI to load the previously trained agents. To make things interesting, we let the better player compete against itself:

[7]:
best_player = cumobj[
    ["agent_name", "muscle_cumulative_objective"]
].groupby(
    by=["agent_name"]
).sum().sort_values(
    by=["muscle_cumulative_objective"]
).index[-1]
[8]:
ttt_test = """
uid: Game of Tic-Tac-Toe
experiment_uid: Tic-Tac-Toe
seed: 234247
version: "3.5"
schedule:
  - Test:
      environments:
        - environment:
            name: palaestrai_environments.tictactoe:TicTacToeEnvironment
            uid: tttenv
            params:
              twoplayer: true
              invalid_turn_limit: -1
      agents:
        - &player
          name: Player 1
          brain:
            name: harl:PPOBrain
            params:
              fc_dims: [2, 1]
          muscle:
            name: harl:PPOMuscle
            params: {}
          objective:
            name: palaestrai.agent.dummy_objective:DummyObjective
            params: {}
          load:
            experiment_run: A Training Match of Tic-Tac-Toe
            agent: %(best_player)s
            phase: 0
          sensors:
            - tttenv.Tile 1-1
            - tttenv.Tile 1-2
            - tttenv.Tile 1-3
            - tttenv.Tile 2-1
            - tttenv.Tile 2-2
            - tttenv.Tile 2-3
            - tttenv.Tile 3-1
            - tttenv.Tile 3-2
            - tttenv.Tile 3-3
          actuators:
            - tttenv.Field selector
        - <<: *player
          name: Player 2
          load:
            experiment_run: A Training Match of Tic-Tac-Toe
            agent: %(best_player)s
            phase: 0
      simulation:
        name: palaestrai.simulation:TakingTurns
        conditions:
          - name: palaestrai.experiment:EnvironmentTerminationCondition
            params: {}
      phase_config:
        mode: test
        worker: 1
        episodes: 1
run_config:  # Not a runTIME config
  condition:
    name: palaestrai.experiment:VanillaRunGovernorTerminationCondition
    params: {}

""" % {"best_player": best_player}
[9]:
palaestrai.execute(io.StringIO(ttt_test))
[9]:
(['Game of Tic-Tac-Toe'], <ExecutorState.EXITED: 4>)

After we’re done, let’s examine the list of experiment runs again. We can now filter for test runs:

[10]:
exps = palq.experiments_and_runs_configurations()
exps[(exps.experiment_name == "Tic-Tac-Toe") & (exps.experiment_run_phase_mode == "test")]
[10]:
experiment_id experiment_name experiment_document experiment_run_id experiment_run_uid experiment_run_document experiment_run_instance_id experiment_run_instance_uid experiment_run_phase_id experiment_run_phase_uid experiment_run_phase_mode
1 1 Tic-Tac-Toe None 2 Game of Tic-Tac-Toe {'uid': 'Game of Tic-Tac-Toe', 'experiment_uid... 2 d11b4a34-6507-4e4f-94fb-dcca264a7976 2 Test test

This time, we’re not after cumulative scores as we’ve instructed palaestrAI to play one match only. Let’s get the individual moves of each player. We’re using the muscle_actions() query function now. Like any other query function, we can pass it a dataframe with values we want to query (via the like_dataframe parameter). In this case, we’re interested only in the actions of the test run, so let’s use pandas’ filtering capabilities:

[11]:
ma = palq.muscle_actions(
    like_dataframe=exps[(exps.experiment_name == "Tic-Tac-Toe") & (exps.experiment_run_phase_mode == "test")].iloc[[-1]])
ma
[11]:
muscle_action_walltime muscle_action_simtimes rollout_worker_uid muscle_sensor_readings muscle_actuator_setpoints muscle_action_rewards muscle_action_objective muscle_action_done agent_id agent_uid agent_name experiment_run_phase_id experiment_run_phase_uid experiment_run_phase_configuration experiment_run_instance_uid experiment_run_id experiment_run_uid experiment_id experiment_name
muscle_action_id
1052 2025-04-26 21:39:51.236292 {'tttenv': {'simtime_ticks': 0, 'simtime_times... AgentConductor-d4d27e.Player 1-fbba4f [SensorInformation(value=0, space=Discrete(3),... [ActuatorInformation(value=2, space=Discrete(9... [RewardInformation(value=[1.], space=Box(low=[... 1.0 False 3 AgentConductor-d4d27e Player 1 2 Test {'mode': 'test', 'worker': 1, 'episodes': 1} d11b4a34-6507-4e4f-94fb-dcca264a7976 2 Game of Tic-Tac-Toe 1 Tic-Tac-Toe
1053 2025-04-26 21:39:51.313104 {'tttenv': {'simtime_ticks': 1, 'simtime_times... AgentConductor-251a3b.Player 2-a9c664 [SensorInformation(value=0, space=Discrete(3),... [ActuatorInformation(value=0, space=Discrete(9... [RewardInformation(value=[1.], space=Box(low=[... 1.0 False 4 AgentConductor-251a3b Player 2 2 Test {'mode': 'test', 'worker': 1, 'episodes': 1} d11b4a34-6507-4e4f-94fb-dcca264a7976 2 Game of Tic-Tac-Toe 1 Tic-Tac-Toe
1054 2025-04-26 21:39:51.377685 {'tttenv': {'simtime_ticks': 2, 'simtime_times... AgentConductor-d4d27e.Player 1-fbba4f [SensorInformation(value=1, space=Discrete(3),... [ActuatorInformation(value=6, space=Discrete(9... [RewardInformation(value=[1.], space=Box(low=[... 1.0 False 3 AgentConductor-d4d27e Player 1 2 Test {'mode': 'test', 'worker': 1, 'episodes': 1} d11b4a34-6507-4e4f-94fb-dcca264a7976 2 Game of Tic-Tac-Toe 1 Tic-Tac-Toe
1055 2025-04-26 21:39:51.431586 {'tttenv': {'simtime_ticks': 3, 'simtime_times... AgentConductor-251a3b.Player 2-a9c664 [SensorInformation(value=1, space=Discrete(3),... [ActuatorInformation(value=1, space=Discrete(9... [RewardInformation(value=[1.], space=Box(low=[... 1.0 False 4 AgentConductor-251a3b Player 2 2 Test {'mode': 'test', 'worker': 1, 'episodes': 1} d11b4a34-6507-4e4f-94fb-dcca264a7976 2 Game of Tic-Tac-Toe 1 Tic-Tac-Toe
1056 2025-04-26 21:39:51.478556 {'tttenv': {'simtime_ticks': 4, 'simtime_times... AgentConductor-d4d27e.Player 1-fbba4f [SensorInformation(value=1, space=Discrete(3),... [ActuatorInformation(value=8, space=Discrete(9... [RewardInformation(value=[1.], space=Box(low=[... 1.0 False 3 AgentConductor-d4d27e Player 1 2 Test {'mode': 'test', 'worker': 1, 'episodes': 1} d11b4a34-6507-4e4f-94fb-dcca264a7976 2 Game of Tic-Tac-Toe 1 Tic-Tac-Toe
1057 2025-04-26 21:39:51.572816 {'tttenv': {'simtime_ticks': 5, 'simtime_times... AgentConductor-251a3b.Player 2-a9c664 [SensorInformation(value=1, space=Discrete(3),... [ActuatorInformation(value=7, space=Discrete(9... [RewardInformation(value=[1.], space=Box(low=[... 1.0 False 4 AgentConductor-251a3b Player 2 2 Test {'mode': 'test', 'worker': 1, 'episodes': 1} d11b4a34-6507-4e4f-94fb-dcca264a7976 2 Game of Tic-Tac-Toe 1 Tic-Tac-Toe
1058 2025-04-26 21:39:51.635500 {'tttenv': {'simtime_ticks': 6, 'simtime_times... AgentConductor-d4d27e.Player 1-fbba4f [SensorInformation(value=1, space=Discrete(3),... [ActuatorInformation(value=6, space=Discrete(9... [RewardInformation(value=[-100.], space=Box(lo... -100.0 False 3 AgentConductor-d4d27e Player 1 2 Test {'mode': 'test', 'worker': 1, 'episodes': 1} d11b4a34-6507-4e4f-94fb-dcca264a7976 2 Game of Tic-Tac-Toe 1 Tic-Tac-Toe
1059 2025-04-26 21:39:51.682877 {'tttenv': {'simtime_ticks': 7, 'simtime_times... AgentConductor-251a3b.Player 2-a9c664 [SensorInformation(value=1, space=Discrete(3),... [ActuatorInformation(value=0, space=Discrete(9... [RewardInformation(value=[-100.], space=Box(lo... -100.0 False 4 AgentConductor-251a3b Player 2 2 Test {'mode': 'test', 'worker': 1, 'episodes': 1} d11b4a34-6507-4e4f-94fb-dcca264a7976 2 Game of Tic-Tac-Toe 1 Tic-Tac-Toe
1060 2025-04-26 21:39:51.726250 {'tttenv': {'simtime_ticks': 8, 'simtime_times... AgentConductor-d4d27e.Player 1-fbba4f [SensorInformation(value=1, space=Discrete(3),... [ActuatorInformation(value=1, space=Discrete(9... [RewardInformation(value=[-100.], space=Box(lo... -100.0 False 3 AgentConductor-d4d27e Player 1 2 Test {'mode': 'test', 'worker': 1, 'episodes': 1} d11b4a34-6507-4e4f-94fb-dcca264a7976 2 Game of Tic-Tac-Toe 1 Tic-Tac-Toe
1061 2025-04-26 21:39:51.763554 {'tttenv': {'simtime_ticks': 9, 'simtime_times... AgentConductor-251a3b.Player 2-a9c664 [SensorInformation(value=1, space=Discrete(3),... [ActuatorInformation(value=5, space=Discrete(9... [RewardInformation(value=[1.], space=Box(low=[... 1.0 False 4 AgentConductor-251a3b Player 2 2 Test {'mode': 'test', 'worker': 1, 'episodes': 1} d11b4a34-6507-4e4f-94fb-dcca264a7976 2 Game of Tic-Tac-Toe 1 Tic-Tac-Toe
1062 2025-04-26 21:39:51.803152 {'tttenv': {'simtime_ticks': 10, 'simtime_time... AgentConductor-d4d27e.Player 1-fbba4f [SensorInformation(value=1, space=Discrete(3),... [ActuatorInformation(value=2, space=Discrete(9... [RewardInformation(value=[-100.], space=Box(lo... -100.0 False 3 AgentConductor-d4d27e Player 1 2 Test {'mode': 'test', 'worker': 1, 'episodes': 1} d11b4a34-6507-4e4f-94fb-dcca264a7976 2 Game of Tic-Tac-Toe 1 Tic-Tac-Toe
1063 2025-04-26 21:39:51.843948 {'tttenv': {'simtime_ticks': 11, 'simtime_time... AgentConductor-251a3b.Player 2-a9c664 [SensorInformation(value=1, space=Discrete(3),... [ActuatorInformation(value=7, space=Discrete(9... [RewardInformation(value=[-100.], space=Box(lo... -100.0 False 4 AgentConductor-251a3b Player 2 2 Test {'mode': 'test', 'worker': 1, 'episodes': 1} d11b4a34-6507-4e4f-94fb-dcca264a7976 2 Game of Tic-Tac-Toe 1 Tic-Tac-Toe
1064 2025-04-26 21:39:51.881714 {'tttenv': {'simtime_ticks': 12, 'simtime_time... AgentConductor-d4d27e.Player 1-fbba4f [SensorInformation(value=1, space=Discrete(3),... [ActuatorInformation(value=2, space=Discrete(9... [RewardInformation(value=[-100.], space=Box(lo... -100.0 False 3 AgentConductor-d4d27e Player 1 2 Test {'mode': 'test', 'worker': 1, 'episodes': 1} d11b4a34-6507-4e4f-94fb-dcca264a7976 2 Game of Tic-Tac-Toe 1 Tic-Tac-Toe
1065 2025-04-26 21:39:51.913860 {'tttenv': {'simtime_ticks': 13, 'simtime_time... AgentConductor-251a3b.Player 2-a9c664 [SensorInformation(value=1, space=Discrete(3),... [ActuatorInformation(value=0, space=Discrete(9... [RewardInformation(value=[-100.], space=Box(lo... -100.0 False 4 AgentConductor-251a3b Player 2 2 Test {'mode': 'test', 'worker': 1, 'episodes': 1} d11b4a34-6507-4e4f-94fb-dcca264a7976 2 Game of Tic-Tac-Toe 1 Tic-Tac-Toe
1066 2025-04-26 21:39:51.948376 {'tttenv': {'simtime_ticks': 14, 'simtime_time... AgentConductor-d4d27e.Player 1-fbba4f [SensorInformation(value=1, space=Discrete(3),... [ActuatorInformation(value=2, space=Discrete(9... [RewardInformation(value=[-100.], space=Box(lo... -100.0 False 3 AgentConductor-d4d27e Player 1 2 Test {'mode': 'test', 'worker': 1, 'episodes': 1} d11b4a34-6507-4e4f-94fb-dcca264a7976 2 Game of Tic-Tac-Toe 1 Tic-Tac-Toe
1067 2025-04-26 21:39:51.980944 {'tttenv': {'simtime_ticks': 15, 'simtime_time... AgentConductor-251a3b.Player 2-a9c664 [SensorInformation(value=1, space=Discrete(3),... [ActuatorInformation(value=2, space=Discrete(9... [RewardInformation(value=[-100.], space=Box(lo... -100.0 False 4 AgentConductor-251a3b Player 2 2 Test {'mode': 'test', 'worker': 1, 'episodes': 1} d11b4a34-6507-4e4f-94fb-dcca264a7976 2 Game of Tic-Tac-Toe 1 Tic-Tac-Toe
1068 2025-04-26 21:39:52.045097 {'tttenv': {'simtime_ticks': 16, 'simtime_time... AgentConductor-d4d27e.Player 1-fbba4f [SensorInformation(value=1, space=Discrete(3),... [ActuatorInformation(value=5, space=Discrete(9... [RewardInformation(value=[-100.], space=Box(lo... -100.0 False 3 AgentConductor-d4d27e Player 1 2 Test {'mode': 'test', 'worker': 1, 'episodes': 1} d11b4a34-6507-4e4f-94fb-dcca264a7976 2 Game of Tic-Tac-Toe 1 Tic-Tac-Toe
1069 2025-04-26 21:39:52.078511 {'tttenv': {'simtime_ticks': 17, 'simtime_time... AgentConductor-251a3b.Player 2-a9c664 [SensorInformation(value=1, space=Discrete(3),... [ActuatorInformation(value=2, space=Discrete(9... [RewardInformation(value=[-100.], space=Box(lo... -100.0 False 4 AgentConductor-251a3b Player 2 2 Test {'mode': 'test', 'worker': 1, 'episodes': 1} d11b4a34-6507-4e4f-94fb-dcca264a7976 2 Game of Tic-Tac-Toe 1 Tic-Tac-Toe
1070 2025-04-26 21:39:52.228608 {'tttenv': {'simtime_ticks': 18, 'simtime_time... AgentConductor-d4d27e.Player 1-fbba4f [SensorInformation(value=1, space=Discrete(3),... [ActuatorInformation(value=6, space=Discrete(9... [RewardInformation(value=[-100.], space=Box(lo... -100.0 False 3 AgentConductor-d4d27e Player 1 2 Test {'mode': 'test', 'worker': 1, 'episodes': 1} d11b4a34-6507-4e4f-94fb-dcca264a7976 2 Game of Tic-Tac-Toe 1 Tic-Tac-Toe
1071 2025-04-26 21:39:52.260869 {'tttenv': {'simtime_ticks': 19, 'simtime_time... AgentConductor-251a3b.Player 2-a9c664 [SensorInformation(value=1, space=Discrete(3),... [ActuatorInformation(value=1, space=Discrete(9... [RewardInformation(value=[-100.], space=Box(lo... -100.0 False 4 AgentConductor-251a3b Player 2 2 Test {'mode': 'test', 'worker': 1, 'episodes': 1} d11b4a34-6507-4e4f-94fb-dcca264a7976 2 Game of Tic-Tac-Toe 1 Tic-Tac-Toe
1072 2025-04-26 21:39:52.295799 {'tttenv': {'simtime_ticks': 20, 'simtime_time... AgentConductor-d4d27e.Player 1-fbba4f [SensorInformation(value=1, space=Discrete(3),... [ActuatorInformation(value=0, space=Discrete(9... [RewardInformation(value=[-100.], space=Box(lo... -100.0 False 3 AgentConductor-d4d27e Player 1 2 Test {'mode': 'test', 'worker': 1, 'episodes': 1} d11b4a34-6507-4e4f-94fb-dcca264a7976 2 Game of Tic-Tac-Toe 1 Tic-Tac-Toe
1073 2025-04-26 21:39:52.333874 {'tttenv': {'simtime_ticks': 21, 'simtime_time... AgentConductor-251a3b.Player 2-a9c664 [SensorInformation(value=1, space=Discrete(3),... [ActuatorInformation(value=6, space=Discrete(9... [RewardInformation(value=[-100.], space=Box(lo... -100.0 False 4 AgentConductor-251a3b Player 2 2 Test {'mode': 'test', 'worker': 1, 'episodes': 1} d11b4a34-6507-4e4f-94fb-dcca264a7976 2 Game of Tic-Tac-Toe 1 Tic-Tac-Toe
1074 2025-04-26 21:39:52.442603 {'tttenv': {'simtime_ticks': 22, 'simtime_time... AgentConductor-d4d27e.Player 1-fbba4f [SensorInformation(value=1, space=Discrete(3),... [ActuatorInformation(value=1, space=Discrete(9... [RewardInformation(value=[-100.], space=Box(lo... -100.0 False 3 AgentConductor-d4d27e Player 1 2 Test {'mode': 'test', 'worker': 1, 'episodes': 1} d11b4a34-6507-4e4f-94fb-dcca264a7976 2 Game of Tic-Tac-Toe 1 Tic-Tac-Toe
1075 2025-04-26 21:39:52.482419 {'tttenv': {'simtime_ticks': 23, 'simtime_time... AgentConductor-251a3b.Player 2-a9c664 [SensorInformation(value=1, space=Discrete(3),... [ActuatorInformation(value=5, space=Discrete(9... [RewardInformation(value=[-100.], space=Box(lo... -100.0 False 4 AgentConductor-251a3b Player 2 2 Test {'mode': 'test', 'worker': 1, 'episodes': 1} d11b4a34-6507-4e4f-94fb-dcca264a7976 2 Game of Tic-Tac-Toe 1 Tic-Tac-Toe
1076 2025-04-26 21:39:52.536087 {'tttenv': {'simtime_ticks': 24, 'simtime_time... AgentConductor-d4d27e.Player 1-fbba4f [SensorInformation(value=1, space=Discrete(3),... [ActuatorInformation(value=8, space=Discrete(9... [RewardInformation(value=[-100.], space=Box(lo... -100.0 False 3 AgentConductor-d4d27e Player 1 2 Test {'mode': 'test', 'worker': 1, 'episodes': 1} d11b4a34-6507-4e4f-94fb-dcca264a7976 2 Game of Tic-Tac-Toe 1 Tic-Tac-Toe
1077 2025-04-26 21:39:52.587534 {'tttenv': {'simtime_ticks': 25, 'simtime_time... AgentConductor-251a3b.Player 2-a9c664 [SensorInformation(value=1, space=Discrete(3),... [ActuatorInformation(value=2, space=Discrete(9... [RewardInformation(value=[-100.], space=Box(lo... -100.0 False 4 AgentConductor-251a3b Player 2 2 Test {'mode': 'test', 'worker': 1, 'episodes': 1} d11b4a34-6507-4e4f-94fb-dcca264a7976 2 Game of Tic-Tac-Toe 1 Tic-Tac-Toe
1078 2025-04-26 21:39:52.618585 {'tttenv': {'simtime_ticks': 26, 'simtime_time... AgentConductor-d4d27e.Player 1-fbba4f [SensorInformation(value=1, space=Discrete(3),... [ActuatorInformation(value=8, space=Discrete(9... [RewardInformation(value=[-100.], space=Box(lo... -100.0 False 3 AgentConductor-d4d27e Player 1 2 Test {'mode': 'test', 'worker': 1, 'episodes': 1} d11b4a34-6507-4e4f-94fb-dcca264a7976 2 Game of Tic-Tac-Toe 1 Tic-Tac-Toe
1079 2025-04-26 21:39:52.660326 {'tttenv': {'simtime_ticks': 27, 'simtime_time... AgentConductor-251a3b.Player 2-a9c664 [SensorInformation(value=1, space=Discrete(3),... [ActuatorInformation(value=0, space=Discrete(9... [RewardInformation(value=[-100.], space=Box(lo... -100.0 False 4 AgentConductor-251a3b Player 2 2 Test {'mode': 'test', 'worker': 1, 'episodes': 1} d11b4a34-6507-4e4f-94fb-dcca264a7976 2 Game of Tic-Tac-Toe 1 Tic-Tac-Toe
1080 2025-04-26 21:39:52.692892 {'tttenv': {'simtime_ticks': 28, 'simtime_time... AgentConductor-d4d27e.Player 1-fbba4f [SensorInformation(value=1, space=Discrete(3),... [ActuatorInformation(value=5, space=Discrete(9... [RewardInformation(value=[-100.], space=Box(lo... -100.0 False 3 AgentConductor-d4d27e Player 1 2 Test {'mode': 'test', 'worker': 1, 'episodes': 1} d11b4a34-6507-4e4f-94fb-dcca264a7976 2 Game of Tic-Tac-Toe 1 Tic-Tac-Toe
1081 2025-04-26 21:39:52.723946 {'tttenv': {'simtime_ticks': 29, 'simtime_time... AgentConductor-251a3b.Player 2-a9c664 [SensorInformation(value=1, space=Discrete(3),... [ActuatorInformation(value=0, space=Discrete(9... [RewardInformation(value=[-100.], space=Box(lo... -100.0 False 4 AgentConductor-251a3b Player 2 2 Test {'mode': 'test', 'worker': 1, 'episodes': 1} d11b4a34-6507-4e4f-94fb-dcca264a7976 2 Game of Tic-Tac-Toe 1 Tic-Tac-Toe
1082 2025-04-26 21:39:52.754750 {'tttenv': {'simtime_ticks': 30, 'simtime_time... AgentConductor-d4d27e.Player 1-fbba4f [SensorInformation(value=1, space=Discrete(3),... [ActuatorInformation(value=5, space=Discrete(9... [RewardInformation(value=[-100.], space=Box(lo... -100.0 False 3 AgentConductor-d4d27e Player 1 2 Test {'mode': 'test', 'worker': 1, 'episodes': 1} d11b4a34-6507-4e4f-94fb-dcca264a7976 2 Game of Tic-Tac-Toe 1 Tic-Tac-Toe
1083 2025-04-26 21:39:52.785736 {'tttenv': {'simtime_ticks': 32, 'simtime_time... AgentConductor-d4d27e.Player 1-fbba4f [SensorInformation(value=1, space=Discrete(3),... [ActuatorInformation(value=4, space=Discrete(9... [RewardInformation(value=[10.], space=Box(low=... 10.0 True 3 AgentConductor-d4d27e Player 1 2 Test {'mode': 'test', 'worker': 1, 'episodes': 1} d11b4a34-6507-4e4f-94fb-dcca264a7976 2 Game of Tic-Tac-Toe 1 Tic-Tac-Toe
1084 2025-04-26 21:39:52.789901 {'tttenv': {'simtime_ticks': 31, 'simtime_time... AgentConductor-251a3b.Player 2-a9c664 [SensorInformation(value=1, space=Discrete(3),... [ActuatorInformation(value=6, space=Discrete(9... [RewardInformation(value=[-100.], space=Box(lo... -100.0 True 4 AgentConductor-251a3b Player 2 2 Test {'mode': 'test', 'worker': 1, 'episodes': 1} d11b4a34-6507-4e4f-94fb-dcca264a7976 2 Game of Tic-Tac-Toe 1 Tic-Tac-Toe
1085 2025-04-26 21:39:52.827342 {'tttenv': {'simtime_ticks': 33, 'simtime_time... AgentConductor-d4d27e.Player 1-fbba4f [SensorInformation(value=1, space=Discrete(3),... [ActuatorInformation(value=0, space=Discrete(9... [RewardInformation(value=[10.], space=Box(low=... 10.0 True 3 AgentConductor-d4d27e Player 1 2 Test {'mode': 'test', 'worker': 1, 'episodes': 1} d11b4a34-6507-4e4f-94fb-dcca264a7976 2 Game of Tic-Tac-Toe 1 Tic-Tac-Toe
1086 2025-04-26 21:39:52.828217 {'tttenv': {'simtime_ticks': 33, 'simtime_time... AgentConductor-251a3b.Player 2-a9c664 [SensorInformation(value=1, space=Discrete(3),... [ActuatorInformation(value=1, space=Discrete(9... [RewardInformation(value=[-100.], space=Box(lo... -100.0 True 4 AgentConductor-251a3b Player 2 2 Test {'mode': 'test', 'worker': 1, 'episodes': 1} d11b4a34-6507-4e4f-94fb-dcca264a7976 2 Game of Tic-Tac-Toe 1 Tic-Tac-Toe

With the help of matplotlib, we can even visualize what the agents have done. The following function turns the state of the board into a matplotlib plot.

Note that the players still occasionally make invalid moves; for this quickstart tutorial, the number of training episodes is simply not big enough. So we simply filter those moves and present the condensed version, but the effect is that sometimes it appears is if a player was skipped: It wasn’t, it simply tried to place its mark at a position where another mark already existed.

[12]:
import matplotlib.pyplot as plt

def render_tic_tac_toe(board):
    fig, ax = plt.subplots(figsize=(4,4))
    # Draw grid lines
    for i in range(1, 3):
        ax.plot([i, i], [0, 3], color='black', linewidth=2)
        ax.plot([0, 3], [i, i], color='black', linewidth=2)
    # Place X and O
    for idx, val in enumerate(board):
        row, col = divmod(idx, 3)
        x = col + 0.5
        y = 2.5 - row
        if val == 1:
            ax.text(x, y, 'O', fontsize=36, ha='center', va='center', color='blue')
        elif val == 2:
            ax.text(x, y, 'X', fontsize=36, ha='center', va='center', color='red')
    ax.set_xlim(0, 3)
    ax.set_ylim(0, 3)
    ax.set_xticks([])
    ax.set_yticks([])
    ax.set_aspect('equal')
    plt.show()

Each agent’s sensor readings contain the complete board state. As the sensor readings are logged as part of the muscle_actions() query—which basically gives us trajectories—, we can iterate over all moves and render the Tic-Tac-Toe board now. Enjoy!

[13]:
board_states = list(
    ma[ma.muscle_action_objective > 0]
    .muscle_sensor_readings
    .apply(lambda x: [s.value for s in x])
)
for state in board_states:
    render_tic_tac_toe(state)
_images/quickstart_23_0.png
_images/quickstart_23_1.png
_images/quickstart_23_2.png
_images/quickstart_23_3.png
_images/quickstart_23_4.png
_images/quickstart_23_5.png
_images/quickstart_23_6.png
_images/quickstart_23_7.png
_images/quickstart_23_8.png