palaestrai.environment package#

palaestrai.environment.Environment: Environment Base Class#

class palaestrai.environment.Environment(uid: str, broker_uri: str, seed: int)[source]#

Bases: ABC

Abstract class for environment implementation

This abstract class provides all functions needed to implement a new environment. The developer only has to implement the methods start_environment and update.

Parameters:
  • uid (str) – Unique identifier to identify an environment

  • broker_uri (str) – URI used to connect to the simulation broker

  • seed (int) – Seed for recreation
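As a rough illustration of the two-method contract described above, the following sketch uses small stand-in classes instead of importing palaestrai, so it runs on its own. The names SketchEnvironment, Baseline, State, and CounterEnvironment, the dictionary-based sensor records, and the broker URI are all hypothetical; a real environment would presumably derive from palaestrai.environment.Environment and return EnvironmentBaseline and EnvironmentState objects.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class Baseline:
    """Stand-in for EnvironmentBaseline (hypothetical, simplified)."""
    sensors_available: List[Dict[str, Any]]
    actuators_available: List[Dict[str, Any]]


@dataclass
class State:
    """Stand-in for EnvironmentState (hypothetical, simplified)."""
    sensor_information: List[Dict[str, Any]]
    rewards: List[float]
    done: bool


class SketchEnvironment(ABC):
    """Stand-in base class mirroring the documented constructor arguments."""

    def __init__(self, uid: str, broker_uri: str, seed: int):
        self.uid = uid
        self.broker_uri = broker_uri
        self.seed = seed

    @abstractmethod
    def start_environment(self) -> Baseline: ...

    @abstractmethod
    def update(self, actuators: List[Dict[str, Any]]) -> State: ...


class CounterEnvironment(SketchEnvironment):
    """Toy environment with one sensor that counts the steps taken so far."""

    def start_environment(self) -> Baseline:
        self._steps = 0  # re-initialised on every (re)start
        return Baseline(
            sensors_available=[{"uid": f"{self.uid}.steps", "value": 0}],
            actuators_available=[{"uid": f"{self.uid}.noop", "value": None}],
        )

    def update(self, actuators: List[Dict[str, Any]]) -> State:
        self._steps += 1  # one simulation step per update call
        return State(
            sensor_information=[{"uid": f"{self.uid}.steps", "value": self._steps}],
            rewards=[1.0],
            done=self._steps >= 3,  # terminate after three steps
        )


# The broker URI below is a made-up placeholder; the sketch never connects.
env = CounterEnvironment(uid="env-1", broker_uri="tcp://127.0.0.1:4242", seed=42)
baseline = env.start_environment()
states = [env.update([]) for _ in range(3)]
```

Sensor and actuator IDs in the sketch are prefixed with the environment UID, in line with the description of _sensor_ids and _actuator_ids.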

reward#

If present, this method calculates the reward of the environment (“external reward”). See EnvironmentState.world_state.

Type:

Reward

_sensor_ids#

A list of sensor IDs including the UID of the environment.

Type:

List[str]

_actuator_ids#

A list of actuator IDs including the UID of the environment.

Type:

List[str]

reset(request: EnvironmentResetRequest) EnvironmentResetResponse[source]#

Resets the environment in-place.

The default behavior for a reset comprises:

  • calling shutdown to allow a graceful shutdown of environment simulation processes

  • calling start_environment() again

  • preparing the EnvironmentResetResponse

If an environment requires a specialized reset procedure, this method can be overridden.

Parameters:

request (EnvironmentResetRequest) – The reset request sent by the simulation controller.

Returns:

The response for the simulation controller.

Return type:

EnvironmentResetResponse
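The default reset sequence listed above (shutdown, restart, build a response) might be sketched as follows. This is not the library implementation: ResetResponseSketch and ResettableSketch are hypothetical stand-ins, the response fields are invented for illustration, and passing reset=True to shutdown is an assumption based on the description of that parameter.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ResetResponseSketch:
    """Stand-in for EnvironmentResetResponse (fields are hypothetical)."""
    environment_id: str
    sensors: List[str]
    actuators: List[str]


@dataclass
class ResettableSketch:
    uid: str
    calls: List[str] = field(default_factory=list)  # records the call order

    def shutdown(self, reset: bool = False) -> bool:
        self.calls.append(f"shutdown(reset={reset})")
        return True

    def start_environment(self) -> Tuple[List[str], List[str]]:
        # Simplified: returns plain ID lists instead of an EnvironmentBaseline.
        self.calls.append("start_environment()")
        return [f"{self.uid}.sensor-0"], [f"{self.uid}.actuator-0"]

    def reset(self) -> ResetResponseSketch:
        self.shutdown(reset=True)                       # 1. graceful shutdown
        sensors, actuators = self.start_environment()   # 2. start again
        return ResetResponseSketch(self.uid, sensors, actuators)  # 3. respond


env = ResettableSketch(uid="env-1")
response = env.reset()
```

An environment with a cheaper way to return to its initial state could override reset and skip the shutdown step entirely.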

async run()[source]#

Main execution loop for an environment object

This method takes care of the actual execution. As long as this method does not return, the environment is still active.

The method receives and processes incoming messages. It applies changes to itself, i.e., setpoints delivered via EnvironmentUpdateRequest objects. It subsequently takes care of sending the appropriate update responses (EnvironmentUpdateResponse).

This method also interprets EnvironmentSetupRequest and EnvironmentShutdownRequest messages.

shutdown(reset: bool = False) bool[source]#

Initiate the environment shutdown.

This function sets is_terminal to True, which breaks the main loop in the run() method.

Parameters:

reset (bool, optional) – Set to True when only a reset is required. A concrete environment may distinguish between reset and shutdown.

Returns:

True if the shutdown was successful, False otherwise.

Return type:

bool
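The interplay between run() and shutdown() described above can be pictured with a small asyncio loop. This is a sketch under assumptions, not the actual message handling: UpdateRequestSketch, ShutdownRequestSketch, and LoopSketch are hypothetical stand-ins, and an in-process asyncio.Queue replaces the broker connection.

```python
import asyncio
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class UpdateRequestSketch:
    """Stand-in for EnvironmentUpdateRequest (hypothetical)."""
    setpoints: List[float]


@dataclass
class ShutdownRequestSketch:
    """Stand-in for EnvironmentShutdownRequest (hypothetical)."""


class LoopSketch:
    def __init__(self) -> None:
        self.is_terminal = False
        self.responses: List[Tuple[str, object]] = []

    def shutdown(self, reset: bool = False) -> bool:
        self.is_terminal = True  # makes the loop in run() exit
        return True

    async def run(self, inbox: asyncio.Queue) -> None:
        while not self.is_terminal:
            message = await inbox.get()
            if isinstance(message, UpdateRequestSketch):
                # Apply the setpoints; here we only record their sum.
                self.responses.append(("update", sum(message.setpoints)))
            elif isinstance(message, ShutdownRequestSketch):
                self.responses.append(("shutdown", self.shutdown()))


async def demo() -> List[Tuple[str, object]]:
    inbox: asyncio.Queue = asyncio.Queue()
    for message in (
        UpdateRequestSketch([1.0, 2.0]),
        ShutdownRequestSketch(),
        UpdateRequestSketch([9.0]),  # never processed: loop has ended
    ):
        inbox.put_nowait(message)
    env = LoopSketch()
    await env.run(inbox)
    return env.responses


responses = asyncio.run(demo())
```

The third message stays in the queue because the loop condition is checked before the next message is fetched.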

abstract start_environment() EnvironmentBaseline | Tuple[List[SensorInformation], List[ActuatorInformation]][source]#

Launches execution of an environment.

If the environment uses a simulation tool, this function can be used to initiate the simulation tool. In addition, this function is used to prepare the environment for the simulation. It must be able to provide initial sensor information.

On a reset, this method is called to restart a new environment run. Therefore, it also must provide initial values for all variables used!

Returns:

Union[EnvironmentBaseline, typing.Tuple[List[SensorInformation], List[ActuatorInformation]]] – An EnvironmentBaseline object containing all initial data from the environment. For backwards compatibility, it is also possible (though deprecated) to return a tuple containing a list of available sensors and a list of available actuators.

property uid: str#

The unique identifier of the Environment object

abstract update(actuators: List[ActuatorInformation]) EnvironmentState | Tuple[List[SensorInformation], List[RewardInformation], bool][source]#

Function to update the environment

This function receives the agent’s actions and has to respond with new sensor information. This function should create a new simulation step.

Parameters:

actuators (List[ActuatorInformation]) – List of actuators with values

Returns:

Union[EnvironmentState, typing.Tuple[List[SensorInformation], List[RewardInformation], bool]] – An EnvironmentState object; for backwards compatibility, environments can return a tuple containing a list of sensor readings (SensorInformation), a list of rewards (RewardInformation), and a flag whether the environment has terminated. Returning a tuple is considered deprecated.
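Code that consumes the result of update() has to cope with both return shapes. One way to do that, sketched here with a hypothetical StateSketch stand-in and a hypothetical helper normalize_update_result (neither is part of the documented API), is to convert the deprecated tuple into the state object as early as possible.

```python
from dataclasses import dataclass
from typing import Any, List, Tuple, Union


@dataclass
class StateSketch:
    """Stand-in for EnvironmentState with only the three required fields."""
    sensor_information: List[Any]
    rewards: List[Any]
    done: bool


LegacyResult = Tuple[List[Any], List[Any], bool]


def normalize_update_result(result: Union[StateSketch, LegacyResult]) -> StateSketch:
    """Return a StateSketch regardless of which shape update() produced."""
    if isinstance(result, StateSketch):
        return result
    # Deprecated shape: (sensor readings, rewards, terminated flag)
    sensors, rewards, done = result
    return StateSketch(
        sensor_information=list(sensors),
        rewards=list(rewards),
        done=bool(done),
    )


modern = StateSketch(sensor_information=[0.1], rewards=[1.0], done=False)
from_modern = normalize_update_result(modern)
from_legacy = normalize_update_result(([0.1], [1.0], False))
```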

property worker#

Return the major domo worker.

The worker will be created if necessary.

palaestrai.environment.EnvironmentBaseline: Initial Environment Data#

class palaestrai.environment.EnvironmentBaseline(sensors_available: List[SensorInformation], actuators_available: List[ActuatorInformation], simtime: SimTime = <factory>)[source]#

Bases: object

An Environment’s baseline after initializing

This data class contains data about an environment after it has been started, but before any actor has acted. It contains the available sensors and actuators, initial values for the sensors, and the starting time in the environment.

sensors_available#

Sensors available in the environment, along with initial readings

Type:

List[SensorInformation]

actuators_available#

Actuators available

Type:

List[ActuatorInformation]

simtime#

Environment starting time

Type:

palaestrai.types.SimTime (default: SimTime(simtime_ticks=1))
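The `<factory>` default in the signature suggests that simtime is created per instance through a dataclass default factory. The sketch below shows that pattern with hypothetical stand-ins (SimTimeSketch and BaselineSketch); only the simtime_ticks field name and its default of 1 are taken from the text above.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class SimTimeSketch:
    """Stand-in for palaestrai.types.SimTime (simplified)."""
    simtime_ticks: Optional[int] = None


@dataclass
class BaselineSketch:
    """Stand-in for EnvironmentBaseline."""
    sensors_available: List[Any]
    actuators_available: List[Any]
    # A factory gives every instance its own SimTimeSketch object instead of
    # sharing one mutable default between all instances.
    simtime: SimTimeSketch = field(
        default_factory=lambda: SimTimeSketch(simtime_ticks=1)
    )


first = BaselineSketch(sensors_available=[], actuators_available=[])
second = BaselineSketch(sensors_available=[], actuators_available=[])
first.simtime.simtime_ticks = 5  # does not affect `second`
```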

palaestrai.environment.EnvironmentState: Current State of an Environment#

class palaestrai.environment.EnvironmentState(sensor_information: List[SensorInformation], rewards: List[RewardInformation], done: bool, world_state: Any = None, simtime: SimTime | None = None)[source]#

Bases: object

Describes the current state of an Environment.

This dataclass is used as return value of the update() method. It contains current sensor readings, reward of the environment, indicates whether the environment has terminated or not, and finally gives time information.

sensor_information#

List of current sensor values after evaluating the environment

Type:

List[SensorInformation]

rewards#

Current rewards given from the environment

Type:

List[RewardInformation]

done#

Whether the environment has terminated (True) or not (False)

Type:

bool

world_state#

Current state of the world (whatever the environment thinks it is)

Type:

Any (default: None)

simtime#

Current simulation time in the environment

Type:

SimTime (default: None)

palaestrai.environment.DummyEnvironment: Minimal Working Dummy Environment#

class palaestrai.environment.DummyEnvironment(uid: str, broker_uri: str, seed: int, discrete: bool = True)[source]#

Bases: Environment

This class provides a dummy environment with a fixed number of sensors. The environment terminates after a fixed number of updates.

Parameters:
  • broker_uri (str) – The URI used to connect to the simulation broker. It is used to communicate with the simulation controller.

  • uid (str) – Unique identifier of the environment

  • seed (int) – Seed for recreation

  • discrete (bool, optional) – If set to True, the environment will only use discrete spaces. Otherwise, the spaces are continuous. Default is True.

start_environment()[source]#

This method is called when an EnvironmentStartRequest message is received. This dummy environment is represented by 10 sensors and 10 actuators. The sensors are of the type SensorInformation and have a random value of either 0 or 1, an observation_space between 0 and 1 and an integer number as id. The actuators are of the type ActuatorInformation and contain a value of Discrete(1), a space of None and an integer number as id.

Returns:

A list containing the SensorInformation for each of the 10 sensors and a list containing the ActuatorInformation for each of the 10 actuators.

Return type:

tuple

update(actuators)[source]#

This method is called when an EnvironmentUpdateRequest message is received. In a real environment, actuator values would manipulate the environment; here they have no impact on the behavior of the dummy environment. The state of this dummy environment is represented by random values in the SensorInformation of the 10 sensors, and the reward for the state is a random value of either 0 or 1. The method returns a list of SensorInformation, the random reward, and the boolean is_terminal. After 10 updates, is_terminal is set to True, which triggers the respective shutdown messages.

Parameters:

actuators (list[ActuatorInformation]) – A list of ActuatorInformation to interact with the environment.

Returns:

A list of SensorInformation representing the 10 sensors, the reward and boolean for is_terminal.

Return type:

tuple
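The behavior described for the dummy environment can be approximated in a few lines. This is an independent sketch, not the DummyEnvironment source: DummySketch is a hypothetical class, plain integers stand in for SensorInformation objects, and it assumes that the tenth update is the one that reports is_terminal as True.

```python
import random
from typing import List, Tuple


class DummySketch:
    """Seeded toy environment with random binary sensors and rewards."""

    def __init__(self, seed: int, n_sensors: int = 10, max_updates: int = 10):
        self._rng = random.Random(seed)  # seeded for reproducible runs
        self._n_sensors = n_sensors
        self._max_updates = max_updates
        self._updates = 0

    def _read_sensors(self) -> List[int]:
        # Each sensor reports a random value of either 0 or 1.
        return [self._rng.randint(0, 1) for _ in range(self._n_sensors)]

    def start_environment(self) -> List[int]:
        self._updates = 0
        return self._read_sensors()

    def update(self, actuators: list) -> Tuple[List[int], int, bool]:
        # Actuator values are accepted but deliberately ignored.
        self._updates += 1
        readings = self._read_sensors()
        reward = self._rng.randint(0, 1)
        is_terminal = self._updates >= self._max_updates
        return readings, reward, is_terminal


env = DummySketch(seed=42)
initial = env.start_environment()
results = [env.update([]) for _ in range(10)]
```

Because the generator is seeded, two instances created with the same seed produce identical sensor and reward sequences.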

palaestrai.environment.EnvironmentConductor: Environment Lifecycle Management#

class palaestrai.environment.EnvironmentConductor(env_cfg, seed: int, uid=None)[source]#

Bases: object

The environment conductor creates new environment instances.

There could be multiple simulation runs and each would need a separate environment. The environment conductor controls the creation of those new environment instances.

Parameters:
  • env_cfg (dict) – Dictionary with parameters needed by the environment

  • seed (int) – Random seed for recreation

  • uid (uuid4) – Unique identifier

async run()#

Main event/state loop of the ESM

This run method is injected into monitored classes if they do not have one already. The structure of run is as follows:

  1. It resets the handlers for SIGCHLD, SIGINT, and SIGTERM to the OS’ default.

  2. It calls monitored.setup(), if it exists.

  3. It creates an ESM instance for the monitored object and adds signal handlers for SIGCHLD, SIGINT, and SIGTERM according to what the monitored class defines (via @ESM.on(signal.SIGINT), etc.)

  4. It transitions to the first state, defined by @ESM.enter. It then waits for state changes/events until monitored.stop() is called.

  5. Finally, once the main event/state loop concludes, monitored.teardown() is called (if present).
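The overall shape of steps 2, 4, and 5 above (optional setup, wait until stop is called, optional teardown) can be sketched with plain asyncio. Signal handling and the actual ESM decorators are left out, and MonitoredSketch is a hypothetical class that schedules its own stop() call only so that the example terminates.

```python
import asyncio
from typing import List, Optional


class MonitoredSketch:
    def __init__(self) -> None:
        self.log: List[str] = []
        self._stopped: Optional[asyncio.Event] = None

    def setup(self) -> None:
        self.log.append("setup")

    def teardown(self) -> None:
        self.log.append("teardown")

    def stop(self) -> None:
        self.log.append("stop")
        self._stopped.set()  # wakes up the waiting run() coroutine

    async def run(self) -> None:
        # Create the event inside the running loop.
        self._stopped = asyncio.Event()
        if hasattr(self, "setup"):       # step 2: optional setup hook
            self.setup()
        self.log.append("enter first state")  # step 4: initial state
        # Stand-in for an external event that eventually requests a stop.
        asyncio.get_running_loop().call_later(0.01, self.stop)
        await self._stopped.wait()       # step 4: wait until stop() is called
        if hasattr(self, "teardown"):    # step 5: optional teardown hook
            self.teardown()


monitored = MonitoredSketch()
asyncio.run(monitored.run())
```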

stop()#

Stops the ESM.

Stopping the ESM also means shutting down all running processes and cancelling all outstanding tasks (e.g., request monitors).