palaestrai.environment package

palaestrai.environment.Environment: Environment Base Class

class palaestrai.environment.Environment(uid: str, *args, **kwargs)

Bases: ABC

Abstract class for environment implementation

This abstract class provides all functions needed to implement a new environment. Developers only have to implement the methods start_environment() and update().
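A minimal sketch of what a subclass has to provide follows. To stay runnable without palaestrAI installed, it defines its own abstract base with the same two abstract methods, plus simplified stand-ins for SensorInformation, ActuatorInformation, EnvironmentBaseline and EnvironmentState (only uid and value are modelled; the real classes carry more fields, such as spaces). CounterEnvironment and the "<uid>.<name>" ID format are hypothetical. In real code you would subclass palaestrai.environment.Environment and use palaestrAI's own types.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, List


# Simplified stand-ins, NOT palaestrAI's classes: only uid and value are modelled.
@dataclass
class SensorInformation:
    uid: str
    value: Any


@dataclass
class ActuatorInformation:
    uid: str
    value: Any


@dataclass
class EnvironmentBaseline:
    sensors_available: List[SensorInformation]
    actuators_available: List[ActuatorInformation]


@dataclass
class EnvironmentState:
    sensor_information: List[SensorInformation]
    rewards: List[Any]
    done: bool


class EnvironmentSketch(ABC):
    """Mirrors the two abstract methods every environment must implement."""

    def __init__(self, uid: str) -> None:
        self.uid = uid

    @abstractmethod
    def start_environment(self) -> EnvironmentBaseline: ...

    @abstractmethod
    def update(self, actuators: List[ActuatorInformation]) -> EnvironmentState: ...


class CounterEnvironment(EnvironmentSketch):
    """Hypothetical example: one sensor that sums up the actuator's setpoints."""

    def start_environment(self) -> EnvironmentBaseline:
        # Called on start *and* on every reset, so (re)initialise all state here.
        self.total = 0
        self.steps = 0
        return EnvironmentBaseline(
            sensors_available=[SensorInformation(f"{self.uid}.total", self.total)],
            actuators_available=[ActuatorInformation(f"{self.uid}.add", 0)],
        )

    def update(self, actuators: List[ActuatorInformation]) -> EnvironmentState:
        # One call advances the simulation by one step.
        self.total += sum(a.value for a in actuators)
        self.steps += 1
        return EnvironmentState(
            sensor_information=[SensorInformation(f"{self.uid}.total", self.total)],
            rewards=[],  # RewardInformation is not modelled in this sketch
            done=self.steps >= 3,  # hypothetical: terminate after three steps
        )
```

Usage: env = CounterEnvironment("counter"); env.start_environment(); env.update([ActuatorInformation("counter.add", 2)]).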

Parameters:: uid (str) – Unique identifier to identify an environment

reward

If present, this attribute calculates the reward of the environment (“external reward”). See EnvironmentState.world_state.

Type:: Reward

_sensor_ids

A list of sensor IDs including the UID of the environment.

Type:: List[str]

_actuator_ids

A list of actuator IDs including the UID of the environment.

Type:: List[str]

property done: bool: Checks whether the environment has terminated

property mdp_service: str

property name: str: User-given name of this environment

async reset(request: EnvironmentResetRequest) → EnvironmentResetResponse

Resets the environment in-place.

The default behavior for a reset comprises:

calling shutdown() to allow a graceful shutdown of environment simulation processes
calling start_environment() again
preparing the EnvironmentResetResponse

If an environment requires a more specialized reset procedure, this method can be overridden.

Parameters:: request (EnvironmentResetRequest) – The reset request sent by the simulation controller.
Returns:: The response for the simulation controller.
Return type:: EnvironmentResetResponse
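The default reset order described above can be sketched as follows. ResettableSketch and ResetResponseSketch are hypothetical stand-ins (the real EnvironmentResetResponse has different fields), and start_environment() returns the deprecated (sensors, actuators) tuple only to keep the example short. SoftResetSketch shows one way an override might skip the shutdown step, for instance for a simulator that can be reinitialised in place.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, List


@dataclass
class ResetResponseSketch:
    # Hypothetical stand-in for EnvironmentResetResponse; the real fields differ.
    sensors: List[Any]
    actuators: List[Any]


class ResettableSketch:
    """Hypothetical environment that records which lifecycle methods were called."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def start_environment(self):
        self.calls.append("start_environment")
        return ["sensor-0"], ["actuator-0"]  # deprecated tuple form, for brevity

    def shutdown(self) -> None:
        self.calls.append("shutdown")

    async def reset(self, request: Any) -> ResetResponseSketch:
        # Default order: graceful shutdown, restart, then build the response.
        self.shutdown()
        sensors, actuators = self.start_environment()
        return ResetResponseSketch(sensors=sensors, actuators=actuators)


class SoftResetSketch(ResettableSketch):
    """Hypothetical override for a simulator that can be reinitialised in place."""

    async def reset(self, request: Any) -> ResetResponseSketch:
        self.calls.append("soft_reset")  # no shutdown(): the process keeps running
        sensors, actuators = self.start_environment()
        return ResetResponseSketch(sensors=sensors, actuators=actuators)
```

Usage: asyncio.run(ResettableSketch().reset(request=None)). Because start_environment() must provide initial values for all variables, both variants can rely on it to restore a clean state.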

async run()

Main event/state loop of the ESM

This run method is injected into monitored classes if they do not have one already. The structure of run is as follows:

It resets the handlers for SIGCHLD, SIGINT, and SIGTERM to the OS’ default.
It calls monitored.setup(), if it exists.
It creates an ESM instance for the monitored object and adds signal handlers for SIGCHLD, SIGINT, and SIGTERM according to what the monitored class defines (via @ESM.on(signal.SIGINT), etc.)
It transitions to the first state, defined by @ESM.enter. It then waits for state changes/events until monitored.stop() is called.
Finally, once the main event/state loop concludes, monitored.teardown() is called (if present).
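The order of these lifecycle steps can be illustrated with plain asyncio. This sketch deliberately leaves out the signal handling and the ESM itself; run_monitored(), the Worker class, its enter() coroutine and the stopped event attribute are hypothetical names used only to show that setup() runs first, the first state runs until stop() is called, and teardown() runs last.

```python
import asyncio
import contextlib
from typing import List


async def run_monitored(monitored) -> None:
    """Hypothetical run loop: setup, first state, wait for stop(), teardown."""
    setup = getattr(monitored, "setup", None)
    if setup is not None:  # setup() is optional
        setup()
    monitored.stopped = asyncio.Event()  # lets monitored.stop() end the loop
    first_state = asyncio.create_task(monitored.enter())
    await monitored.stopped.wait()
    if not first_state.done():  # cancel whatever is still outstanding
        first_state.cancel()
        with contextlib.suppress(asyncio.CancelledError):
            await first_state
    teardown = getattr(monitored, "teardown", None)
    if teardown is not None:  # teardown() is optional
        teardown()


class Worker:
    """Hypothetical monitored class that stops itself after entering its first state."""

    def __init__(self) -> None:
        self.log: List[str] = []

    def setup(self) -> None:
        self.log.append("setup")

    async def enter(self) -> None:
        self.log.append("enter")
        await asyncio.sleep(0)
        self.stop()

    def stop(self) -> None:
        self.log.append("stop")
        self.stopped.set()

    def teardown(self) -> None:
        self.log.append("teardown")
```

Running asyncio.run(run_monitored(Worker())) records the hooks in the order setup, enter, stop, teardown.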

setup()

shutdown() → None

Handles the shutdown procedure of the environment

This method allows a graceful shutdown of the current environment process. It does not prevent start_environment() from being called again in case of a reset.

An implementation is optional.

abstract start_environment() → EnvironmentBaseline | Tuple[List[SensorInformation], List[ActuatorInformation]]

Launches execution of an environment.

If the environment uses a simulation tool, this function can be used to start the simulation tool. In addition, this function is used to prepare the environment for the simulation. It must be able to provide initial sensor information.

On a reset, this method is called to restart a new environment run. Therefore, it also must provide initial values for all variables used!

Returns:

Union[EnvironmentBaseline, typing.Tuple[List[SensorInformation], List[ActuatorInformation]]] – An EnvironmentBaseline object containing all initial data from the environment. For backwards compatibility, it is also possible (though deprecated) to return a tuple containing a list of available sensors and a list of available actuators.
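A sketch of how a caller could accept both return forms of start_environment(), assuming a simplified EnvironmentBaseline stand-in (simtime omitted, sensors and actuators as plain objects). as_baseline() is a hypothetical helper, not part of palaestrAI; it emits a DeprecationWarning for the tuple form, matching the deprecation noted above.

```python
import warnings
from dataclasses import dataclass
from typing import Any, List, Tuple, Union


@dataclass
class EnvironmentBaseline:
    # Stand-in: documented field names, but simtime omitted and items untyped.
    sensors_available: List[Any]
    actuators_available: List[Any]
    static_world_model: Any = None


def as_baseline(
    result: Union[EnvironmentBaseline, Tuple[List[Any], List[Any]]]
) -> EnvironmentBaseline:
    """Hypothetical helper: normalise either return form of start_environment()."""
    if isinstance(result, EnvironmentBaseline):
        return result
    # Deprecated form: a (sensors, actuators) tuple.
    warnings.warn(
        "returning (sensors, actuators) from start_environment() is deprecated; "
        "return an EnvironmentBaseline instead",
        DeprecationWarning,
        stacklevel=2,
    )
    sensors, actuators = result
    return EnvironmentBaseline(list(sensors), list(actuators))
```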

stop(error=None)

Stops the ESM.

Stopping the ESM also means shutting down all running processes and cancelling all outstanding tasks (e.g., request monitors).

Parameters:

error (Exception) – If given, the ESM raises this exception after cleaning up.
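The stop semantics (cancel outstanding work, then raise the given error) can be sketched with asyncio tasks. StoppableSketch and spawn() are hypothetical; like the documented stop(error=None), stop() here is a plain method rather than a coroutine, so it only requests cancellation, which takes effect the next time each task is scheduled.

```python
import asyncio
from typing import Coroutine, Optional, Set


class StoppableSketch:
    """Hypothetical: tracks outstanding tasks so stop() can cancel them."""

    def __init__(self) -> None:
        self.tasks: Set[asyncio.Task] = set()

    def spawn(self, coro: Coroutine) -> asyncio.Task:
        # Must be called while an event loop is running.
        task = asyncio.create_task(coro)
        self.tasks.add(task)
        return task

    def stop(self, error: Optional[BaseException] = None) -> None:
        # Request cancellation of every outstanding task ...
        for task in self.tasks:
            if not task.done():
                task.cancel()
        self.tasks.clear()
        # ... and only after this cleanup raise the error, if one was given.
        if error is not None:
            raise error
```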

property uid: str: The unique identifier of the Environment object

abstract update(actuators: List[ActuatorInformation]) → EnvironmentState | Tuple[List[SensorInformation], List[RewardInformation], bool]

Function to update the environment

This function receives the agent’s actions and has to respond with new sensor information. Each call should advance the simulation by one step.

Parameters:

actuators (List[ActuatorInformation]) – List of actuators with values

Returns:

Union[EnvironmentState, typing.Tuple[List[SensorInformation], List[RewardInformation], bool]] – An EnvironmentState object; for backwards compatibility, environments can return a tuple containing a list of sensor readings (SensorInformation), a list of rewards (RewardInformation), and a flag whether the environment has terminated. Returning a tuple is considered deprecated.
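A hypothetical driver loop makes the update() contract concrete: actions go in, a state comes out, and the loop ends when done is True. to_state() and run_episode() are illustrative helpers, not palaestrAI API, and EnvironmentState here is a stand-in without world_state and simtime.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple, Union


@dataclass
class EnvironmentState:
    # Stand-in with the documented field names (world_state and simtime omitted).
    sensor_information: List[Any]
    rewards: List[Any]
    done: bool


LegacyResult = Tuple[List[Any], List[Any], bool]


def to_state(result: Union[EnvironmentState, LegacyResult]) -> EnvironmentState:
    """Hypothetical helper: accept either documented return form of update()."""
    if isinstance(result, EnvironmentState):
        return result
    sensors, rewards, done = result  # deprecated (sensors, rewards, done) tuple
    return EnvironmentState(list(sensors), list(rewards), bool(done))


def run_episode(
    update: Callable[[List[Any]], Any],
    act: Callable[[List[Any]], List[Any]],
    max_steps: int = 1000,
) -> int:
    """Hypothetical driver: feed actions into update() until done; return steps taken."""
    sensors: List[Any] = []
    for step in range(1, max_steps + 1):
        state = to_state(update(act(sensors)))
        sensors = state.sensor_information
        if state.done:
            return step
    return max_steps
```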

palaestrai.environment.EnvironmentBaseline: Initial Environment Data

class palaestrai.environment.EnvironmentBaseline(sensors_available: List[SensorInformation], actuators_available: List[ActuatorInformation], simtime: SimTime = <factory>, static_world_model: Any = None)

Bases: object

An Environment’s baseline after initializing

This data class describes an environment after it has been started, but before any actor has acted. It contains the available sensors and actuators, the initial sensor values, and the starting time in the environment.

sensors_available

Sensors available in the environment, along with initial readings

Type:: List[SensorInformation]

actuators_available

Actuators available

Type:: List[ActuatorInformation]

simtime

Environment starting time

Type:: palaestrai.types.SimTime (default: SimTime(simtime_ticks=1))

static_world_model

Any static model of the world that the environment can publish. The world model should be complete in the sense that any further update of the environment can be used to also update palaestrAI’s world view through this model. Supplying the model is optional, but helpful.

Type:: Any
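The <factory> default in the signature means simtime is produced by a default factory, so every baseline gets its own SimTime object instead of sharing one. The stand-ins below mirror the documented field names and defaults; SimTime models only simtime_ticks, and any sensor or actuator names you pass in are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class SimTime:
    # Stand-in modelling only the tick counter.
    simtime_ticks: Optional[int] = None


@dataclass
class EnvironmentBaseline:
    # Stand-in mirroring the documented signature; items are left untyped.
    sensors_available: List[Any]
    actuators_available: List[Any]
    # A fresh SimTime(simtime_ticks=1) is created for every instance.
    simtime: SimTime = field(default_factory=lambda: SimTime(simtime_ticks=1))
    static_world_model: Any = None
```

Usage: EnvironmentBaseline(["grid.voltage"], ["grid.switch"], static_world_model={"buses": 2}).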

palaestrai.environment.EnvironmentState: Current State of an Environment

class palaestrai.environment.EnvironmentState(sensor_information: List[SensorInformation], rewards: List[RewardInformation], done: bool, world_state: Any = None, simtime: SimTime | None = None)

Bases: object

Describes the current state of an Environment.

This dataclass is used as the return value of the update() method. It contains the current sensor readings and rewards of the environment, indicates whether the environment has terminated, and gives time information.

sensor_information

List of current sensor values after evaluating the environment

Type:: List[SensorInformation]

rewards

Current rewards given from the environment

Type:: List[RewardInformation]

done

Whether the environment has terminated (True) or not (False)

Type:: bool

world_state

Current state of the world (whatever the environment thinks it is)

Type:: Any (default: None)

simtime

Current simulation time of the environment

Type:: SimTime (default: None)
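One plausible way to fill an EnvironmentState after each step is sketched below, assuming simtime reports the tick just simulated and world_state carries a plain dict snapshot. SimTime and EnvironmentState are stand-ins mirroring the documented signature; step_state() and its horizon parameter are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class SimTime:
    simtime_ticks: Optional[int] = None  # stand-in: only the tick counter


@dataclass
class EnvironmentState:
    # Stand-in mirroring the documented signature and defaults.
    sensor_information: List[Any]
    rewards: List[Any]
    done: bool
    world_state: Any = None
    simtime: Optional[SimTime] = None


def step_state(tick: int, readings: List[Any], horizon: int) -> EnvironmentState:
    """Hypothetical: build the state after simulating tick `tick`."""
    return EnvironmentState(
        sensor_information=readings,
        rewards=[],
        done=tick >= horizon,  # terminate once the horizon is reached
        world_state={"tick": tick, "readings": list(readings)},
        simtime=SimTime(simtime_ticks=tick),
    )
```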

palaestrai.environment.DummyEnvironment: Minimal Working Dummy Environment

class palaestrai.environment.DummyEnvironment(uid: str, broker_uri: str, seed: int, discrete: bool = True, max_iter: int = 10)

Bases: Environment

This class provides a dummy environment with a fixed number of sensors. The environment terminates after a fixed number of updates.

Parameters:

broker_uri (str) – the URI which is used to connect to the simulation broker. It is used to communicate with the simulation controller.
uid (str) – a universally unique ID for the environment
seed (int) – Random seed, for reproducibility
discrete (bool, optional) – If set to True, the environment will only use discrete spaces. Otherwise, the spaces are continuous. Default is True.
max_iter (int, optional) – Number of updates after which the environment terminates. Default is 10.

start_environment()

This method is called when an EnvironmentStartRequest message is received. This dummy environment is represented by 10 sensors and 10 actuators. The sensors are of the type SensorInformation and have a random value of either 0 or 1, an observation_space between 0 and 1 and an integer number as id. The actuators are of the type ActuatorInformation and contain a value of Discrete(1), a space of None and an integer number as id.

Returns:: A list containing the SensorInformation for each of the 10 sensors and a list containing the ActuatorInformation for each of the 10 actuators.
Return type:: tuple

update(actuators)

This method is called when an EnvironmentUpdateRequest message is received. In a real environment, actuator values would manipulate the simulated system; here they have no impact on the behavior of the dummy environment. The state of this dummy environment is represented by random values of the SensorInformation of the 10 sensors, and the reward for the state is a random value of either 0 or 1. The method returns a list of SensorInformation, the random reward, and the boolean is_terminal. After 10 updates, is_terminal is set to True, which triggers the respective shutdown messages.

Parameters:: actuators (list[ActuatorInformation]) – A list of ActuatorInformation to interact with the environment.
Returns:: A list of SensorInformation representing the 10 sensors, the reward and boolean for is_terminal.
Return type:: tuple
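The described dummy behaviour can be re-created in a few lines to show the role of the seed and of the iteration counter. This is a hypothetical sketch, not palaestrAI's implementation: sensors are plain 0/1 integers instead of SensorInformation objects, actuators are bare integer IDs, the reward is a single integer, and max_iter is assumed to be the number of updates after which is_terminal becomes True.

```python
import random
from typing import List, Tuple


class DummyLogicSketch:
    """Hypothetical re-creation of the described behaviour, not palaestrAI's code."""

    def __init__(self, seed: int, num_sensors: int = 10, max_iter: int = 10) -> None:
        self.rng = random.Random(seed)  # same seed, same sequence of values
        self.num_sensors = num_sensors
        self.max_iter = max_iter
        self.iter = 0

    def start_environment(self) -> Tuple[List[int], List[int]]:
        self.iter = 0
        sensors = [self.rng.randint(0, 1) for _ in range(self.num_sensors)]
        actuator_ids = list(range(self.num_sensors))
        return sensors, actuator_ids

    def update(self, actuators: List[int]) -> Tuple[List[int], int, bool]:
        # Actuator values are ignored, as in the dummy environment.
        self.iter += 1
        sensors = [self.rng.randint(0, 1) for _ in range(self.num_sensors)]
        reward = self.rng.randint(0, 1)
        is_terminal = self.iter >= self.max_iter
        return sensors, reward, is_terminal
```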

palaestrai.environment.EnvironmentConductor: Environment Lifecycle Management

class palaestrai.environment.EnvironmentConductor(env_cfg, seed: int, uid=None)

Bases: object

The environment conductor creates new environment instances.

There could be multiple simulation runs and each would need a separate environment. The environment conductor controls the creation of those new environment instances.

Parameters:

env_cfg (dict) – Dictionary with parameters needed by the environment
seed (int) – Random seed, for reproducibility
uid (uuid4) – Unique identifier
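A sketch of the conductor's job of creating one fresh environment per run follows. ConductorSketch, env_factory and create_environment() are hypothetical names, and deriving per-run seeds from the conductor's seed with random.Random is an assumption made here for reproducibility, not a description of palaestrAI's internals. env_cfg is simply passed through as keyword arguments.

```python
import random
import uuid
from typing import Any, Callable, Dict, Optional


class ConductorSketch:
    """Hypothetical: builds a fresh environment instance for every simulation run."""

    def __init__(
        self,
        env_factory: Callable[..., Any],
        env_cfg: Dict[str, Any],
        seed: int,
        uid: Optional[str] = None,
    ) -> None:
        self.env_factory = env_factory
        self.env_cfg = env_cfg
        self.uid = uid or str(uuid.uuid4())
        self._rng = random.Random(seed)  # derives reproducible per-run seeds

    def create_environment(self) -> Any:
        # Each call yields a separate instance with its own UID and seed.
        run_seed = self._rng.randrange(2**32)
        return self.env_factory(uid=str(uuid.uuid4()), seed=run_seed, **self.env_cfg)
```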

async run()

Main event/state loop of the ESM

This run method is injected into monitored classes if they do not have one already. The structure of run is as follows:

It resets the handlers for SIGCHLD, SIGINT, and SIGTERM to the OS’ default.
It calls monitored.setup(), if it exists.
It creates an ESM instance for the monitored object and adds signal handlers for SIGCHLD, SIGINT, and SIGTERM according to what the monitored class defines (via @ESM.on(signal.SIGINT), etc.)
It transitions to the first state, defined by @ESM.enter. It then waits for state changes/events until monitored.stop() is called.
Finally, once the main event/state loop concludes, monitored.teardown() is called (if present).

stop(error=None)

Stops the ESM.

Stopping the ESM also means shutting down all running processes and cancelling all outstanding tasks (e.g., request monitors).

Parameters:

error (Exception) – If given, the ESM raises this exception after cleaning up.