palaestrai.environment package#
palaestrai.environment.Environment: Environment Base Class#
- class palaestrai.environment.Environment(uid: str, broker_uri: str, seed: int)[source]#
Bases:
ABC
Abstract class for environment implementation
This abstract class provides all functions needed to implement a new environment. The developer only has to implement the methods start_environment and update.
- Parameters:
- reward#
If present, this method calculates the reward of the environment (“external reward”). See ::EnvironmentState.world_state.
- Type:
::Reward
- reset(request: EnvironmentResetRequest) EnvironmentResetResponse [source]#
Resets the environment in-place.
The default behavior for a reset comprises:
- calling shutdown() to allow a graceful shutdown of environment simulation processes
- calling start_environment() again
- preparing the EnvironmentResetResponse
If an environment requires a more specialized reset procedure, this method can be overridden.
- Parameters:
request (EnvironmentResetRequest) – The reset request sent by the simulation controller.
- Returns:
The response for the simulation controller.
- Return type:
EnvironmentResetResponse
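The default reset sequence described above can be sketched as follows. This is a minimal, self-contained illustration and not palaestrAI's own code: `SketchEnvironment`, `ResetRequest`, and `ResetResponse` are hypothetical stand-ins for `Environment`, `EnvironmentResetRequest`, and `EnvironmentResetResponse`, and their fields are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ResetRequest:
    """Hypothetical stand-in for EnvironmentResetRequest."""
    environment_id: str


@dataclass
class ResetResponse:
    """Hypothetical stand-in for EnvironmentResetResponse."""
    environment_id: str
    sensors: List[str]


@dataclass
class SketchEnvironment:
    """Hypothetical stand-in that models only the reset-related methods."""
    uid: str
    calls: List[str] = field(default_factory=list)

    def shutdown(self, reset: bool = False) -> bool:
        # Step 1: give simulation processes the chance to stop gracefully.
        self.calls.append("shutdown")
        return True

    def start_environment(self) -> List[str]:
        # Step 2: start a fresh run and provide initial sensor data.
        self.calls.append("start_environment")
        return ["sensor-0", "sensor-1"]

    def reset(self, request: ResetRequest) -> ResetResponse:
        self.shutdown(reset=True)
        sensors = self.start_environment()
        # Step 3: prepare the response for the simulation controller.
        return ResetResponse(environment_id=self.uid, sensors=sensors)


env = SketchEnvironment(uid="env-1")
response = env.reset(ResetRequest(environment_id="env-1"))
print(env.calls)  # ['shutdown', 'start_environment']
```

An environment with a more specialized reset procedure would replace the body of `reset` while still returning a response object.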
- async run()[source]#
Main execution loop for an environment object
This method takes care of the actual execution. As long as this method does not return, the environment is still active.
The method receives and processes incoming messages. It applies changes to itself, i.e., setpoints delivered via EnvironmentUpdateRequest objects. It subsequently takes care of sending the appropriate update responses (EnvironmentUpdateResponse).
This method also interprets EnvironmentSetupRequest and EnvironmentShutdownRequest messages.
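The message handling described above might look roughly like the loop below. It is a sketch under stated assumptions: `UpdateRequest`, `UpdateResponse`, and `ShutdownRequest` are hypothetical stand-ins for the corresponding palaestrAI message classes, and `asyncio.Queue` objects replace the broker connection that the real class uses.

```python
import asyncio
from dataclasses import dataclass
from typing import List


@dataclass
class UpdateRequest:
    """Hypothetical stand-in for EnvironmentUpdateRequest."""
    setpoints: List[float]


@dataclass
class UpdateResponse:
    """Hypothetical stand-in for EnvironmentUpdateResponse."""
    sensor_readings: List[float]


@dataclass
class ShutdownRequest:
    """Hypothetical stand-in for EnvironmentShutdownRequest."""


class LoopSketch:
    def __init__(self) -> None:
        self.is_terminal = False
        self.state: List[float] = []

    def shutdown(self) -> bool:
        # Setting the flag ends the loop in run().
        self.is_terminal = True
        return True

    async def run(self, inbox: asyncio.Queue, outbox: asyncio.Queue) -> None:
        # The environment stays active for as long as this method runs.
        while not self.is_terminal:
            message = await inbox.get()
            if isinstance(message, UpdateRequest):
                self.state = list(message.setpoints)  # apply the setpoints
                await outbox.put(UpdateResponse(sensor_readings=self.state))
            elif isinstance(message, ShutdownRequest):
                self.shutdown()


async def demo() -> List[UpdateResponse]:
    inbox: asyncio.Queue = asyncio.Queue()
    outbox: asyncio.Queue = asyncio.Queue()
    for message in (UpdateRequest([0.5, 1.0]), ShutdownRequest()):
        inbox.put_nowait(message)
    await LoopSketch().run(inbox, outbox)
    return [outbox.get_nowait() for _ in range(outbox.qsize())]


responses = asyncio.run(demo())
print(responses)
```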
- shutdown(reset: bool = False) bool [source]#
Initiate the environment shutdown.
This method sets is_terminal to True, which causes the main loop in the run() method to exit.
- abstract start_environment() EnvironmentBaseline | Tuple[List[SensorInformation], List[ActuatorInformation]] [source]#
Launches execution of an environment.
If the environment uses a simulation tool, this function can be used to initiate the simulation tool. In addition, this function is used to prepare the environment for the simulation. It must be able to provide initial sensor information.
On a reset, this method is called to restart a new environment run. Therefore, it also must provide initial values for all variables used!
- Returns:
Union[EnvironmentBaseline, typing.Tuple[List[SensorInformation], List[ActuatorInformation]]] – An EnvironmentBaseline object containing all initial data from the environment. For backwards compatibility, it is also possible (though deprecated) to return a tuple containing a list of available sensors and a list of available actuators.
- abstract update(actuators: List[ActuatorInformation]) EnvironmentState | Tuple[List[SensorInformation], List[RewardInformation], bool] [source]#
Function to update the environment
This function receives the agent’s actions and has to respond with new sensor information. This function should create a new simulation step.
- Parameters:
actuators (List[ActuatorInformation]) – List of actuators with values
- Returns:
Union[EnvironmentState, typing.Tuple[List[SensorInformation], List[RewardInformation], bool]] – An EnvironmentState object; for backwards compatibility, environments can return a tuple containing a list of sensor readings (SensorInformation), a list of rewards (RewardInformation), and a flag indicating whether the environment has terminated. Returning a tuple is considered deprecated.
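Because update() may return either form, calling code has to accept both. The sketch below shows one way to normalize the deprecated tuple into a state object. `StateSketch` is a hypothetical stand-in that mirrors the documented `EnvironmentState` fields, and `normalize_update_result` is an illustrative helper, not a palaestrAI function.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Tuple, Union


@dataclass
class StateSketch:
    """Hypothetical stand-in mirroring the documented EnvironmentState fields."""
    sensor_information: List[Any]
    rewards: List[Any]
    done: bool
    world_state: Any = None
    simtime: Optional[Any] = None


LegacyTuple = Tuple[List[Any], List[Any], bool]


def normalize_update_result(result: Union[StateSketch, LegacyTuple]) -> StateSketch:
    """Accept both return styles and always hand back a state object."""
    if isinstance(result, StateSketch):
        return result
    sensors, rewards, done = result  # deprecated tuple form
    return StateSketch(sensor_information=sensors, rewards=rewards, done=done)


modern = normalize_update_result(StateSketch([0.1], [1.0], False))
legacy = normalize_update_result(([0.1], [1.0], False))
print(modern == legacy)  # True
```

The tuple form cannot carry `world_state` or `simtime`, so both keep their defaults after normalization.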
- property worker#
Return the major domo worker.
The worker will be created if necessary.
palaestrai.environment.EnvironmentBaseline: Initial Environment Data#
- class palaestrai.environment.EnvironmentBaseline(sensors_available: List[SensorInformation], actuators_available: List[ActuatorInformation], simtime: SimTime = <factory>)[source]#
Bases:
object
An Environment’s baseline after initialization
This data class contains data about an environment after it has been started, but before any actor has acted. It contains the available sensors and actuators, initial values for the sensors, and the starting time in the environment.
- sensors_available#
Sensors available in the environment, along with initial readings
- Type:
List[SensorInformation]
- actuators_available#
Actuators available
- Type:
List[ActuatorInformation]
- simtime#
Environment starting time
- Type:
palaestrai.types.SimTime (default: SimTime(simtime_ticks=1))
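The `<factory>` default in the signature indicates a per-instance default for simtime. The sketch below shows how such a data class could be declared with the standard library. `SimTimeSketch` and `BaselineSketch` are hypothetical stand-ins for `palaestrai.types.SimTime` and `EnvironmentBaseline`, and any SimTime fields beyond `simtime_ticks` are left out.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class SimTimeSketch:
    """Hypothetical stand-in for palaestrai.types.SimTime."""
    simtime_ticks: int = 1


@dataclass
class BaselineSketch:
    """Hypothetical stand-in mirroring the documented EnvironmentBaseline fields."""
    sensors_available: List[Any]
    actuators_available: List[Any]
    # A factory builds a fresh default per instance instead of sharing one object.
    simtime: SimTimeSketch = field(
        default_factory=lambda: SimTimeSketch(simtime_ticks=1)
    )


first = BaselineSketch(sensors_available=["s0"], actuators_available=["a0"])
second = BaselineSketch(sensors_available=["s0"], actuators_available=["a0"])
print(first.simtime.simtime_ticks)      # 1
print(first.simtime is second.simtime)  # False
```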
palaestrai.environment.EnvironmentState: Current State of an Environment#
- class palaestrai.environment.EnvironmentState(sensor_information: List[SensorInformation], rewards: List[RewardInformation], done: bool, world_state: Any = None, simtime: SimTime | None = None)[source]#
Bases:
object
Describes the current state of an Environment.
This dataclass is used as the return value of the update() method. It contains the current sensor readings and the reward of the environment, indicates whether the environment has terminated, and gives time information.
- sensor_information#
List of current sensor values after evaluating the environment
- Type:
List[SensorInformation]
- rewards#
Current rewards given from the environment
- Type:
List[RewardInformation]
- world_state#
Current state of the world (whatever the environment thinks it is)
- Type:
Any (default: None)
- simtime#
Current time in the environment
- Type:
SimTime (default: None)
palaestrai.environment.DummyEnvironment: Minimal Working Dummy Environment#
- class palaestrai.environment.DummyEnvironment(uid: str, broker_uri: str, seed: int, discrete: bool = True)[source]#
Bases:
Environment
This class provides a dummy environment with a fixed number of sensors. The environment terminates after a fixed number of updates.
- Parameters:
broker_uri (str) – The URI used to connect to the simulation broker. It is used to communicate with the simulation controller.
uid (str) – A unique ID for the environment
seed (int) – Seed for reproducibility
discrete (bool, optional) – If set to True, the environment will only use discrete spaces. Otherwise, the spaces are continuous. Default is True.
- start_environment()[source]#
This method is called when an EnvironmentStartRequest message is received. This dummy environment is represented by 10 sensors and 10 actuators. The sensors are of the type SensorInformation and have a random value of either 0 or 1, an observation_space between 0 and 1 and an integer number as id. The actuators are of the type ActuatorInformation and contain a value of Discrete(1), a space of None and an integer number as id.
- Returns:
A list containing the SensorInformation for each of the 10 sensors and a list containing the ActuatorInformation for each of the 10 actuators.
- Return type:
- update(actuators)[source]#
This method is called when an EnvironmentUpdateRequest message is received. In an actual environment the actuator values would manipulate the environment; here they have no impact on the behavior of the dummy environment. The state of this dummy environment is represented by random values of the SensorInformation from the 10 sensors. The reward for the state is a random value of either 0 or 1. The method returns a list of SensorInformation, the random reward, and the boolean is_terminal. After 10 updates, is_terminal is set to True, which triggers the respective shutdown messages.
- Parameters:
actuators (list[ActuatorInformation]) – A list of ActuatorInformation to interact with the environment.
- Returns:
A list of SensorInformation representing the 10 sensors, the reward and boolean for is_terminal.
- Return type:
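The behavior described for the dummy environment can be mimicked with the standard library alone. The sketch below is not the DummyEnvironment source: `DummySketch` is a hypothetical class, plain integers replace SensorInformation and RewardInformation objects, and setting the terminal flag on the tenth update (rather than the eleventh) is an assumption.

```python
import random
from typing import List, Tuple

N_SENSORS = 10    # number of sensors, taken from the description above
MAX_UPDATES = 10  # the environment terminates after this many updates


class DummySketch:
    """Hypothetical stand-in; plain ints replace SensorInformation objects."""

    def __init__(self, seed: int) -> None:
        self.rng = random.Random(seed)  # seeded for reproducible runs
        self.iteration = 0

    def _read_sensors(self) -> List[int]:
        return [self.rng.randint(0, 1) for _ in range(N_SENSORS)]

    def start_environment(self) -> List[int]:
        self.iteration = 0
        return self._read_sensors()

    def update(self, actuators: List[int]) -> Tuple[List[int], int, bool]:
        # Actuator values are ignored on purpose: they do not affect the state.
        self.iteration += 1
        reward = self.rng.randint(0, 1)
        is_terminal = self.iteration >= MAX_UPDATES
        return self._read_sensors(), reward, is_terminal


env = DummySketch(seed=42)
initial_readings = env.start_environment()
results = [env.update(actuators=[0] * 10) for _ in range(MAX_UPDATES)]
print([done for _, _, done in results])  # nine times False, then True
```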
palaestrai.environment.EnvironmentConductor: Environment Lifecycle Management#
- class palaestrai.environment.EnvironmentConductor(env_cfg, seed: int, uid=None)[source]#
Bases:
object
The environment conductor creates new environment instances.
There could be multiple simulation runs and each would need a separate environment. The environment conductor controls the creation of those new environment instances.
- Parameters:
env_cfg (dict) – Dictionary with parameters needed by the environment
seed (int) – Random seed for reproducibility
uid (uuid4) – Unique identifier
- async run()#
Main event/state loop of the ESM
This run method is injected into monitored classes if they do not have one already. The structure of run is as follows:
- It resets the handlers for SIGCHLD, SIGINT, and SIGTERM to the OS’ default.
- It calls monitored.setup(), if it exists.
- It creates an ESM instance for the monitored object and adds signal handlers for SIGCHLD, SIGINT, and SIGTERM according to what the monitored class defines (via @ESM.on(signal.SIGINT), etc.).
- It transitions to the first state, defined by @ESM.enter. It then waits for state changes/events until monitored.stop() is called.
- Finally, once the main event/state loop concludes, monitored.teardown() is called (if present).
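The ordering of the optional hooks in this lifecycle can be illustrated with a small asyncio sketch. It deliberately omits signal handling and the ESM decorators. `Monitored` and `run_lifecycle` are hypothetical names, and the `enter` coroutine only stands in for the first state.

```python
import asyncio
from typing import List


class Monitored:
    """Hypothetical monitored object; only the optional hooks are modelled."""

    def __init__(self) -> None:
        self.log: List[str] = []
        self.stopped = asyncio.Event()

    def setup(self) -> None:
        self.log.append("setup")

    async def enter(self) -> None:
        # Stands in for the first state; it requests a stop right away.
        self.log.append("enter")
        self.stop()

    def stop(self) -> None:
        self.log.append("stop")
        self.stopped.set()

    def teardown(self) -> None:
        self.log.append("teardown")


async def run_lifecycle(monitored: Monitored) -> None:
    setup = getattr(monitored, "setup", None)
    if callable(setup):  # setup() is optional
        setup()
    await monitored.enter()         # transition to the first state
    await monitored.stopped.wait()  # wait until stop() is called
    teardown = getattr(monitored, "teardown", None)
    if callable(teardown):  # teardown() is optional
        teardown()


async def demo() -> List[str]:
    monitored = Monitored()  # created inside the running event loop
    await run_lifecycle(monitored)
    return monitored.log


log = asyncio.run(demo())
print(log)  # ['setup', 'enter', 'stop', 'teardown']
```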
- stop()#
Stops the ESM.
Stopping the ESM also means shutting down all running processes and cancelling all outstanding tasks (e.g., request monitors).