Adding new Environments to palaestrAI#
Role of an Environment#
An environment is the world in which an agent lives. More precisely,
subclasses of the Environment class receive setpoints from Muscle objects.
Environment classes are self-contained and encapsulate everything that is
related to a world, with no knowledge of how agents interact with it. An
environment can be any code, from a simple class that implements, e.g.,
Tic-Tac-Toe, to a complex interface to a simulation tool.
Note
Muscles are the acting part of an agent; the Brain is used for training.
To learn more about the Brain-Muscle Split, see the quickstart guide.
Environments are responsible for maintaining their state, applying setpoints, and providing sensor data. Environments are also responsible for keeping their own time; there is no time synchronisation between environments. An environment also signals when it is “done.” What counts as “done” depends on the environment itself and is domain-specific. In the well-known cartpole environment, the pole falling means “done”; in a racing environment, it is the car crashing or finishing the race.
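To make the domain-specific notion of “done” more concrete, the following is a minimal sketch of a cartpole-style termination check. The class name, the function name, the state fields, and the threshold values are all illustrative assumptions and are not part of palaestrAI.

```python
import math
from dataclasses import dataclass


@dataclass
class CartPoleState:
    """Hypothetical state container; not a palaestrAI class."""

    cart_position: float  # distance from the track centre
    pole_angle: float  # radians, 0.0 means perfectly upright


# Illustrative limits chosen for this sketch only.
MAX_CART_POSITION = 2.4
MAX_POLE_ANGLE = math.radians(12.0)


def is_done(state: CartPoleState) -> bool:
    """Return True once the pole has tipped too far or the cart left the track."""
    return (
        abs(state.cart_position) > MAX_CART_POSITION
        or abs(state.pole_angle) > MAX_POLE_ANGLE
    )
```

A racing environment would apply the same idea with a different predicate, for example one that checks for a crash or a crossed finish line.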
The canonical place for many environment reference implementations is the palaestrai-environments package.
Implementing Environments: Methods to Provide#
When creating a new environment, one has to subclass Environment.
This class declares two important methods that must be implemented:
- start_environment(): Launches the environment and takes care of any setup code.
- update(): Applies values (via ActuatorInformation) to an environment and returns new sensor readings (SensorInformation) and reward values (RewardInformation).
Additionally, one can reimplement the existing reset() method if necessary.
The default implementation simply calls start_environment() and does a bit
of housekeeping.
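The shape of such a subclass can be sketched as follows. To keep the example runnable without palaestrAI installed, it uses simplified stand-ins: Reading, Setpoint, Reward, and EnvironmentSketch are hypothetical names that only mimic the roles of SensorInformation, ActuatorInformation, RewardInformation, and Environment. The sketch returns plain tuples, which this documentation describes as the deprecated form; a real implementation would rather return EnvironmentBaseline and EnvironmentState objects, whose constructors are not shown here.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Reading:
    """Stand-in for SensorInformation (hypothetical, simplified)."""
    uid: str
    value: float


@dataclass
class Setpoint:
    """Stand-in for ActuatorInformation (hypothetical, simplified)."""
    uid: str
    value: float


@dataclass
class Reward:
    """Stand-in for RewardInformation (hypothetical, simplified)."""
    uid: str
    value: float


class EnvironmentSketch(ABC):
    """Stand-in base class that mirrors the two abstract methods."""

    @abstractmethod
    def start_environment(self) -> Tuple[List[Reading], List[Setpoint]]:
        ...

    @abstractmethod
    def update(
        self, actuators: List[Setpoint]
    ) -> Tuple[List[Reading], List[Reward], bool]:
        ...


class TankEnvironment(EnvironmentSketch):
    """Toy world: a tank that leaks one unit per step and can be refilled."""

    TARGET_LEVEL = 5.0

    def start_environment(self) -> Tuple[List[Reading], List[Setpoint]]:
        # (Re-)initialise every variable, because a reset calls this again.
        self.level = self.TARGET_LEVEL
        self.step = 0  # the environment keeps its own time
        return [Reading("level", self.level)], [Setpoint("inflow", 0.0)]

    def update(
        self, actuators: List[Setpoint]
    ) -> Tuple[List[Reading], List[Reward], bool]:
        # Apply the setpoints, then advance the world by one step.
        inflow = sum(a.value for a in actuators if a.uid == "inflow")
        self.level += inflow - 1.0
        self.step += 1
        # "Done" is domain-specific: here, the tank ran dry or overflowed.
        done = self.level <= 0.0 or self.level >= 10.0
        reward = Reward("level_error", -abs(self.level - self.TARGET_LEVEL))
        return [Reading("level", self.level)], [reward], done
```

The environment owns its state (level), its clock (step), and its definition of “done”; nothing in it refers to the agent that produces the setpoints.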
Starting an Environment#
- abstract Environment.start_environment() → EnvironmentBaseline | Tuple[List[SensorInformation], List[ActuatorInformation]]
Launches execution of an environment.
If the environment uses a simulation tool, this function can be used to initiate it. In addition, this function prepares the environment for the simulation. It must be able to provide initial sensor information.
On a reset, this method is called to restart a new environment run. Therefore, it also must provide initial values for all variables used!
- Returns:
Union[EnvironmentBaseline, typing.Tuple[List[SensorInformation], List[ActuatorInformation]]] – An EnvironmentBaseline object containing all initial data from the environment. For backwards compatibility, it is also possible (though deprecated) to return a tuple containing a list of available sensors and a list of available actuators.
Updating an Environment#
- abstract Environment.update(actuators: List[ActuatorInformation]) → EnvironmentState | Tuple[List[SensorInformation], List[RewardInformation], bool]
Updates the environment.
This function receives the agent’s actions and has to respond with new sensor information. Each call should advance the simulation by one step.
- Parameters:
actuators (List[ActuatorInformation]) – List of actuators with values
- Returns:
Union[EnvironmentState, typing.Tuple[List[SensorInformation], List[RewardInformation], bool]] – An EnvironmentState object; for backwards compatibility, environments can return a tuple containing a list of sensor readings (SensorInformation), a list of rewards (RewardInformation), and a flag indicating whether the environment has terminated. Returning a tuple is considered deprecated.
Resetting an Environment (optional to implement)#
- Environment.reset(request: EnvironmentResetRequest) → EnvironmentResetResponse
Resets the environment in-place.
The default behavior for a reset comprises:
- calling shutdown to allow a graceful shutdown of environment simulation processes
- calling start_environment() again
- preparing the EnvironmentResetResponse
If an environment requires a more specialized reset procedure, this method can be overridden.
- Parameters:
request (EnvironmentResetRequest) – The reset request sent by the simulation controller.
- Returns:
The response for the simulation controller.
- Return type:
EnvironmentResetResponse
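The default reset sequence described above, and one way of overriding it, can be sketched as follows. This is a simplified illustration under stated assumptions, not the actual palaestrAI implementation: ResetResponseSketch, ResettableSketch, and WarmResetSketch are hypothetical names, the fields of the response object are invented for the example, and the signature of shutdown is assumed to take no arguments.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ResetResponseSketch:
    """Stand-in for EnvironmentResetResponse; the fields are assumptions."""
    sensors: List[str]
    actuators: List[str]


class ResettableSketch:
    """Stand-in environment that records the order of lifecycle calls."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def shutdown(self) -> None:
        # Assumed hook: stop simulation processes gracefully.
        self.calls.append("shutdown")

    def start_environment(self) -> Tuple[List[str], List[str]]:
        self.calls.append("start_environment")
        return ["level"], ["inflow"]

    def reset(self, request: Optional[object] = None) -> ResetResponseSketch:
        # Default sequence: shut down, start again, prepare the response.
        self.shutdown()
        sensors, actuators = self.start_environment()
        return ResetResponseSketch(sensors=sensors, actuators=actuators)


class WarmResetSketch(ResettableSketch):
    """Overrides reset() to skip the shutdown, e.g. to keep a simulator alive."""

    def reset(self, request: Optional[object] = None) -> ResetResponseSketch:
        sensors, actuators = self.start_environment()
        return ResetResponseSketch(sensors=sensors, actuators=actuators)
```

An override like WarmResetSketch may be worthwhile when starting the underlying simulation tool is expensive; in that case start_environment() still has to restore every variable to its initial value.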
Developer Background: Lifecycle of an Environment#
Environments live in their own processes. Environment objects and their
processes are created by the SimulationController. All other API calls
happen indirectly through messages. The life cycle of an environment is:
1. The environment is created by a SimulationController and transferred to a separate process.
2. The environment receives an EnvironmentStartRequest. Via its answer (EnvironmentStartResponse), it delivers available sensors, actuators, and initial sensor readings (the EnvironmentBaseline return value of start_environment()).
3. Repeatedly, until the environment is done or another termination condition is fulfilled, the SimulationController queries one (or all) agents for new setpoints, which are then provided to the environment. The environment updates its state and returns new sensor readings as well as reward information.
4. A shutdown or reset ends the loop of (3).
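The life cycle above can be illustrated with a single-process sketch. In palaestrAI the steps happen through messages between separate processes; here they are replaced by direct method calls purely for readability. CountdownEnvironment and run_episode are hypothetical names, and the plain lists of floats stand in for the real sensor, actuator, and reward objects.

```python
from typing import Callable, List, Tuple


class CountdownEnvironment:
    """Toy environment that is done after a fixed number of steps."""

    def __init__(self, max_steps: int = 3) -> None:
        self.max_steps = max_steps
        self.steps = 0

    def start_environment(self) -> Tuple[List[float], List[str]]:
        # Provide initial sensor readings and the available actuators.
        self.steps = 0
        return [0.0], ["setpoint"]

    def update(
        self, setpoints: List[float]
    ) -> Tuple[List[float], List[float], bool]:
        # Apply the setpoints and advance the world by one step.
        self.steps += 1
        readings = [float(self.steps)]
        rewards = [sum(setpoints)]
        return readings, rewards, self.steps >= self.max_steps


def run_episode(
    env: CountdownEnvironment,
    agent: Callable[[List[float]], List[float]],
) -> float:
    """Plays the role of the simulation controller for one episode."""
    # Start: the environment delivers its initial sensor readings.
    readings, _actuators = env.start_environment()
    total_reward = 0.0
    done = False
    # Loop: query the agent, hand its setpoints to the environment.
    while not done:
        setpoints = agent(readings)
        readings, rewards, done = env.update(setpoints)
        total_reward += sum(rewards)
    # End of loop: a real controller would now shut down or reset.
    return total_reward
```

For example, `run_episode(CountdownEnvironment(max_steps=3), lambda readings: [1.0])` runs three update steps with a constant setpoint before the environment reports that it is done.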