Adding new Environments to palaestrAI#

Role of an Environment#

An environment is the world in which an agent lives. More precisely, subclasses of Environment receive setpoints from Muscle objects. Environment classes are self-contained and encapsulate everything related to a world, with no knowledge of how agents interact with it. An environment can be any code, from a simple class that implements, e.g., Tic-Tac-Toe, to a complex interface to a simulation tool.

Note

Muscles are the acting part of an agent; the Brain is used for training. To learn more about the Brain-Muscle Split, see the quickstart guide.

Environments are responsible for maintaining their state, applying setpoints, and providing sensor data. They are also responsible for keeping their own time; there is no time synchronisation between environments. An environment also signals when it is “done.” What “done” means depends on the environment itself; it is domain-specific. In the famous cartpole environment, the pole falling means “done,” whereas in a racing environment “done” is the car crashing or finishing the race.
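As an illustration of these responsibilities, here is a minimal sketch in plain Python that does not use the palaestrAI API. The class CountdownWorld and its method apply() are hypothetical names invented for this example. The sketch only shows an object that owns its state, keeps its own clock, and decides for itself when it is “done.”

```python
from typing import Tuple


class CountdownWorld:
    """Toy world (hypothetical): it is done once its counter reaches zero."""

    def __init__(self, start: int = 3) -> None:
        self.counter = start  # internal state, owned by the environment
        self.time = 0  # the environment keeps its own clock

    def apply(self, setpoint: int) -> Tuple[int, bool]:
        """Apply one setpoint, advance local time, return (reading, done)."""
        self.counter -= setpoint
        self.time += 1
        done = self.counter <= 0  # domain-specific definition of "done"
        return self.counter, done


world = CountdownWorld(start=3)
readings = [world.apply(1) for _ in range(3)]
```

Only the third call reports done, because that is when the counter reaches zero. A different world would use a different criterion.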

The canonical place for many environment reference implementations is the palaestrai-environments package.

Implementing Environments: Methods to Provide#

When creating a new environment, one has to subclass Environment. This class declares two abstract methods that must be implemented:

  1. start_environment(): Launches the environment and takes care of any setup code

  2. update(): Applies values (via ActuatorInformation) to an environment and returns new sensor readings (SensorInformation) and reward values (RewardInformation).

Additionally, one can reimplement the existing reset() method if necessary. The default implementation simply calls start_environment() and does a bit of housekeeping.

Starting an Environment#

abstract Environment.start_environment() → EnvironmentBaseline | Tuple[List[SensorInformation], List[ActuatorInformation]]

Launches execution of an environment.

If the environment uses a simulation tool, this function can be used to initiate the simulation tool. In addition, this function is used to prepare the environment for the simulation. It must be able to provide initial sensor information.

On a reset, this method is called again to start a new environment run. Therefore, it must also provide initial values for all variables used!

Returns:

  • Union[EnvironmentBaseline, typing.Tuple[List[SensorInformation], List[ActuatorInformation]]] – An EnvironmentBaseline object containing all initial data from the environment. For backwards compatibility, it is also possible (though deprecated) to return a tuple containing a list of available sensors and a list of available actuators.
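The following sketch shows the shape of a start_environment() implementation. To stay runnable without palaestrAI installed, it defines simplified stand-ins for SensorInformation, ActuatorInformation, and EnvironmentBaseline. Their field names (uid, value, sensors_available, actuators_available) are assumptions made for this example, and the real classes may require further arguments. CounterEnvironment is a hypothetical environment; a real one would subclass Environment.

```python
from dataclasses import dataclass, field
from typing import Any, List


# Simplified stand-ins for the palaestrAI classes; field names are assumptions.
@dataclass
class SensorInformation:
    uid: str
    value: Any


@dataclass
class ActuatorInformation:
    uid: str
    value: Any = None


@dataclass
class EnvironmentBaseline:
    sensors_available: List[SensorInformation] = field(default_factory=list)
    actuators_available: List[ActuatorInformation] = field(default_factory=list)


class CounterEnvironment:
    """Hypothetical environment; a real one would subclass Environment."""

    def start_environment(self) -> EnvironmentBaseline:
        # (Re-)initialise every variable here, because a reset calls this again.
        self.counter = 0
        return EnvironmentBaseline(
            sensors_available=[SensorInformation("counter", self.counter)],
            actuators_available=[ActuatorInformation("increment")],
        )


env = CounterEnvironment()
env.counter = 99  # pretend an earlier run left stale state behind
baseline = env.start_environment()
```

Because all state is initialised inside start_environment(), calling it a second time yields a fresh run regardless of what happened before.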

Updating an Environment#

abstract Environment.update(actuators: List[ActuatorInformation]) → EnvironmentState | Tuple[List[SensorInformation], List[RewardInformation], bool]

Updates the environment.

This function receives the agent’s actions and has to respond with new sensor information. Each call should advance the simulation by one step.

Parameters:

actuators (List[ActuatorInformation]) – List of actuators with values

Returns:

  • Union[EnvironmentState, typing.Tuple[List[SensorInformation], List[RewardInformation], bool]] – An EnvironmentState object; for backwards compatibility, environments can return a tuple containing a list of sensor readings (SensorInformation), a list of rewards (RewardInformation), and a flag indicating whether the environment has terminated. Returning a tuple is considered deprecated.
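A matching sketch for update() might look as follows. Again, the dataclasses are simplified stand-ins so that the example runs on its own. The field names (sensor_information, rewards, done), the LIMIT constant, and the reward definition are assumptions chosen for illustration, not the library's definitions.

```python
from dataclasses import dataclass
from typing import Any, List


# Simplified stand-ins for the palaestrAI classes; field names are assumptions.
@dataclass
class ActuatorInformation:
    uid: str
    value: Any


@dataclass
class SensorInformation:
    uid: str
    value: Any


@dataclass
class RewardInformation:
    uid: str
    value: float


@dataclass
class EnvironmentState:
    sensor_information: List[SensorInformation]
    rewards: List[RewardInformation]
    done: bool


class CounterEnvironment:
    """Hypothetical environment; a real one would subclass Environment."""

    LIMIT = 3  # the run is done once the counter reaches this value

    def __init__(self) -> None:
        self.counter = 0

    def update(self, actuators: List[ActuatorInformation]) -> EnvironmentState:
        # Apply the setpoints: this is one simulation step.
        for actuator in actuators:
            if actuator.uid == "increment":
                self.counter += int(actuator.value)
        return EnvironmentState(
            sensor_information=[SensorInformation("counter", self.counter)],
            rewards=[RewardInformation("progress", float(self.counter))],
            done=self.counter >= self.LIMIT,
        )


env = CounterEnvironment()
first = env.update([ActuatorInformation("increment", 1)])
second = env.update([ActuatorInformation("increment", 2)])
```

The first call leaves the counter below the limit, so the run continues. The second call reaches the limit and sets the done flag.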

Resetting an Environment (optional to implement)#

Environment.reset(request: EnvironmentResetRequest) → EnvironmentResetResponse

Resets the environment in-place.

The default behavior for a reset comprises:

  • calling shutdown to allow a graceful shutdown of environment simulation processes

  • calling start_environment() again

  • preparing the EnvironmentResetResponse

If an environment requires a more specialized reset procedure, this method can be overridden.

Parameters:

request (EnvironmentResetRequest) – The reset request sent by the simulation controller.

Returns:

The response for the simulation controller.

Return type:

EnvironmentResetResponse
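The default reset sequence and a custom override can be sketched as follows. SketchEnvironment is a stand-in that only records which hooks are called, and plain dictionaries stand in for EnvironmentBaseline and EnvironmentResetResponse, whose actual fields are not shown here. WarmResetEnvironment and its "rewind" step are hypothetical.

```python
from typing import Any, Dict, List


class SketchEnvironment:
    """Stand-in for Environment; it only records which hooks were called."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def shutdown(self) -> None:
        self.calls.append("shutdown")

    def start_environment(self) -> Dict[str, Any]:
        self.calls.append("start_environment")
        return {"sensors": [], "actuators": []}  # stand-in for EnvironmentBaseline

    def reset(self, request: Any) -> Dict[str, Any]:
        # Default sequence as described above: shut down, start again, respond.
        self.shutdown()
        baseline = self.start_environment()
        return {"request": request, "baseline": baseline}  # stand-in response


class WarmResetEnvironment(SketchEnvironment):
    """Hypothetical subclass whose simulator can be rewound without a restart."""

    def reset(self, request: Any) -> Dict[str, Any]:
        self.calls.append("rewind")  # special procedure replaces the default
        return {"request": request, "baseline": {"sensors": [], "actuators": []}}


default_env = SketchEnvironment()
default_env.reset("reset-request")
warm_env = WarmResetEnvironment()
warm_env.reset("reset-request")
```

Overriding reset() in this way pays off when restarting the underlying simulation tool is expensive and its state can be rewound instead.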

Developer Background: Lifecycle of an Environment#

Environments live in their own processes. Environment objects and their processes are created by the SimulationController. All other API calls happen indirectly through messages. The life cycle of an environment is:

  1. Environment is created by a SimulationController and transferred to a separate process.

  2. The environment receives an EnvironmentStartRequest. Via its answer (EnvironmentStartResponse), it delivers available sensors, actuators, and initial sensor readings (the EnvironmentBaseline return value of start_environment()).

  3. Repeatedly, until the environment is done or another termination condition is fulfilled, the SimulationController queries one (or all) agents for new setpoints, which are then provided to the environment. The environment updates its state and returns new sensor readings as well as reward information.

  4. A shutdown or reset ends the loop of (3).
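The loop in steps 2 to 4 can be sketched with direct method calls in place of messages and processes. Everything in this example is a simplification: ToyEnvironment, run_episode(), and the max_steps limit are hypothetical names, and update() returns a plain tuple of readings, reward, and done flag to keep the sketch short.

```python
from typing import Callable, List, Tuple


class ToyEnvironment:
    """Hypothetical environment: done once its position reaches 3."""

    def start_environment(self) -> List[int]:
        self.position = 0
        return [self.position]  # initial sensor readings

    def update(self, setpoints: List[int]) -> Tuple[List[int], float, bool]:
        self.position += sum(setpoints)
        done = self.position >= 3
        return [self.position], float(self.position), done


def run_episode(
    env: ToyEnvironment,
    agent: Callable[[List[int]], List[int]],
    max_steps: int = 10,
) -> int:
    """Play one run the way a controller would; return the number of steps."""
    readings = env.start_environment()  # step 2: start and get initial readings
    steps = 0
    done = False
    # Step 3: loop until the environment is done or another condition holds.
    while not done and steps < max_steps:
        setpoints = agent(readings)  # query the agent for new setpoints
        readings, _reward, done = env.update(setpoints)
        steps += 1
    return steps  # step 4: a shutdown or reset would follow here


steps = run_episode(ToyEnvironment(), agent=lambda readings: [1])
```

Here the agent always moves by one, so the environment reports done after three updates. The max_steps limit plays the role of the other termination condition mentioned in step 3.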

_images/environment-lifecycle.svg