Memory#

The Memory is a class used in the Brain to store information. The basic Memory stores, for example, all received sensor information, all actuator values set by the agent, and the reward returned by the environment(s). Additionally, it stores the internal reward calculated by the objective.

Memory as Replay Buffer#

The Memory can be used as a replay buffer. The base class can be extended by additional functions and attributes if needed.
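For illustration, here is a minimal sketch of such an extension. The subclass name MyReplayMemory and its batch helper are purely hypothetical and not part of palaestrAI; only Memory, its size_limit parameter, and its tail() method are taken from the API documented below.

  from palaestrai.agent import Memory

  class MyReplayMemory(Memory):
      # Illustrative subclass: adds an extra attribute and a small helper
      # on top of the base Memory, without touching the base lists.
      def __init__(self, size_limit: int = 1_000_000, batch_size: int = 32):
          super().__init__(size_limit=size_limit)
          self.batch_size = batch_size  # additional attribute

      def last_batch(self):
          # Return the batch_size most recent full entries as the nested
          # data frame produced by the documented tail() method.
          return self.tail(self.batch_size)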

How to use a specific memory#

Since the Memory is initialized in the Brain, the memory to use can be defined as a parameter of the Brain: the path to the Memory class is passed in the params of the Brain class, together with the parameters for the Memory itself. For example:

name: mighty_defender
brain:
  name: palaestrai.agent.dummy_brain:DummyBrain
  params:
    memory_class: palaestrai.agent.memory:Memory
    memory_params: {}
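Assuming that memory_params is forwarded to the Memory constructor as keyword arguments, the documented size_limit parameter could be adjusted there, e.g.:

    memory_class: palaestrai.agent.memory:Memory
    memory_params:
      size_limit: 500000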

Warning

The base lists (env_rewards, actions, observations, internal_rewards) are filled automatically in the brain. Do not append values to them yourself, as this could result in duplicates. You can manipulate existing values if needed, or add values that are not part of the MuscleUpdateRequest. The only exception is additional data, which has to be implemented individually.

API Documentation#

Memory#

class palaestrai.agent.Memory(size_limit: int = 1000000)[source]

An in-memory data structure to store experiences in a Brain.

Each agent needs a memory to store experiences, regardless of the training algorithm that is used. This class represents this memory. It is an in-memory data structure that uses pandas DataFrames for its public API. The memory stores observations, actions, rewards given from the environment, and the internal reward of the agent (objective value). The memory is passed to an Objective to calculate the objective value from rewards.

Parameters:

size_limit (int = 1e6) – Maximum size the memory is allowed to grow to until old entries are overwritten by new ones.

append(muscle_uid: str, sensor_readings: List[SensorInformation] | None = None, actuator_setpoints: List[ActuatorInformation] | None = None, rewards: List[RewardInformation] | None = None, done: bool | None = None, observations: ndarray | None = None, actions: ndarray | None = None, objective: ndarray | None = None, additional_data: Dict | None = None)[source]

Stores a new item in the agent’s memory (append)

An agent has experiences throughout its existence. The memory stores those by appending them. The memory stores at least those pieces of information that come from an environment, which are:

  • sensor readings

  • actuator setpoints (as issued by the agent)

  • rewards

  • whether the simulation has terminated (is “done”)

Readings, setpoints, and rewards are stored in their palaestrAI-native objects: SensorInformation, ActuatorInformation, and RewardInformation. Additionally, an agent (i.e., its muscle) may store its own view in terms of transformed values.

Parameters:
  • muscle_uid (str) – UID of the agent (Muscle) whose experiences we store

  • sensor_readings (List[SensorInformation]) – A muscle’s sensor readings as provided by the environment

  • actuator_setpoints (List[ActuatorInformation]) – A muscle’s setpoints as provided to an environment

  • rewards (List[RewardInformation]) – Rewards issued by the environment. It is not necessary that sensor readings, setpoints, and rewards belong to the same time step; usually, rewards at a time step t belong to the sensor readings and actions from t-1. This memory class correctly correlates rewards to the previous readings/actions.

  • done (bool = False) – Whether this was the last action executed in the environment

  • observations (Optional[np.ndarray] = None) – Observations the Muscle wants to share with its Brain, e.g., transformed/scaled values

  • actions (Optional[np.ndarray] = None) – Action-related data a Muscle emitted, such as probabilities, or other data. Can be fed directly to the corresponding Brain, as with observations

  • objective (Optional[np.ndarray] = None) – The agent’s objective value describing its own goal. Optional, because the agent might calculate such a value separately.

  • additional_data (Optional[Dict] = None) – Any additional data a Muscle wants to store
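A minimal sketch of how a Brain might feed this method, using only the observations, actions, objective, and done parameters described above. The array shapes, the size_limit value, and the "muscle-0" UID are illustrative; constructing SensorInformation, ActuatorInformation, or RewardInformation objects is omitted here.

  import numpy as np
  from palaestrai.agent import Memory

  memory = Memory(size_limit=100_000)

  # Store one (transformed) experience for the muscle with UID "muscle-0".
  # Shapes depend on the concrete sensors and actuators of the agent.
  memory.append(
      muscle_uid="muscle-0",
      observations=np.array([0.1, 0.2, 0.3]),
      actions=np.array([1.0]),
      objective=np.array([0.5]),
      done=False,
  )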

property tags: Set[str]

All tags known to this memory

tail(n=1)[source]

Returns the n last full entries

This method returns a nested data frame containing the n last entries from the memory. It constructs a multi-indexed data frame, i.e., a data frame that contains other data frames. You access each value through the hierarchy, e.g.,

df = memory.tail(10)
df.observations.uid.iloc[-1]

Parameters:

n (int = 1) – How many data items to return, counted from the latest addition. Defaults to 1.

Returns:

A dataclass that contains the n last full entries, i.e., all entries where the (observations, actions, rewards, objective) quadruplet is fully set. That means you can be sure that all indexes correspond to each other, and that calling iloc with an index really gives you the n-th observation, action, and reward for it. However, if for whatever reason the environment returned an empty reward, this will also be included. This is in contrast to the sample method, which returns only entries where an associated reward is also present.

Return type:

MemoryShard

truncate(n: int)[source]

Truncates the memory: Only the last n entries are retained.

Parameters:

n (int) – How many of the most recent entries should be retained. Negative values of n are treated as abs(n).
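As a brief, illustrative usage sketch (the sizes are arbitrary), truncate can be combined with tail to bound the memory and inspect its most recent content:

  # Keep only the 1,000 most recent entries ...
  memory.truncate(1000)

  # ... and inspect the newest one as a nested data frame (see tail()).
  latest = memory.tail(1)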