Memory#
The Memory is a class used in the Brain to store information. The basic Memory stores, e.g., all received sensor information, all actuator values set by the agent, and the reward returned by the environment(s). Additionally, it stores the internal reward calculated by the objective.
Memory as Replay Buffer#
The Memory can be used as a replay buffer. The base class can be extended with additional functions and attributes if needed.
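Conceptually, a size-limited replay buffer overwrites its oldest entries once full and supports random sampling for training. The following is a minimal, self-contained sketch of that pattern, not the actual palaestrAI implementation; the `Experience` tuple and `ReplayBuffer` class are hypothetical stand-ins for the richer objects the real Memory stores.

```python
import random
from collections import deque, namedtuple

# Hypothetical experience tuple; the real Memory stores richer objects
# (SensorInformation, ActuatorInformation, RewardInformation, ...).
Experience = namedtuple("Experience", "observation action reward done")

class ReplayBuffer:
    """Sketch of a size-limited replay buffer (illustrative only)."""

    def __init__(self, size_limit: int = 1_000_000):
        # A deque with maxlen silently drops the oldest entry when full.
        self._buffer = deque(maxlen=size_limit)

    def append(self, experience: Experience):
        self._buffer.append(experience)

    def sample(self, batch_size: int):
        # Uniform random sample of stored experiences for training.
        return random.sample(list(self._buffer), batch_size)

    def __len__(self):
        return len(self._buffer)

buf = ReplayBuffer(size_limit=3)
for t in range(5):
    buf.append(Experience(observation=t, action=t, reward=float(t), done=False))
print(len(buf))                               # 3: oldest entries overwritten
print([e.observation for e in buf._buffer])   # [2, 3, 4]
```

The ring-buffer behavior mirrors the `size_limit` parameter of the base Memory class: once the limit is reached, old entries are overwritten by new ones.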
How to use a specific memory#
Since the Memory is initialized in the Brain, the memory can be defined as a parameter of the Brain. For this, the path to the Memory class is passed to the params of the Brain class. Additionally, the parameters for the Memory itself can be defined. For example:
name: mighty_defender
brain:
  name: palaestrai.agent.dummy_brain:DummyBrain
  params:
    memory_class: palaestrai.agent.memory:Memory
    memory_params: {}
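The memory_params mapping is passed through to the memory class's constructor. Since the base Memory class accepts a size_limit argument, a smaller buffer could, for example, be configured as follows; the value 10000 is purely illustrative, not a recommendation:

```yaml
name: mighty_defender
brain:
  name: palaestrai.agent.dummy_brain:DummyBrain
  params:
    memory_class: palaestrai.agent.memory:Memory
    # size_limit is the base Memory's only constructor argument;
    # 10000 here is an illustrative value.
    memory_params: {size_limit: 10000}
```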
Warning
The base lists (env_rewards, actions, observations, internal_rewards) are filled automatically in the brain. Do not append values to them yourself, as this could result in duplicates. You can manipulate existing values if needed, or add values that are not part of the MuscleUpdateRequest. The only exception is the additional data, which has to be implemented individually.
API Documentation#
Memory#
- class palaestrai.agent.Memory(size_limit: int = 1000000)[source]
An in-memory data structure to store experiences in a Brain.
Each agent needs a memory to store experiences, regardless of the training algorithm that is used. This class represents this memory. It is an in-memory data structure that uses pandas DataFrames for its public API. The memory stores observations, actions, rewards given from the environment, and the internal reward of the agent (objective value). The memory is passed to an Objective to calculate the objective value from rewards.
- Parameters:
size_limit (int = 1e6) – Maximum size the memory is allowed to grow to until old entries are overwritten by new ones.
- append(muscle_uid: str, sensor_readings: List[SensorInformation] | None = None, actuator_setpoints: List[ActuatorInformation] | None = None, rewards: List[RewardInformation] | None = None, done: bool | None = None, observations: ndarray | None = None, actions: ndarray | None = None, objective: ndarray | None = None, additional_data: Dict | None = None)[source]
Stores a new item in the agent’s memory (append)
An agent has experiences throughout its existence. The memory stores those by appending them. The memory stores at least those pieces of information that come from an environment, which are:
- sensor readings
- actuator setpoints (as issued by the agent)
- rewards
- whether the simulation has terminated (is “done”)
Readings, setpoints, and rewards are stored in their palaestrAI-native objects: SensorInformation, ActuatorInformation, and RewardInformation. Additionally, an agent (i.e., its muscle) may store its own view in terms of transformed values.
- Parameters:
muscle_uid (str) – UID of the agent (Muscle) whose experiences we store
sensor_readings (List[SensorInformation]) – A muscle’s sensor readings as provided by the environment
actuator_setpoints (List[ActuatorInformation]) – A muscle’s setpoints as provided to an environment
rewards (List[RewardInformation]) – Rewards issued by the environment. It is not necessary that sensor readings, setpoints, and rewards belong to the same time step; usually, rewards at a time step t belong to the sensor readings and actions from t-1. This memory class correctly correlates rewards to the previous readings/actions.
done (bool = False) – Whether this was the last action executed in the environment
observations (Optional[np.ndarray] = None) – Observations the Muscle wants to share with its Brain, e.g., transformed/scaled values
actions (Optional[np.ndarray] = None) – Action-related data a Muscle emitted, such as probabilities, or other data. Can be fed directly to the corresponding Brain, as with observations
objective (Optional[np.ndarray] = None) – The agent’s objective value describing its own goal. Optional, because the agent might calculate such a value separately.
additional_data (Optional[Dict] = None) – Any additional data a Muscle wants to store
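The one-step reward offset described for the rewards parameter (rewards at step t belong to the readings and actions from t-1) can be illustrated with a small self-contained sketch. Plain Python lists stand in here for the memory's internal storage; this is not the actual implementation.

```python
# Illustrative only: plain lists stand in for the memory's internal storage.
# Rewards arrive one step late, so the reward recorded at step t refers to
# the observation/action recorded at step t-1.
observations = ["obs0", "obs1", "obs2", "obs3"]
actions      = ["act0", "act1", "act2", "act3"]
rewards      = [None, 1.0, 0.5, 2.0]  # reward at index t belongs to step t-1

# Shift rewards back by one step to build (obs, act, reward) transitions:
transitions = [
    (observations[t - 1], actions[t - 1], rewards[t])
    for t in range(1, len(rewards))
]
print(transitions)
# [('obs0', 'act0', 1.0), ('obs1', 'act1', 0.5), ('obs2', 'act2', 2.0)]
```

This is the correlation the Memory class performs automatically, so callers can append readings, setpoints, and rewards as they arrive from the environment.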
- tail(n=1)[source]
Returns the n last full entries
This method returns a nested data frame containing the n last entries from the memory. It constructs a multi-indexed data frame, i.e., a dataframe that contains other dataframes. You access each value through the hierarchy, e.g.:
df = memory.tail(10)
df.observations.uid.iloc[-1]
- Parameters:
n (int = 1) – How many data items to return, counted from the latest addition. Defaults to 1.
- Returns:
A dataclass that contains the n last full entries, i.e., all entries where the (observations, actions, rewards, objective) quadruplet is fully set. That is, you can be sure that all indexes correspond to each other, and that calling iloc with an index really gives you the n-th observation, action, and reward for it. However, if for whatever reason the environment returned an empty reward, this will also be included. This is in contrast to the sample method, which returns only entries where an associated reward is also present.- Return type:
MemoryShard
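The distinction drawn above between tail (empty rewards included) and sample (reward required) can be sketched in plain Python. The dict entries and the two helper functions here are illustrative stand-ins, not the actual MemoryShard API.

```python
# Sketch of the tail-vs-sample distinction (illustrative only).
# Each entry holds an (observation, action, reward, objective) quadruplet;
# tail keeps index-aligned entries even when the reward list is empty,
# while a sample-like filter drops entries without an associated reward.
entries = [
    {"observation": 0.1, "action": 1, "reward": [0.5], "objective": 0.4},
    {"observation": 0.2, "action": 0, "reward": [],    "objective": 0.0},  # empty reward
    {"observation": 0.3, "action": 1, "reward": [1.0], "objective": 0.9},
]

def tail_sketch(entries, n=1):
    # tail-like behavior: last n entries, empty rewards included
    return entries[-n:]

def sample_sketch(entries):
    # sample-like behavior: only entries with a reward present
    return [e for e in entries if e["reward"]]

print(len(tail_sketch(entries, n=3)))   # 3: the empty-reward entry is kept
print(len(sample_sketch(entries)))      # 2: the empty-reward entry is dropped
```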