The Policy class contains functionality to compute
actions for decision making in an environment, to compute loss(es) and gradients,
to update a neural network model, and to postprocess a collected environment trajectory.
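
As a rough illustration of these responsibilities, here is a minimal sketch of a toy policy. Everything in it is an assumption made for the example: the `ToyPolicy` class, its method names, and its single-weight "model" are hypothetical and are not meant to mirror the real class's signatures. A real policy would wrap a neural network rather than one float.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToyPolicy:
    """Hypothetical one-parameter policy: action = weight * observation."""

    weight: float = 0.0   # stands in for the neural network model
    lr: float = 0.1       # learning rate used when applying gradients
    gamma: float = 0.9    # discount factor used in postprocessing

    def compute_actions(self, obs_batch: List[float]) -> List[float]:
        # Decision making: one action per observation.
        return [self.weight * obs for obs in obs_batch]

    def postprocess_trajectory(self, rewards: List[float]) -> List[float]:
        # Turn per-step rewards into discounted returns, working backwards.
        returns, running = [], 0.0
        for reward in reversed(rewards):
            running = reward + self.gamma * running
            returns.append(running)
        return returns[::-1]

    def loss(self, obs_batch: List[float], targets: List[float]) -> float:
        # Mean squared error between predicted and target actions.
        preds = self.compute_actions(obs_batch)
        return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)

    def compute_gradients(self, obs_batch: List[float], targets: List[float]) -> float:
        # Analytic gradient of the loss with respect to the single weight.
        preds = self.compute_actions(obs_batch)
        return sum(
            2 * (p - t) * obs for p, t, obs in zip(preds, targets, obs_batch)
        ) / len(preds)

    def apply_gradients(self, grad: float) -> None:
        # Model update: one plain gradient descent step.
        self.weight -= self.lr * grad
```

One gradient step on a small batch should lower the loss, which is the whole update cycle in miniature: compute actions, measure the loss, compute gradients, apply them.
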
One or more Policy objects may be in use at the same time.
If there is more than one, the policy for each agent is selected by a multi-agent
mapping function, which maps agent IDs to a policy ID.
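
The agent-to-policy mapping can be sketched as follows. This is only an illustration under stated assumptions: the function name `map_agent_to_policy`, the agent ID prefixes, the policy IDs, and the stand-in policies (plain callables) are all hypothetical and do not come from the original text.

```python
from typing import Callable, Dict

# Hypothetical policies keyed by policy ID; each is just a callable here.
policies: Dict[str, Callable[[float], float]] = {
    "attacker_policy": lambda obs: obs + 1.0,
    "defender_policy": lambda obs: -obs,
}


def map_agent_to_policy(agent_id: str) -> str:
    """Map an agent ID to a policy ID (hypothetical prefix-based rule)."""
    if agent_id.startswith("attacker"):
        return "attacker_policy"
    return "defender_policy"


def compute_joint_actions(observations: Dict[str, float]) -> Dict[str, float]:
    """Pick each agent's policy via the mapping, then compute its action."""
    return {
        agent_id: policies[map_agent_to_policy(agent_id)](obs)
        for agent_id, obs in observations.items()
    }
```

Several agents may share one policy under such a mapping: any number of agent IDs starting with `attacker` would resolve to the same `attacker_policy` entry.
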