- Policy.compute_actions_from_input_dict(input_dict: SampleBatch | Dict[str, numpy.array | jnp.ndarray | tf.Tensor | torch.Tensor | dict | tuple], explore: bool | None = None, timestep: int | None = None, episodes: List[Episode] | None = None, **kwargs) → Tuple[numpy.array | jnp.ndarray | tf.Tensor | torch.Tensor, List[numpy.array | jnp.ndarray | tf.Tensor | torch.Tensor], Dict[str, numpy.array | jnp.ndarray | tf.Tensor | torch.Tensor]]
Computes actions from collected samples (across multiple agents).
Takes an input dict (usually a SampleBatch) as its main data input. This allows for using this method in case a more complex input pattern (view requirements) is needed, for example when the Model requires the last n observations, the last m actions/rewards, or a combination of any of these.
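As a rough illustration of the "last n observations" case, the sketch below builds a batch-size-1 input dict and passes it to a method with the same call signature. It is only a sketch under stated assumptions: `StandInPolicy`, `build_input_dict`, the `"prev_n_obs"` key, and `N_PREV_OBS` are hypothetical names invented here so the example runs without Ray installed, and plain Python lists stand in for the arrays or tensors a real Policy would receive.

```python
from collections import deque

N_PREV_OBS = 3  # hypothetical: the model wants the last 3 observations


class StandInPolicy:
    """Hypothetical stand-in that mimics only the documented call signature.

    This is NOT an RLlib class; it exists so the sketch is runnable on its own.
    """

    def compute_actions_from_input_dict(
        self, input_dict, explore=None, timestep=None, episodes=None, **kwargs
    ):
        # Toy rule: choose action 1 if the sum over the stacked
        # observations is positive, otherwise action 0.
        actions = [
            1 if sum(sum(obs) for obs in stack) > 0 else 0
            for stack in input_dict["prev_n_obs"]
        ]
        state_outs = []  # this toy policy has no RNN state
        info = {"stack_len": [len(stack) for stack in input_dict["prev_n_obs"]]}
        return actions, state_outs, info


def build_input_dict(history, n=N_PREV_OBS):
    """Build a batch-size-1 input dict from the most recent observations."""
    recent = list(history)[-n:]
    return {
        "obs": [recent[-1]],     # current observation, batch dimension first
        "prev_n_obs": [recent],  # hypothetical key holding the last n observations
    }


# Collect a short observation history, keeping only the last N_PREV_OBS items.
history = deque(maxlen=N_PREV_OBS)
for obs in ([0.1, -0.2], [0.4, 0.3], [-0.1, 0.5]):
    history.append(obs)

input_dict = build_input_dict(history)
policy = StandInPolicy()

# The method returns a 3-tuple: actions, RNN state outputs, extra info.
actions, state_outs, info = policy.compute_actions_from_input_dict(
    input_dict, explore=False, timestep=3
)
print(actions, state_outs, info)
```

With a real Policy, the keys and shapes in the input dict would be dictated by the Policy's and the Model's view requirements rather than chosen by hand as they are here.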
- Parameters:
input_dict – A SampleBatch or input dict containing the Tensors to compute actions. input_dict already abides by the Policy’s as well as the Model’s view requirements and can thus be passed to the Model as-is.
explore – Whether to pick an exploitation or exploration action (default: None -> use self.config["explore"]).
timestep – The current (sampling) time step.
episodes – This provides access to all of the internal episodes’ state, which may be useful for model-based or multi-agent algorithms.
- Keyword Arguments:
kwargs – Forward compatibility placeholder.
- Returns:
- actions: Batch of output actions, with shape like [BATCH_SIZE, ACTION_SHAPE].
- state_outs: List of RNN state output batches, if any, each with shape [BATCH_SIZE, STATE_SIZE].
- info: Dictionary of extra feature batches, if any, with shape like {"f1": [BATCH_SIZE, …], "f2": [BATCH_SIZE, …]}.
- Return type: