Policy.compute_actions_from_input_dict(input_dict: SampleBatch | Dict[str, numpy.array | jnp.ndarray | tf.Tensor | torch.Tensor | dict | tuple], explore: bool | None = None, timestep: int | None = None, episodes: List[Episode] | None = None, **kwargs) → Tuple[numpy.array | jnp.ndarray | tf.Tensor | torch.Tensor, List[numpy.array | jnp.ndarray | tf.Tensor | torch.Tensor], Dict[str, numpy.array | jnp.ndarray | tf.Tensor | torch.Tensor]]

Computes actions from collected samples (across multiple agents).

Takes an input dict (usually a SampleBatch) as its main data input. This makes the method usable when a more complex input pattern (view requirements) is needed, for example when the Model requires the last n observations, the last m actions/rewards, or a combination of these.
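As an illustration of the "last n observations" input pattern, the sketch below builds such an input dict by hand using plain Python lists. The helper `build_input_dict` and the key `"prev_n_obs"` are hypothetical names chosen for this example; in RLlib the actual keys and shapes are determined by the Policy's and Model's view requirements, and zero-padding at the start of an episode is an assumption made here.

```python
from typing import Dict, List


def build_input_dict(obs_history: List[List[float]], n: int) -> Dict[str, list]:
    """Build a batch in which each row also carries the last `n` observations.

    Hypothetical helper, not part of RLlib. Rows near the start of the
    history are left-padded with all-zero observations (an assumption).
    """
    obs_dim = len(obs_history[0])
    zero_obs = [0.0] * obs_dim
    stacked = []
    for t in range(len(obs_history)):
        # Observations from step t-n+1 up to and including step t.
        window = obs_history[max(0, t - n + 1): t + 1]
        # Pad on the left so that every row holds exactly n observations.
        padding = [zero_obs] * (n - len(window))
        stacked.append(padding + window)
    return {"obs": list(obs_history), "prev_n_obs": stacked}


obs_history = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
input_dict = build_input_dict(obs_history, n=2)
```

Each entry of `input_dict["prev_n_obs"]` then has shape [n, obs_dim], so the whole column has a leading batch dimension like every other column in the dict.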

Parameters:

  • input_dict – A SampleBatch or input dict containing the Tensors needed to compute actions. input_dict already abides by the Policy’s as well as the Model’s view requirements and can thus be passed to the Model as-is.

  • explore – Whether to pick an exploitation or exploration action (default: None -> use self.config["explore"]).

  • timestep – The current (sampling) time step.

  • episodes – This provides access to all of the internal episodes’ state, which may be useful for model-based or multi-agent algorithms.

Keyword Arguments:

kwargs – Forward compatibility placeholder.


Returns:

  • actions – Batch of output actions, with shape like [BATCH_SIZE, ACTION_SHAPE].

  • state_outs – List of RNN state output batches, if any, each with shape [BATCH_SIZE, STATE_SIZE].

  • info – Dictionary of extra feature batches, if any, with shape like {"f1": [BATCH_SIZE, …], "f2": [BATCH_SIZE, …]}.
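The three-part return value can be unpacked as sketched below. `StandInPolicy` is a hypothetical stand-in written only to mimic the documented call signature and return structure; it is not an RLlib class, and its action choice (random when exploring, a constant 0 otherwise) and its `"explored"` info key are placeholders rather than real policy logic.

```python
import random


class StandInPolicy:
    """Hypothetical stand-in that mimics the documented return contract only."""

    def __init__(self, num_actions: int, explore: bool = True):
        self.num_actions = num_actions
        self.config = {"explore": explore}

    def compute_actions_from_input_dict(
        self, input_dict, explore=None, timestep=None, episodes=None, **kwargs
    ):
        # explore=None falls back to the configured default, as documented.
        if explore is None:
            explore = self.config["explore"]
        batch_size = len(input_dict["obs"])
        if explore:
            actions = [random.randrange(self.num_actions) for _ in range(batch_size)]
        else:
            actions = [0] * batch_size  # placeholder for a greedy action
        state_outs = []  # empty list: this stand-in has no RNN state
        info = {"explored": [explore] * batch_size}  # one entry per batch row
        return actions, state_outs, info


policy = StandInPolicy(num_actions=3)
batch = {"obs": [[0.1, 0.2], [0.3, 0.4]]}
actions, state_outs, info = policy.compute_actions_from_input_dict(
    batch, explore=False
)
```

Every returned batch shares the leading BATCH_SIZE dimension of the input, here 2, and `state_outs` stays empty for a non-recurrent policy.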

Return type: