Policy.compute_actions_from_input_dict(input_dict: Union[ray.rllib.policy.sample_batch.SampleBatch, Dict[str, Union[numpy.array, jnp.ndarray, tf.Tensor, torch.Tensor, dict, tuple]]], explore: Optional[bool] = None, timestep: Optional[int] = None, episodes: Optional[List[Episode]] = None, **kwargs) → Tuple[Union[numpy.array, jnp.ndarray, tf.Tensor, torch.Tensor], List[Union[numpy.array, jnp.ndarray, tf.Tensor, torch.Tensor]], Dict[str, Union[numpy.array, jnp.ndarray, tf.Tensor, torch.Tensor]]]

Computes actions from collected samples (across multiple agents).

Takes an input dict (usually a SampleBatch) as its main data input. This makes the method usable when a more complex input pattern (view requirements) is needed, for example when the Model requires the last n observations, the last m actions/rewards, or a combination of these.
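As a rough illustration of the "last n observations" case, the sketch below assembles a batch-of-one input dict from a rolling observation history. It is a minimal, dependency-free sketch and not RLlib code: plain nested lists stand in for tensors, the helper build_input_dict and the key "prev_n_obs" are hypothetical names chosen for this example, and the window is assumed to end at the current observation.

```python
from collections import deque

N = 3  # number of past observations the (hypothetical) model needs


def build_input_dict(history, n=N):
    """Assemble a batch-of-one input dict from an observation history.

    Plain nested lists stand in for tensors. The key "prev_n_obs" is an
    illustrative name only, not one mandated by RLlib.
    """
    history = [list(obs) for obs in history]
    if not history:
        raise ValueError("history must contain at least one observation")
    current = history[-1]
    # Left-pad with zero observations while fewer than n are available.
    missing = max(0, n - len(history))
    padding = [[0.0] * len(current) for _ in range(missing)]
    window = padding + history[-n:]
    return {
        "obs": [current],        # shape [1, OBS_DIM]
        "prev_n_obs": [window],  # shape [1, n, OBS_DIM]
    }


# Keep only the most recent N observations while stepping an environment.
history = deque(maxlen=N)
for t in range(5):
    history.append([float(t), float(t) * 10.0])

input_dict = build_input_dict(history)
```

In an actual RLlib setup the keys and shapes would be dictated by the Policy's and Model's view requirements rather than chosen by hand as they are here.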

Parameters

  • input_dict – A SampleBatch or input dict containing the Tensors needed to compute actions. input_dict already abides by the Policy’s as well as the Model’s view requirements and can thus be passed to the Model as-is.

  • explore – Whether to pick an exploitation or exploration action. If None (the default), self.config["explore"] is used.

  • timestep – The current (sampling) time step.

  • episodes – This provides access to all of the internal episodes’ state, which may be useful for model-based or multi-agent algorithms.

Keyword Arguments

kwargs – Forward compatibility placeholder.
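The fallback described for explore can be pictured with the short sketch below. The helper resolve_explore is a hypothetical name used only for illustration and is not part of RLlib; it simply mirrors the documented rule that None defers to the "explore" entry of the policy config.

```python
def resolve_explore(explore, config):
    """Return the effective exploration flag.

    Hypothetical helper: an explicit True/False wins, while None falls
    back to config["explore"], as the parameter description states.
    """
    return config["explore"] if explore is None else explore


config = {"explore": True}  # stand-in for a policy config dict

use_default = resolve_explore(None, config)    # defers to the config
force_greedy = resolve_explore(False, config)  # explicit override
```

Testing against None rather than truthiness matters here, since an explicit False must override a config value of True.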


Returns

actions: Batch of output actions, with shape like [BATCH_SIZE, ACTION_SHAPE].

state_outs: List of RNN state output batches, if any, each with shape [BATCH_SIZE, STATE_SIZE].

info: Dictionary of extra feature batches, if any, with shape like {"f1": [BATCH_SIZE, …], "f2": [BATCH_SIZE, …]}.
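To make the structure of the returned tuple concrete, here is a small sketch that unpacks the three values and checks that each shares the leading batch dimension. The function fake_compute_actions_from_input_dict is a hypothetical stand-in that only mimics the documented return structure (it is not RLlib's implementation), nested lists stand in for tensors, and the discrete actions and the "f1" key are assumptions made for the example.

```python
BATCH_SIZE = 4
STATE_SIZE = 8


def fake_compute_actions_from_input_dict(input_dict, explore=None,
                                         timestep=None, **kwargs):
    """Hypothetical stand-in mimicking only the documented return tuple."""
    batch_size = len(input_dict["obs"])
    # One (dummy) discrete action per batch row.
    actions = [0] * batch_size
    # A single RNN state batch of shape [BATCH_SIZE, STATE_SIZE].
    state_outs = [[[0.0] * STATE_SIZE for _ in range(batch_size)]]
    # One extra feature batch with a leading batch dimension.
    info = {"f1": [0.0] * batch_size}
    return actions, state_outs, info


input_dict = {"obs": [[0.0, 0.0] for _ in range(BATCH_SIZE)]}
actions, state_outs, info = fake_compute_actions_from_input_dict(
    input_dict, explore=False
)

# Every returned batch is indexed by the same leading batch dimension.
batch_dims = (
    [len(actions)]
    + [len(state) for state in state_outs]
    + [len(values) for values in info.values()]
)
```

For a policy without a recurrent model, state_outs would be expected to be an empty list, which the unpacking above handles unchanged.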

Return type