ray.rllib.policy.policy.Policy.compute_actions_from_input_dict
- Policy.compute_actions_from_input_dict(input_dict: SampleBatch | Dict[str, numpy.array | jnp.ndarray | tf.Tensor | torch.Tensor | dict | tuple], explore: bool | None = None, timestep: int | None = None, episodes: List[Episode] | None = None, **kwargs) → Tuple[numpy.array | jnp.ndarray | tf.Tensor | torch.Tensor, List[numpy.array | jnp.ndarray | tf.Tensor | torch.Tensor], Dict[str, numpy.array | jnp.ndarray | tf.Tensor | torch.Tensor]]
Computes actions from collected samples (across multiple agents).
Takes an input dict (usually a SampleBatch) as its main data input. This allows for using this method in case a more complex input pattern (view requirements) is needed, for example when the Model requires the last n observations, the last m actions/rewards, or a combination of any of these.
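As a rough illustration of the "last n observations" case, the sketch below assembles a plain input dict for a batch of size 1. The helper `build_input_dict` is hypothetical and not part of RLlib, and the stacked layout of the "obs" entry is an assumption; the layout a real Policy expects depends on its own view requirements. The sketch also assumes that at least one action and one reward have already been recorded.

```python
import numpy as np


def build_input_dict(obs_history, action_history, reward_history, n=3):
    """Hypothetical helper (not an RLlib function).

    Builds a dict for a batch of size 1 that holds the last `n`
    observations plus the most recent action and reward.
    """
    obs_history = np.asarray(obs_history, dtype=np.float32)  # [T, obs_dim]
    obs_dim = obs_history.shape[1]

    # Keep the last n observations; zero-pad at the front if fewer exist.
    last_n = obs_history[-n:]
    pad = np.zeros((n - len(last_n), obs_dim), dtype=np.float32)
    stacked = np.concatenate([pad, last_n], axis=0)  # [n, obs_dim]

    return {
        # Leading axis is the batch dimension (assumed layout).
        "obs": stacked[None],  # [1, n, obs_dim]
        "prev_actions": np.asarray(action_history[-1:], dtype=np.int64),  # [1]
        "prev_rewards": np.asarray(reward_history[-1:], dtype=np.float32),  # [1]
    }


input_dict = build_input_dict(
    obs_history=[[0.0, 0.1], [0.2, 0.3]],
    action_history=[1, 0],
    reward_history=[0.0, 1.0],
    n=3,
)
```

Only two observations exist here, so the first of the three stacked slots is zero padding.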
- Parameters:
input_dict – A SampleBatch or input dict containing the Tensors to compute actions. input_dict already abides by the Policy's as well as the Model's view requirements and can thus be passed to the Model as-is.
explore – Whether to pick an exploitation or exploration action (default: None -> use self.config["explore"]).
timestep – The current (sampling) time step.
episodes – This provides access to all of the internal episodes’ state, which may be useful for model-based or multi-agent algorithms.
- Keyword Arguments:
kwargs – Forward compatibility placeholder.
- Returns:
- actions: Batch of output actions, with shape like [BATCH_SIZE, ACTION_SHAPE].
- state_outs: List of RNN state output batches, if any, each with shape [BATCH_SIZE, STATE_SIZE].
- info: Dictionary of extra feature batches, if any, with shape like {"f1": [BATCH_SIZE, …], "f2": [BATCH_SIZE, …]}.
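The three-part return value and the `explore=None` fallback can be seen in a small stand-in. `ToyPolicy` below is a hypothetical class written for this sketch, not RLlib code: it only mimics the documented call signature and return contract, uses the observation features themselves as toy logits, and has no RNN state. The "logits" key in the info dict is likewise an illustrative name.

```python
import numpy as np


class ToyPolicy:
    """Hypothetical stand-in that mimics the documented contract."""

    def __init__(self, num_actions, explore=True, seed=0):
        self.config = {"explore": explore}
        self.num_actions = num_actions
        self.rng = np.random.default_rng(seed)

    def compute_actions_from_input_dict(
        self, input_dict, explore=None, timestep=None, episodes=None, **kwargs
    ):
        # explore=None falls back to the policy's own config.
        if explore is None:
            explore = self.config["explore"]

        obs = np.asarray(input_dict["obs"], dtype=np.float32)
        batch_size = obs.shape[0]

        # Toy "model": treat the first num_actions features as logits.
        logits = obs[:, : self.num_actions]

        if explore:
            # Exploration: uniform random action per batch row.
            actions = self.rng.integers(0, self.num_actions, size=batch_size)
        else:
            # Exploitation: greedy action per batch row.
            actions = logits.argmax(axis=1)

        state_outs = []  # no RNN, so no state output batches
        info = {"logits": logits}  # extra feature batch, [BATCH_SIZE, ...]
        return actions, state_outs, info


policy = ToyPolicy(num_actions=2, explore=True)
obs = np.array([[0.1, 0.9], [0.8, 0.2], [0.3, 0.4]], dtype=np.float32)

# Greedy call: overrides the config default.
actions, state_outs, info = policy.compute_actions_from_input_dict(
    {"obs": obs}, explore=False
)

# Default call: explore=None, so self.config["explore"] decides.
sampled, _, _ = policy.compute_actions_from_input_dict({"obs": obs})
```

Every returned batch shares the leading BATCH_SIZE axis (3 here), and `state_outs` stays an empty list when the model is not recurrent.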