ray.rllib.policy.policy.Policy.compute_actions_from_input_dict
- Policy.compute_actions_from_input_dict(input_dict: Union[ray.rllib.policy.sample_batch.SampleBatch, Dict[str, Union[numpy.array, jnp.ndarray, tf.Tensor, torch.Tensor, dict, tuple]]], explore: Optional[bool] = None, timestep: Optional[int] = None, episodes: Optional[List[Episode]] = None, **kwargs) → Tuple[Union[numpy.array, jnp.ndarray, tf.Tensor, torch.Tensor], List[Union[numpy.array, jnp.ndarray, tf.Tensor, torch.Tensor]], Dict[str, Union[numpy.array, jnp.ndarray, tf.Tensor, torch.Tensor]]]
Computes actions from collected samples (across multiple agents).
Takes an input dict (usually a SampleBatch) as its main data input. This makes the method usable when a more complex input pattern (view requirements) is needed, for example when the Model requires the last n observations, the last m actions/rewards, or any combination of these.
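The sketch below illustrates what such an input dict might look like. It is not taken from the RLlib docs: the extra key "prev_n_obs" and the batch/observation sizes are assumptions standing in for whatever the Model's actual view requirements define.

```python
# Illustrative only: "prev_n_obs" is a hypothetical view-requirement column;
# SampleBatch.OBS and SampleBatch.PREV_ACTIONS are standard RLlib column names.
import numpy as np
from ray.rllib.policy.sample_batch import SampleBatch

BATCH_SIZE, OBS_DIM = 2, 8  # hypothetical sizes

input_dict = SampleBatch({
    SampleBatch.OBS: np.zeros((BATCH_SIZE, OBS_DIM), dtype=np.float32),
    # Last 4 observations per batch item, stacked along a time axis.
    "prev_n_obs": np.zeros((BATCH_SIZE, 4, OBS_DIM), dtype=np.float32),
    # Previous (discrete) action taken by each batch item.
    SampleBatch.PREV_ACTIONS: np.zeros((BATCH_SIZE,), dtype=np.int64),
})
```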
- Parameters
input_dict – A SampleBatch or an input dict containing the Tensors to compute actions from. The input_dict already abides by the Policy's as well as the Model's view requirements and can thus be passed to the Model as-is.
explore – Whether to pick an exploitation or exploration action (default: None -> use self.config["explore"]).
timestep – The current (sampling) time step.
episodes – This provides access to all of the internal episodes’ state, which may be useful for model-based or multi-agent algorithms.
- Keyword Arguments
kwargs – Forward compatibility placeholder.
- Returns
actions – Batch of output actions, with shape like [BATCH_SIZE, ACTION_SHAPE].
state_outs – List of RNN state output batches, if any, each with shape [BATCH_SIZE, STATE_SIZE].
info – Dictionary of extra feature batches, if any, with shape like {"f1": [BATCH_SIZE, ...], "f2": [BATCH_SIZE, ...]}.
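A minimal end-to-end sketch of calling this method follows. It assumes a recent RLlib version and uses PPO on CartPole-v1 purely as an example setup; the config, environment, and batch size are illustrative and not part of this API's contract.

```python
# Sketch: build a policy, then query it through compute_actions_from_input_dict.
import numpy as np
from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.policy.sample_batch import SampleBatch

# Example setup (assumption): a PPO algorithm on CartPole-v1.
algo = PPOConfig().environment("CartPole-v1").build()
policy = algo.get_policy()

# Batch of 4 observations matching CartPole's 4-dim observation space.
obs_batch = np.random.uniform(-1.0, 1.0, size=(4, 4)).astype(np.float32)
input_dict = SampleBatch({SampleBatch.OBS: obs_batch})

actions, state_outs, info = policy.compute_actions_from_input_dict(
    input_dict, explore=False, timestep=0
)
print(actions)       # batch of 4 actions, one per input row
print(state_outs)    # [] for non-recurrent models
print(info.keys())   # extra feature batches (model-dependent, e.g. action logp)

algo.stop()
```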