Algorithm.compute_single_action(observation: Optional[Union[numpy.array, jnp.ndarray, tf.Tensor, torch.Tensor, dict, tuple]] = None, state: Optional[List[Union[numpy.array, jnp.ndarray, tf.Tensor, torch.Tensor, dict, tuple]]] = None, *, prev_action: Optional[Union[numpy.array, jnp.ndarray, tf.Tensor, torch.Tensor, dict, tuple]] = None, prev_reward: Optional[float] = None, info: Optional[dict] = None, input_dict: Optional[ray.rllib.policy.sample_batch.SampleBatch] = None, policy_id: str = 'default_policy', full_fetch: bool = False, explore: Optional[bool] = None, timestep: Optional[int] = None, episode: Optional[ray.rllib.evaluation.episode.Episode] = None, unsquash_action: Optional[bool] = None, clip_action: Optional[bool] = None, **kwargs) → Union[numpy.array, jnp.ndarray, tf.Tensor, torch.Tensor, dict, tuple, Tuple[Union[numpy.array, jnp.ndarray, tf.Tensor, torch.Tensor, dict, tuple], List[Union[numpy.array, jnp.ndarray, tf.Tensor, torch.Tensor]], Dict[str, Union[numpy.array, jnp.ndarray, tf.Tensor, torch.Tensor]]]]

Computes an action for the specified policy on the local worker.

Note that you can also access the policy object through self.get_policy(policy_id) and call compute_single_action() on it directly.

Parameters

  • observation – Single (unbatched) observation from the environment.

  • state – List of all RNN hidden (single, unbatched) state tensors.

  • prev_action – Single (unbatched) previous action value.

  • prev_reward – Single (unbatched) previous reward value.

  • info – Env info dict, if any.

  • input_dict – An optional SampleBatch that holds all the values for obs, state, prev_action, and prev_reward, plus possibly custom-defined views of the current env trajectory. Note that exactly one of observation or input_dict must be non-None.

  • policy_id – Policy to query (only applies to multi-agent). Default: “default_policy”.

  • full_fetch – Whether to return extra action fetch results. This is always set to True if state is specified.

  • explore – Whether to apply exploration to the action. Default: None -> use self.config.explore.

  • timestep – The current (sampling) time step.

  • episode – This provides access to all of the internal episodes’ state, which may be useful for model-based or multi-agent algorithms.

  • unsquash_action – Should actions be unsquashed according to the env’s/Policy’s action space? If None, use the value of self.config.normalize_actions.

  • clip_action – Should actions be clipped according to the env’s/Policy’s action space? If None, use the value of self.config.clip_actions.

Keyword Arguments

kwargs – Forward-compatibility placeholder.


Returns

The computed action if full_fetch=False. Otherwise (full_fetch=True or an RNN-based Policy), a tuple of a) the computed action, b) the list of RNN state outputs, and c) a dict of extra action-fetch results from policy.compute_actions().


Raises

KeyError – If the policy_id cannot be found in this Algorithm’s local worker.
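The argument contract described above can be sketched in plain Python. This is a hedged, self-contained illustration of the call semantics only (the exactly-one-of observation/input_dict requirement and the tuple return whenever RNN state is passed), not RLlib's implementation; the function name, the placeholder action value, and the action_logp key are hypothetical.

```python
# Illustrative sketch (not RLlib source code): mimics the documented
# contract of Algorithm.compute_single_action().
from typing import Any, Optional


def compute_single_action_sketch(
    observation: Optional[Any] = None,
    state: Optional[list] = None,
    input_dict: Optional[dict] = None,
    full_fetch: bool = False,
):
    # Exactly one of `observation` or `input_dict` must be non-None.
    if (observation is None) == (input_dict is None):
        raise ValueError("Provide exactly one of `observation` or `input_dict`.")
    obs = observation if observation is not None else input_dict["obs"]

    # Stand-in for the policy forward pass (hypothetical placeholder logic).
    action = 0
    state_outs = state or []
    extra_fetches = {"action_logp": 0.0}

    # full_fetch is effectively True whenever RNN state is passed:
    # the caller then receives (action, state_outs, extra_fetches).
    if full_fetch or state:
        return action, state_outs, extra_fetches
    return action
```

In actual RLlib code you would call compute_single_action(obs) on a trained Algorithm instance, or fetch the policy via self.get_policy(policy_id) and call its compute_single_action() directly, as noted above.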