ray.rllib.algorithms.algorithm.Algorithm.compute_single_action
- Algorithm.compute_single_action(observation: numpy.array | jnp.ndarray | tf.Tensor | torch.Tensor | dict | tuple | None = None, state: List[numpy.array | jnp.ndarray | tf.Tensor | torch.Tensor | dict | tuple] | None = None, *, prev_action: numpy.array | jnp.ndarray | tf.Tensor | torch.Tensor | dict | tuple | None = None, prev_reward: float | None = None, info: dict | None = None, input_dict: SampleBatch | None = None, policy_id: str = 'default_policy', full_fetch: bool = False, explore: bool | None = None, timestep: int | None = None, episode=None, unsquash_action: bool | None = None, clip_action: bool | None = None, **kwargs) → numpy.array | jnp.ndarray | tf.Tensor | torch.Tensor | dict | tuple | Tuple[numpy.array | jnp.ndarray | tf.Tensor | torch.Tensor | dict | tuple, List[numpy.array | jnp.ndarray | tf.Tensor | torch.Tensor], Dict[str, numpy.array | jnp.ndarray | tf.Tensor | torch.Tensor]]
Computes an action for the specified policy on the local worker.
Note that you can also access the policy object through self.get_policy(policy_id) and call compute_single_action() on it directly.
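A minimal usage sketch (not part of the original docstring): the PPO config, the gymnasium import, and the CartPole-v1 environment are illustrative assumptions, and the call assumes a Ray version on which compute_single_action is still supported.

```python
import gymnasium as gym
from ray.rllib.algorithms.ppo import PPOConfig

# Build a PPO Algorithm and create a matching env for rollouts.
algo = PPOConfig().environment("CartPole-v1").build()
env = gym.make("CartPole-v1")

obs, info = env.reset()
# Query the local worker's default policy for a single (unbatched) action.
action = algo.compute_single_action(observation=obs)
obs, reward, terminated, truncated, info = env.step(action)
```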
- Parameters:
observation – Single (unbatched) observation from the environment.
state – List of all RNN hidden (single, unbatched) state tensors.
prev_action – Single (unbatched) previous action value.
prev_reward – Single (unbatched) previous reward value.
info – Env info dict, if any.
input_dict – An optional SampleBatch that holds all the values for: obs, state, prev_action, and prev_reward, plus maybe custom defined views of the current env trajectory. Note that only one of obs or input_dict must be non-None.
policy_id – Policy to query (only applies to multi-agent). Default: “default_policy”.
full_fetch – Whether to return extra action fetch results. This is always set to True if state is specified.
explore – Whether to apply exploration to the action. Default: None -> use self.config.explore (see the usage sketch at the end of this section).
timestep – The current (sampling) time step.
episode – This provides access to all of the internal episodes’ state, which may be useful for model-based or multi-agent algorithms.
unsquash_action – Should actions be unsquashed according to the env’s/Policy’s action space? If None, use the value of self.config.normalize_actions.
clip_action – Should actions be clipped according to the env’s/Policy’s action space? If None, use the value of self.config.clip_actions.
- Keyword Arguments:
kwargs – forward compatibility placeholder
- Returns:
The computed action if full_fetch=False, or, if full_fetch=True or the queried policy is RNN-based, the full output of policy.compute_actions() as a tuple of (action, RNN state outs, extra action fetches dict).
- Raises:
KeyError – If the policy_id cannot be found in this Algorithm’s local worker.
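A hedged follow-up sketch, reusing the assumed algo and obs from the example above: explore=False forces a deterministic action for this one call, and full_fetch=True returns the (action, RNN state outs, extra action fetches) tuple described under Returns.

```python
# Deterministic (greedy) action: override self.config.explore for this call only.
greedy_action = algo.compute_single_action(observation=obs, explore=False)

# full_fetch=True (or an RNN-based policy) returns a 3-tuple instead of just the action.
action, state_outs, extra_fetches = algo.compute_single_action(
    observation=obs, full_fetch=True
)
print(extra_fetches)  # Contents depend on the policy (e.g. action dist inputs, value estimates).
```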