ray.rllib.policy.policy.Policy.compute_actions
- abstract Policy.compute_actions(obs_batch: List[numpy.array | jnp.ndarray | tf.Tensor | torch.Tensor | dict | tuple] | numpy.array | jnp.ndarray | tf.Tensor | torch.Tensor | dict | tuple, state_batches: List[numpy.array | jnp.ndarray | tf.Tensor | torch.Tensor] | None = None, prev_action_batch: List[numpy.array | jnp.ndarray | tf.Tensor | torch.Tensor | dict | tuple] | numpy.array | jnp.ndarray | tf.Tensor | torch.Tensor | dict | tuple = None, prev_reward_batch: List[numpy.array | jnp.ndarray | tf.Tensor | torch.Tensor | dict | tuple] | numpy.array | jnp.ndarray | tf.Tensor | torch.Tensor | dict | tuple = None, info_batch: Dict[str, list] | None = None, episodes: List[Episode] | None = None, explore: bool | None = None, timestep: int | None = None, **kwargs) → Tuple[numpy.array | jnp.ndarray | tf.Tensor | torch.Tensor, List[numpy.array | jnp.ndarray | tf.Tensor | torch.Tensor], Dict[str, numpy.array | jnp.ndarray | tf.Tensor | torch.Tensor]]
Computes actions for the current policy.
- Parameters:
obs_batch – Batch of observations.
state_batches – List of RNN state input batches, if any.
prev_action_batch – Batch of previous action values.
prev_reward_batch – Batch of previous rewards.
info_batch – Batch of info objects.
episodes – List of Episode objects, one for each obs in obs_batch. This provides access to all of the internal episode state, which may be useful for model-based or multi-agent algorithms.
explore – Whether to pick an exploitation or exploration action. Set to None (default) to use the value of self.config["explore"].
timestep – The current (sampling) time step.
- Keyword Arguments:
kwargs – Forward compatibility placeholder.
- Returns:
- actions: Batch of output actions, with shape like [BATCH_SIZE, ACTION_SHAPE].
- state_outs (List[TensorType]): List of RNN state output batches, if any, each with shape [BATCH_SIZE, STATE_SIZE].
- info (Dict[str, TensorType]): Dictionary of extra feature batches, if any, with shape like {"f1": [BATCH_SIZE, ...], "f2": [BATCH_SIZE, ...]}.
- Return type:
Tuple[TensorType, List[TensorType], Dict[str, TensorType]]
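A minimal usage sketch is shown below. It assumes the Policy-based (old) API stack with RLlib's PPO on CartPole-v1; the observation values are illustrative, and the exact keys returned in info depend on the algorithm and RLlib version.

```python
import numpy as np
from ray.rllib.algorithms.ppo import PPOConfig

# Build a small PPO Algorithm on CartPole-v1 and grab its (default) Policy.
algo = PPOConfig().environment("CartPole-v1").build()
policy = algo.get_policy()

# compute_actions expects a *batch* of observations; here, two CartPole
# observations, i.e. shape [2, 4].
obs_batch = np.array(
    [[0.0, 0.1, 0.0, -0.1],
     [0.02, -0.3, 0.01, 0.2]],
    dtype=np.float32,
)

# explore=False picks exploitation (e.g. greedy/deterministic) actions.
actions, state_outs, info = policy.compute_actions(obs_batch, explore=False)

print(actions)     # one action per observation in the batch, e.g. [0 1]
print(state_outs)  # [] for a non-recurrent model
print(list(info))  # extra fetches, e.g. action-distribution inputs or value predictions

algo.stop()
```

For a recurrent model, you would additionally pass state_batches (the RNN state for each observation in the batch) and feed the returned state_outs back into the next call.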