ray.rllib.policy.policy.Policy.compute_actions
- abstract Policy.compute_actions(obs_batch: Union[List[Union[numpy.array, jnp.ndarray, tf.Tensor, torch.Tensor, dict, tuple]], numpy.array, jnp.ndarray, tf.Tensor, torch.Tensor, dict, tuple], state_batches: Optional[List[Union[numpy.array, jnp.ndarray, tf.Tensor, torch.Tensor]]] = None, prev_action_batch: Union[List[Union[numpy.array, jnp.ndarray, tf.Tensor, torch.Tensor, dict, tuple]], numpy.array, jnp.ndarray, tf.Tensor, torch.Tensor, dict, tuple] = None, prev_reward_batch: Union[List[Union[numpy.array, jnp.ndarray, tf.Tensor, torch.Tensor, dict, tuple]], numpy.array, jnp.ndarray, tf.Tensor, torch.Tensor, dict, tuple] = None, info_batch: Optional[Dict[str, list]] = None, episodes: Optional[List[Episode]] = None, explore: Optional[bool] = None, timestep: Optional[int] = None, **kwargs) -> Tuple[Union[numpy.array, jnp.ndarray, tf.Tensor, torch.Tensor], List[Union[numpy.array, jnp.ndarray, tf.Tensor, torch.Tensor]], Dict[str, Union[numpy.array, jnp.ndarray, tf.Tensor, torch.Tensor]]]
Computes actions for the current policy.
- Parameters
obs_batch – Batch of observations.
state_batches – List of RNN state input batches, if any.
prev_action_batch – Batch of previous action values.
prev_reward_batch – Batch of previous rewards.
info_batch – Batch of info objects.
episodes – List of Episode objects, one for each obs in obs_batch. This provides access to all of the internal episode state, which may be useful for model-based or multi-agent algorithms.
explore – Whether to pick an exploitation or exploration action. Set to None (default) to use the value of self.config["explore"].
timestep – The current (sampling) time step.
- Keyword Arguments
kwargs – Forward compatibility placeholder.
- Returns
- actions (TensorType): Batch of output actions, with shape like [BATCH_SIZE, ACTION_SHAPE].
- state_outs (List[TensorType]): List of RNN state output batches, if any, each with shape [BATCH_SIZE, STATE_SIZE].
- info (Dict[str, TensorType]): Dictionary of extra feature batches, if any, with shape like {"f1": [BATCH_SIZE, …], "f2": [BATCH_SIZE, …]}.
- Return type
Tuple[TensorType, List[TensorType], Dict[str, TensorType]]
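Because compute_actions is abstract, its call contract is easiest to see in a toy implementation. The following is a minimal sketch in plain NumPy, not RLlib code: the class UniformRandomPolicySketch, its num_actions argument, and the "action_prob" info key are hypothetical names chosen for illustration. It shows only how explore=None falls back to self.config["explore"], and that the result has three parts, each batched along the first axis.

```python
from typing import Dict, List, Optional, Tuple

import numpy as np


class UniformRandomPolicySketch:
    """Hypothetical stand-in (NOT an RLlib class) that mimics the call contract."""

    def __init__(self, num_actions: int, config: Optional[dict] = None):
        self.num_actions = num_actions
        self.config = config or {"explore": True}
        self._rng = np.random.default_rng(0)

    def compute_actions(
        self,
        obs_batch,
        state_batches: Optional[List[np.ndarray]] = None,
        prev_action_batch=None,
        prev_reward_batch=None,
        info_batch=None,
        episodes=None,
        explore: Optional[bool] = None,
        timestep: Optional[int] = None,
        **kwargs,
    ) -> Tuple[np.ndarray, List[np.ndarray], Dict[str, np.ndarray]]:
        obs_batch = np.asarray(obs_batch)
        batch_size = obs_batch.shape[0]

        # explore=None means "use the configured default".
        if explore is None:
            explore = self.config["explore"]

        if explore:
            # Toy exploration: sample one discrete action per observation.
            actions = self._rng.integers(0, self.num_actions, size=batch_size)
            prob = 1.0 / self.num_actions
        else:
            # Toy exploitation: always pick action 0.
            actions = np.zeros(batch_size, dtype=np.int64)
            prob = 1.0

        # This toy policy has no recurrent model, so any incoming RNN state
        # batches are passed through unchanged (empty list if none given).
        state_outs = list(state_batches) if state_batches else []

        # Extra feature batches, each with a leading batch dimension.
        info = {"action_prob": np.full(batch_size, prob)}
        return actions, state_outs, info


policy = UniformRandomPolicySketch(num_actions=3)
obs = np.zeros((4, 2), dtype=np.float32)  # batch of 4 observations
actions, state_outs, info = policy.compute_actions(obs, explore=False)
```

Here actions has one entry per row of obs, state_outs is empty because no state_batches were passed, and every value in info shares the same leading batch dimension. A real Policy subclass would compute actions from its model instead of the placeholder logic above.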