ray.rllib.policy.policy.Policy.compute_actions

Computes actions for a batch of observations using the current policy.

Parameters:

obs_batch – Batch of observations.
state_batches – List of RNN state input batches, if any.
prev_action_batch – Batch of previous action values.
prev_reward_batch – Batch of previous rewards.
info_batch – Batch of info objects.
episodes – List of Episode objects, one for each obs in obs_batch. This provides access to all of the internal episode state, which may be useful for model-based or multi-agent algorithms.
explore – Whether to pick an exploration action (True) or an exploitation action (False). If None (default), the value of self.config["explore"] is used.
timestep – The current (sampling) time step.
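The fallback described for explore can be sketched as follows. This is a minimal illustration, not RLlib's implementation: resolve_explore is a hypothetical helper, and config stands in for self.config.

```python
from typing import Any, Dict, Optional


def resolve_explore(explore: Optional[bool], config: Dict[str, Any]) -> bool:
    """Hypothetical helper: use config["explore"] only when explore is None."""
    return config["explore"] if explore is None else explore


# `config` is an assumed stand-in for the policy's self.config.
config = {"explore": True}

use_default = resolve_explore(None, config)    # falls back to config["explore"]
force_greedy = resolve_explore(False, config)  # an explicit value overrides the config
```

Passing an explicit True or False therefore overrides the configured default for that call only.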

Keyword Arguments:

kwargs – Placeholder for forward compatibility.

Returns:

A tuple (actions, state_outs, info):

actions – Batch of output actions, with shape like [BATCH_SIZE, ACTION_SHAPE].
state_outs – List of RNN state output batches, if any, each with shape [BATCH_SIZE, STATE_SIZE].
info – Dictionary of extra feature batches, if any, with shape like {"f1": [BATCH_SIZE, …], "f2": [BATCH_SIZE, …]}.

Return type:

Tuple[TensorType, List[TensorType], Dict[str, TensorType]]
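To illustrate how the returned actions, state_outs, and info relate to the batch size, here is a minimal sketch using a toy stand-in. ToyArgmaxPolicy is hypothetical and not an RLlib class; it uses plain Python lists where RLlib would pass arrays or framework tensors, and the "max_obs" key is an invented example feature.

```python
from typing import Any, Dict, List, Optional, Tuple


class ToyArgmaxPolicy:
    """Hypothetical stand-in that mirrors the documented call shape."""

    def compute_actions(
        self,
        obs_batch: List[List[float]],
        state_batches: Optional[List[Any]] = None,
        prev_action_batch: Any = None,
        prev_reward_batch: Any = None,
        info_batch: Any = None,
        episodes: Any = None,
        explore: Optional[bool] = None,  # accepted but ignored in this toy
        timestep: Optional[int] = None,
        **kwargs: Any,
    ) -> Tuple[List[int], List[Any], Dict[str, List[float]]]:
        # One action per observation: the index of its largest component.
        actions = [max(range(len(obs)), key=obs.__getitem__) for obs in obs_batch]
        # This toy policy has no RNN, so there are no state output batches.
        state_outs: List[Any] = []
        # Extra feature batches, each with one entry per observation.
        info = {"max_obs": [max(obs) for obs in obs_batch]}
        return actions, state_outs, info


policy = ToyArgmaxPolicy()
obs_batch = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]]  # BATCH_SIZE = 3
actions, state_outs, info = policy.compute_actions(obs_batch)
```

Each entry in actions and in every info feature batch corresponds to the observation at the same index in obs_batch; a recurrent policy would additionally return one state batch per RNN state in state_outs.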