ray.rllib.policy.policy.Policy.compute_actions_from_input_dict

Computes actions from collected samples (across multiple agents).

Takes an input dict (usually a SampleBatch) as its main data input. This makes the method usable when a more complex input pattern (view requirements) is needed, for example when the Model requires the last n observations, the last m actions/rewards, or a combination of these.
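
To make the "last n observations" pattern concrete, the following minimal sketch assembles such an input dict by hand, with plain Python lists standing in for tensors. The helper `last_n_window`, the key `prev_n_obs`, and the zero-padding at the start of an episode are assumptions made for illustration only; the input dict normally arrives already shaped to the view requirements.

```python
from typing import Dict, List

Obs = List[float]


def last_n_window(history: List[Obs], n: int, obs_dim: int) -> List[Obs]:
    """Return the last n observations, zero-padded on the left if fewer exist."""
    # Guard n == 0 explicitly, because history[-0:] would return the whole list.
    window = history[-n:] if n > 0 else []
    padding = [[0.0] * obs_dim for _ in range(n - len(window))]
    return padding + window


# Two batch rows, each with its own observation history.
histories = [
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],  # three steps seen so far
    [[7.0, 8.0]],                          # only one step seen so far
]
n = 2

input_dict: Dict[str, list] = {
    # Current observation per row: shape [BATCH_SIZE, OBS_DIM].
    "obs": [h[-1] for h in histories],
    # Hypothetical key holding the last n observations: [BATCH_SIZE, n, OBS_DIM].
    "prev_n_obs": [last_n_window(h, n, obs_dim=2) for h in histories],
}

print(input_dict["prev_n_obs"][1])  # [[0.0, 0.0], [7.0, 8.0]]
```

Every value keeps the batch dimension first, so a model can consume the extra key alongside `obs` without reshaping.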

Parameters:

input_dict – A SampleBatch or input dict containing the tensors needed to compute actions. input_dict already abides by the Policy’s as well as the Model’s view requirements and can thus be passed to the Model as-is.
explore – Whether to pick an exploitation or exploration action (default: None -> use self.config[“explore”]).
timestep – The current (sampling) time step.
episodes – Provides access to the state of all internal episodes, which may be useful for model-based or multi-agent algorithms.
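
The default for `explore` can be read as a simple fallback rule. The sketch below shows that rule in isolation; `resolve_explore` is a hypothetical helper written for this example, not an RLlib function, and the plain dict stands in for `self.config`.

```python
from typing import Optional


def resolve_explore(explore: Optional[bool], config: dict) -> bool:
    """Hypothetical helper: use config["explore"] only when explore is None."""
    return config["explore"] if explore is None else explore


config = {"explore": True}  # stand-in for self.config

print(resolve_explore(None, config))   # True  (falls back to the config)
print(resolve_explore(False, config))  # False (explicit argument wins)
```

Comparing against `None` rather than testing truthiness matters here: an explicit `False` must override a config value of `True`.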

Keyword Arguments:

kwargs – Forward compatibility placeholder.

Returns:

actions: Batch of output actions, with shape like: [BATCH_SIZE, ACTION_SHAPE].
state_outs: List of RNN state output batches, if any, each with shape [BATCH_SIZE, STATE_SIZE].
info: Dictionary of extra feature batches, if any, with shape like: {“f1”: [BATCH_SIZE, …], “f2”: [BATCH_SIZE, …]}.
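
As a rough illustration of the return contract, here is a toy stand-in that mimics the method's parameters and returns the three values with the batch dimension first. `ToyPolicy`, its thresholding rule, and the `obs_sum` info key are all invented for this sketch; this is not RLlib's `Policy` class, and plain lists stand in for tensors.

```python
from typing import Any, Dict, List, Optional, Tuple


class ToyPolicy:
    """Hypothetical stand-in that mirrors the method's return contract."""

    def compute_actions_from_input_dict(
        self,
        input_dict: Dict[str, Any],
        explore: Optional[bool] = None,
        timestep: Optional[int] = None,
        episodes: Optional[list] = None,
        **kwargs: Any,
    ) -> Tuple[List[int], List[Any], Dict[str, List[float]]]:
        obs_batch = input_dict["obs"]  # shape [BATCH_SIZE, OBS_DIM]
        sums = [sum(obs) for obs in obs_batch]
        # One discrete action per batch row: 1 if the features sum to > 0, else 0.
        actions = [1 if s > 0 else 0 for s in sums]
        # Stateless toy model, so there are no RNN state batches to return.
        state_outs: List[Any] = []
        # Extra per-row features; each value keeps BATCH_SIZE as its first dim.
        info = {"obs_sum": sums}
        return actions, state_outs, info


policy = ToyPolicy()
actions, state_outs, info = policy.compute_actions_from_input_dict(
    {"obs": [[0.5, 0.25], [-1.0, 0.5], [2.0, -2.0]]}
)
print(actions)     # [1, 0, 0]
print(state_outs)  # []
print(info)        # {'obs_sum': [0.75, -0.5, 0.0]}
```

Callers unpack the result positionally, so an implementation without recurrent state still returns an empty list in the second slot rather than omitting it.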

Return type:

Tuple of (actions, state_outs, info)