ray.rllib.utils.replay_buffers.utils.update_priorities_in_replay_buffer(replay_buffer: ReplayBuffer, config: dict, train_batch: SampleBatch | MultiAgentBatch | Dict[str, Any], train_results: Dict) None[source]#

Updates the priorities in a prioritized replay buffer, given training results.

The abs(TD-error) from the loss (inside train_results) is used as new priorities for the row-indices that were sampled for the train batch.

Don’t do anything if the given buffer does not support prioritized replay.

  • replay_buffer – The replay buffer, whose priority values to update. This may also be a buffer that does not support priorities.

  • config – The Algorithm’s config dict.

  • train_batch – The batch used for the training update.

  • train_results – A train results dict, generated by e.g. the train_one_step() utility.