ray.rllib.core.learner.learner_group.LearnerGroup.update_from_episodes

LearnerGroup.update_from_episodes(episodes: List[SingleAgentEpisode | MultiAgentEpisode], *, async_update: bool = False, reduce_fn: Callable[[List[Dict[str, Any]]], dict | NestedDict] | None = <function _reduce_mean_results>, minibatch_size: int | None = None, num_iters: int = 1) → Dict[str, Any] | List[Dict[str, Any]] | List[List[Dict[str, Any]]]

Performs gradient-based update(s) on the Learner(s), based on the given episodes.

Parameters:
  • episodes – A list of episodes to process and perform the update for. If there is more than one Learner worker, the list of episodes is split among them and one shard is sent to each Learner.

  • async_update – Whether the update request(s) to the Learner workers should be sent asynchronously. If True, the method does NOT return the results of the update on the given data; instead, it returns all results from prior asynchronous update requests that have not been returned yet.

  • minibatch_size – The minibatch size to use for the update.

  • num_iters – The number of complete passes over all the sub-batches (minibatches) of the training data built from the given episodes.

  • reduce_fn – An optional callable that reduces the results from the list of Learner actors into a single result. This can be any function that takes a list of dictionaries and returns a single dictionary. For example, you can take an average (the default), concatenate the results (for example, for metrics), or be more selective about what you want to report back to the algorithm's training_step. If None is passed, the results are not reduced.
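
To make the data flow behind episodes, minibatch_size, and num_iters more concrete, here is a minimal pure-Python sketch of one way an episode list could be split across Learners and then consumed in minibatches over num_iters complete passes. It is not RLlib's implementation: the helpers shard_episodes and iterate_minibatches are hypothetical, episodes are stand-in lists of timestep ids, and the round-robin split, the absence of shuffling, and keeping a final smaller minibatch are all assumptions of this sketch. RLlib's actual sharding and minibatching may differ.

```python
from typing import Iterator, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def shard_episodes(episodes: Sequence[T], num_learners: int) -> List[List[T]]:
    """Hypothetical helper: deal episodes round-robin into one shard per Learner."""
    if num_learners < 1:
        raise ValueError("num_learners must be >= 1")
    shards: List[List[T]] = [[] for _ in range(num_learners)]
    for i, episode in enumerate(episodes):
        shards[i % num_learners].append(episode)
    return shards


def iterate_minibatches(
    timesteps: Sequence[T], minibatch_size: Optional[int] = None, num_iters: int = 1
) -> Iterator[List[T]]:
    """Hypothetical helper: yield minibatches for `num_iters` complete passes.

    Simplifications made for this sketch: no shuffling, the last minibatch
    may be smaller than `minibatch_size`, and `minibatch_size=None` means
    "the whole shard is one batch per pass".
    """
    if not timesteps:
        return
    size = len(timesteps) if minibatch_size is None else minibatch_size
    if size < 1:
        raise ValueError("minibatch_size must be >= 1")
    for _ in range(num_iters):
        for start in range(0, len(timesteps), size):
            yield list(timesteps[start:start + size])


# Stand-in "episodes": each one is just a list of timestep ids.
episodes = [[0, 1, 2], [3, 4], [5, 6, 7, 8], [9]]
shards = shard_episodes(episodes, num_learners=2)
# shards == [[[0, 1, 2], [5, 6, 7, 8]], [[3, 4], [9]]]

# Learner 0 flattens its shard and runs 2 passes with minibatches of 3 timesteps.
learner0_steps = [t for episode in shards[0] for t in episode]
for minibatch in iterate_minibatches(learner0_steps, minibatch_size=3, num_iters=2):
    print(minibatch)  # [0, 1, 2], [5, 6, 7], [8], then the same three again
```

The point of the sketch is the shape of the loop: num_iters counts full passes over a Learner's shard, and minibatch_size only controls how each pass is chopped up.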
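
Since reduce_fn only has to map a list of per-Learner result dictionaries to a single dictionary, a custom reducer can be an ordinary function. The sketch below shows two such functions: reduce_mean_nested averages matching leaves of possibly nested dicts (similar in intent to the averaging default), and reduce_keep_all keeps every Learner's value in a list instead. Both names are hypothetical, and the sketch assumes that every Learner reports the same keys with numeric leaves; real RLlib result dictionaries may contain other value types that these functions do not handle.

```python
from statistics import fmean
from typing import Any, Dict, List


def reduce_mean_nested(results: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Hypothetical reducer: average matching leaves across Learner results.

    Assumes every dict has the same (possibly nested) keys and numeric leaves.
    """
    if not results:
        return {}
    out: Dict[str, Any] = {}
    for key, value in results[0].items():
        values = [r[key] for r in results]
        if isinstance(value, dict):
            # Recurse into nested sub-dicts (e.g. per-module metrics).
            out[key] = reduce_mean_nested(values)
        else:
            out[key] = fmean(values)
    return out


def reduce_keep_all(results: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Hypothetical reducer: collect each Learner's value per top-level key."""
    out: Dict[str, List[Any]] = {}
    for result in results:
        for key, value in result.items():
            out.setdefault(key, []).append(value)
    return out


per_learner = [
    {"loss": 1.0, "policy": {"entropy": 0.5}},
    {"loss": 3.0, "policy": {"entropy": 1.5}},
]
print(reduce_mean_nested(per_learner))  # {'loss': 2.0, 'policy': {'entropy': 1.0}}
print(reduce_keep_all(per_learner))
# {'loss': [1.0, 3.0], 'policy': [{'entropy': 0.5}, {'entropy': 1.5}]}
```

With a LearnerGroup instance, such a function would be passed as learner_group.update_from_episodes(episodes, reduce_fn=reduce_mean_nested); passing reduce_fn=None instead returns the unreduced per-Learner results.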

Returns:

  • If async_update is False: a dictionary with the reduced results of the updates from the Learner(s), or, if reduce_fn is None, a list of result dictionaries, one per Learner.

  • If async_update is True: a list of lists of result dictionaries, where the outer list corresponds to separate previous calls to this method and the inner list holds the results from each Learner. If the results are reduced, this is instead a list of reduced result dictionaries, one for each previous asynchronous call whose results are ready.
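
Because the return type depends on both async_update and reduce_fn, calling code may need to handle several shapes. A minimal sketch, assuming only the shapes documented above: the hypothetical helper flatten_update_results turns any of them into a flat list of result dictionaries, for example for logging. It checks for collections.abc.Mapping rather than dict so that a reduced result that is a mapping but not a dict subclass would still be treated as a single result (an assumption about how such results behave). The grouping by call and by Learner is lost, so this only fits cases where that grouping does not matter.

```python
from collections.abc import Mapping
from typing import Any, Dict, List, Union

ResultDict = Dict[str, Any]
UpdateResults = Union[ResultDict, List[ResultDict], List[List[ResultDict]]]


def flatten_update_results(results: UpdateResults) -> List[Any]:
    """Hypothetical helper: flatten any documented return shape into one list."""
    if isinstance(results, Mapping):
        # async_update=False with reduction: a single result mapping.
        return [results]
    flat: List[Any] = []
    for item in results:
        if isinstance(item, Mapping):
            # Either one dict per Learner (sync, unreduced)
            # or one reduced dict per ready call (async, reduced).
            flat.append(item)
        else:
            # async_update=True, unreduced: inner list holds one dict per Learner.
            flat.extend(item)
    return flat


print(flatten_update_results([[{"loss": 1.0}, {"loss": 2.0}], [{"loss": 3.0}]]))
# [{'loss': 1.0}, {'loss': 2.0}, {'loss': 3.0}]
```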