Training Operations Utilities
- ray.rllib.execution.train_ops.train_one_step(algorithm, train_batch, policies_to_train=None) -> Dict [source]
Function that improves all policies in train_batch on the local worker.

Examples
>>> from ray.rllib.execution.rollout_ops import synchronous_parallel_sample
>>> algo = [...]
>>> train_batch = synchronous_parallel_sample(algo.workers)
>>> # This trains the policy on one batch.
>>> results = train_one_step(algo, train_batch)
{"default_policy": ...}
Updates the NUM_ENV_STEPS_TRAINED and NUM_AGENT_STEPS_TRAINED counters as well as the LEARN_ON_BATCH_TIMER timer of the algorithm object.
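A minimal end-to-end sketch (not taken from the docs above) of how train_one_step can be used outside an Algorithm's built-in training loop. The PPO/CartPole setup is only for illustration, and reading the counters via the private _counters attribute is an assumption about Algorithm internals, not a documented API:

from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.execution.rollout_ops import synchronous_parallel_sample
from ray.rllib.execution.train_ops import train_one_step
from ray.rllib.utils.metrics import (
    NUM_AGENT_STEPS_TRAINED,
    NUM_ENV_STEPS_TRAINED,
)

# Build a small single-worker PPO Algorithm purely for illustration.
algo = (
    PPOConfig()
    .environment("CartPole-v1")
    .rollouts(num_rollout_workers=0)
    .build()
)

# Collect one train batch from the rollout worker(s) ...
train_batch = synchronous_parallel_sample(worker_set=algo.workers)

# ... and run a single training update on the local worker.
results = train_one_step(algo, train_batch)
print(results["default_policy"])

# Assumption: the updated counters live in the Algorithm's (private)
# _counters dict, keyed by the metric constants imported above.
print(algo._counters[NUM_ENV_STEPS_TRAINED])
print(algo._counters[NUM_AGENT_STEPS_TRAINED])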
- ray.rllib.execution.train_ops.multi_gpu_train_one_step(algorithm, train_batch) -> Dict [source]
Multi-GPU version of train_one_step.
Uses the policies' load_batch_into_buffer and learn_on_loaded_batch methods to be more efficient with respect to CPU/GPU data transfers. For example, when doing multiple passes through a train batch (e.g. for PPO) using config.num_sgd_iter, the actual train batch is only split once and loaded once onto the GPU(s).

Examples
>>> from ray.rllib.execution.rollout_ops import synchronous_parallel_sample
>>> algo = [...]
>>> train_batch = synchronous_parallel_sample(algo.workers)
>>> # This trains the policy on one batch.
>>> results = multi_gpu_train_one_step(algo, train_batch)
{"default_policy": ...}
Updates the NUM_ENV_STEPS_TRAINED and NUM_AGENT_STEPS_TRAINED counters as well as the LOAD_BATCH_TIMER and LEARN_ON_BATCH_TIMER timers of the Algorithm instance.
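A hedged sketch of how the two utilities are commonly combined inside a custom Algorithm.training_step() override: use train_one_step when the simple (single-pass) optimizer is configured, otherwise take the multi-GPU path. The subclass name MyPPO is hypothetical; the simple_optimizer flag and the overall pattern follow what RLlib's built-in algorithms do, but this is an illustrative sketch rather than the library's exact implementation:

from ray.rllib.algorithms.ppo import PPO
from ray.rllib.execution.rollout_ops import synchronous_parallel_sample
from ray.rllib.execution.train_ops import (
    multi_gpu_train_one_step,
    train_one_step,
)


class MyPPO(PPO):
    def training_step(self):
        # Sample one train batch from all rollout workers.
        train_batch = synchronous_parallel_sample(worker_set=self.workers)

        if self.config.simple_optimizer:
            # Single-pass update on the local worker (CPU-friendly path).
            train_results = train_one_step(self, train_batch)
        else:
            # Load the batch onto the GPU(s) once and reuse it for
            # config.num_sgd_iter passes.
            train_results = multi_gpu_train_one_step(self, train_batch)

        return train_results

Both branches return the same per-policy results dict and update the Algorithm's trained-steps counters, so the rest of a custom training_step (e.g. syncing weights back to the rollout workers) does not need to know which path was taken.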