ray.rllib.execution.train_ops.train_one_step
ray.rllib.execution.train_ops.train_one_step(algorithm, train_batch, policies_to_train=None) -> Dict
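Improves all policies in `train_batch` on the local worker.

Example:

```python
from ray.rllib.execution.rollout_ops import synchronous_parallel_sample
from ray.rllib.execution.train_ops import train_one_step

algo = [...]  # An existing Algorithm object (elided).
train_batch = synchronous_parallel_sample(algo.env_runner_group)
# This trains the policy on one batch.
print(train_one_step(algo, train_batch))
```

The printed result is a dictionary keyed by policy ID:

```
{"default_policy": ...}
```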
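To make the per-policy behavior concrete, here is a toy, framework-free sketch. `ToyPolicy` and `toy_train_one_step` are hypothetical names, not RLlib APIs, and the batch is modeled as a plain dict from policy ID to rows. It also assumes, for illustration only, that `policies_to_train=None` means every policy with data in the batch is updated; the real function's handling of this parameter may differ.

```python
from typing import Dict, List, Optional


class ToyPolicy:
    """Hypothetical stand-in for a policy; "learning" only counts rows."""

    def __init__(self) -> None:
        self.rows_seen = 0

    def learn_on_batch(self, rows: List[dict]) -> Dict[str, int]:
        self.rows_seen += len(rows)
        return {"num_rows": len(rows)}


def toy_train_one_step(
    policies: Dict[str, ToyPolicy],
    train_batch: Dict[str, List[dict]],
    policies_to_train: Optional[List[str]] = None,
) -> Dict[str, Dict[str, int]]:
    # Assumption: None means "update every policy that has rows in the batch".
    ids = list(train_batch) if policies_to_train is None else policies_to_train
    info: Dict[str, Dict[str, int]] = {}
    for pid in ids:
        # Skip requested policies that have no data in this batch.
        if pid in train_batch:
            info[pid] = policies[pid].learn_on_batch(train_batch[pid])
    return info


policies = {"default_policy": ToyPolicy(), "opponent": ToyPolicy()}
batch = {"default_policy": [{"obs": 0}, {"obs": 1}], "opponent": [{"obs": 2}]}
all_info = toy_train_one_step(policies, batch)
print(all_info)  # {'default_policy': {'num_rows': 2}, 'opponent': {'num_rows': 1}}
some_info = toy_train_one_step(policies, batch, policies_to_train=["opponent"])
print(some_info)  # {'opponent': {'num_rows': 1}}
```

Like the documented function, the sketch returns a dictionary keyed by policy ID, so callers can inspect per-policy training info.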
Updates the `NUM_ENV_STEPS_TRAINED` and `NUM_AGENT_STEPS_TRAINED` counters, as well as the `LEARN_ON_BATCH_TIMER` timer, of the `algorithm` object.
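The counter and timer bookkeeping can be pictured with the following toy sketch. `ToyAlgorithm` and `record_training_step` are hypothetical; the constants reuse the documented names, but their string values are placeholders, and counting agent steps as the total number of per-policy rows is an assumption made for illustration.

```python
import time
from collections import defaultdict
from typing import Callable, DefaultDict, Dict

# Placeholder keys reusing the documented names; the string values are made up.
NUM_ENV_STEPS_TRAINED = "num_env_steps_trained"
NUM_AGENT_STEPS_TRAINED = "num_agent_steps_trained"
LEARN_ON_BATCH_TIMER = "learn_on_batch"


class ToyAlgorithm:
    """Hypothetical holder for step counters and accumulated timer seconds."""

    def __init__(self) -> None:
        self.counters: DefaultDict[str, int] = defaultdict(int)
        self.timers: DefaultDict[str, float] = defaultdict(float)


def record_training_step(
    algo: ToyAlgorithm,
    env_steps: int,
    rows_per_policy: Dict[str, int],
    learn_fn: Callable[[], None],
) -> None:
    # Time only the learning call, then bump the step counters.
    start = time.perf_counter()
    learn_fn()
    algo.timers[LEARN_ON_BATCH_TIMER] += time.perf_counter() - start
    algo.counters[NUM_ENV_STEPS_TRAINED] += env_steps
    # Assumption: agent steps = total rows across all policies in the batch.
    algo.counters[NUM_AGENT_STEPS_TRAINED] += sum(rows_per_policy.values())


algo = ToyAlgorithm()
# Two env steps in a two-agent env where both agents act every step.
record_training_step(algo, env_steps=2, rows_per_policy={"a": 2, "b": 2}, learn_fn=lambda: None)
print(dict(algo.counters))  # {'num_env_steps_trained': 2, 'num_agent_steps_trained': 4}
```

Keeping env steps and agent steps as separate counters matters in multi-agent settings, where one environment step can produce several agent steps.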