---
jupytext:
  text_representation:
    extension: .md
    format_name: myst
kernelspec:
  display_name: Python 3
  language: python
  name: python3
---

(rllib-cli-doc)=
# Working with the RLlib CLI

RLlib is built in Python and, if you're an advanced user, you will primarily use its Python API to build and run your experiments.
But RLlib also comes with a command line interface (CLI)[^typer] that allows you to quickly run and evaluate experiments without having to write any code.
You can also run pre-defined RLlib examples with it.

This is useful for quickly testing out RLlib, or for running experiments that are simple enough that they don't require any custom code.

## Running your first example

Let's say you want to run an RLlib experiment on the `CartPole` gym environment, using an example built for you by the RLlib team.

### Searching for examples

The first thing you can do is filter the list of all available examples by the environment name you want to use:

```{note}
You can run this command without the `--filter` option to see the full list of almost 100 examples that come with RLlib.
```
```bash
rllib example list --filter cartpole
```
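The filter isn't limited to environment names; assuming it does a simple substring match on the example IDs, you can narrow the list down by algorithm name in the same way:

```bash
rllib example list --filter ppo
```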
Either command returns a list of Example IDs and their descriptions.
You can pick any example you want, but for this tutorial we'll use the `PPO` algorithm and go with the example ID `cartpole-ppo`.

### Inspecting examples

If you're interested in checking out the specification of this example, you can do so by typing the following CLI command:

```bash
rllib example get cartpole-ppo
```

The output is the configuration of this example in YAML format.[^formats]
We're not going to go into the details of the configuration here, but you should know that it's a _tuned example_, meaning that you can expect it to train well out of the box.
Here's the output of the above command (parsed as YAML for readability):

```{literalinclude} ../../../rllib/tuned_examples/ppo/cartpole-ppo.yaml
:language: yaml
```

```{note}
Note that some of these tuned examples may require resource specifications, like multiple GPUs, that you might not have available on your machine.
To "solve" some of the more advanced environments out there _efficiently_ and provide a tuned example for it, we sometimes have to use the full power of a cluster.
After all, running on Ray Clusters is what RLlib was built for.
```

### Running tuned examples

Let's run the example next!
After showing how to start the training run, we give you some sample output of it below.
Note that by default, RLlib creates an indicative experiment name for you and logs important metrics such as the `reward`, the `episode_reward_max`, or the `episode_reward_min`.

```text
rllib example run cartpole-ppo
...
Number of trials: 1/1 (1 RUNNING)
...
... | Trial name                  | status  | ... |  reward | episode_reward_max | episode_reward_min | ...
... | PPO_CartPole-v0_9931e_00000 | RUNNING | ... | 23.9756 |                 71 |                 10 | ...
```

In the output of the training run, you should also see information about the machine (`loc`, host name and port), as well as the `status` of your training runs.
If your run is `TERMINATED`, but you've never seen a successfully `RUNNING` experiment in the log, something must have gone wrong.

```text
+-----------------------------+----------+----------------+
| Trial name                  | status   | loc            |
|-----------------------------+----------+----------------|
| PPO_CartPole-v0_9931e_00000 | RUNNING  | 127.0.0.1:8683 |
+-----------------------------+----------+----------------+
```

Also, when the training run finishes, you should see the following output:

```text
Your training finished.
Best available checkpoint for each trial:
  /checkpoint_000008

You can now evaluate your trained algorithm from any checkpoint, e.g. by running:
╭───────────────────────────────────────────────╮
│ rllib evaluate /checkpoint_000008 --algo PPO  │
╰───────────────────────────────────────────────╯
```

In other words, you can now evaluate your trained algorithm from this (or any other) checkpoint with the `rllib evaluate` command.
We'll come back to this command later.

This completes your first example with the RLlib CLI.
You now know about the `list`, `get`, and `run` sub-commands, and how to use them to run _tuned_ examples with RLlib.

## Getting help

You can request help for the RLlib CLI simply by adding `--help` at the end of any command, for instance by typing `rllib example --help`, or `rllib example run --help`.
If you run the latter, you'll see the following output:

![rllib-example-run-help](./images/rllib-example-run-help.png)

This output clearly shows you that the `run` sub-command has one mandatory argument, namely an `example_id`.
Many other commands have more arguments and options, so it can be a good idea to check out the help output for them occasionally.

## Starting training runs

If you want to do more than just use the examples that come with RLlib, you can also run your own algorithm configurations with `rllib`.
That's what you use the `rllib train` command for.

```{note}
Internally, the `rllib example run` command uses the same functionality as `rllib train`.
The only difference is the usage of pre-defined configurations in the former.
```

There are two basic ways to run training.
You can either provide all the necessary information via the command line, or simply point to a configuration file (like the YAML file we've already seen in the previous section).
Let's briefly go over both options, starting with the recommended one.

### Training from file

To train from a configuration file, you can use the `rllib train file` sub-command.
It works by either providing a path to a local file, or a URL to a remote file.
Here's an example that uses one of the examples hosted in the Ray GitHub repository:
```bash
rllib train file https://raw.githubusercontent.com/\
ray-project/ray/master/rllib/tuned_examples/\
ppo/cartpole_ppo_envrunner.py -t python
```
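Since the sub-command also accepts local paths, a quick sketch of the same call against a file on your own machine could look like this (the filename is only a placeholder for your configuration file):

```bash
rllib train file ./my_cartpole_config.py -t python
```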
The `-t` or `--type` option is used to specify the type of the configuration file, in this case `python`, since we're using a Python file.
This is what the Python configuration of this example looks like:

```{literalinclude} ../../../rllib/tuned_examples/ppo/cartpole_ppo_envrunner.py
:language: python
```

You can define your own Python examples by creating a Python file that contains a variable called `config` of type `AlgorithmConfig`, and an optional Python dictionary called `stop` that defines stopping conditions.[^tune]

If you're interested in other configuration file types, you can check out the following drop-down.
Apart from Python, you can also use YAML or JSON.

:::{dropdown} Here's the same configuration of our example expressed in YAML
:animate: fade-in-slide-down

You can run this file with `rllib train file -t yaml`:

```{literalinclude} ../../../rllib/tuned_examples/ppo/cartpole-ppo.yaml
:language: yaml
```

Currently, `yaml` is the default type, and many of the tuned examples in the RLlib repository use it.
So, if you're running a YAML file with `rllib train file`, you don't need to specify the `-t` option.
:::

Training from a file is the recommended way to run training with RLlib, because you can just modify the file and run the same command again.
For quick tests and experiments, however, you can directly provide the configuration via the command line.

### Training from configuration options

By dropping the `file` sub-command, you can directly provide the configuration options of your experiment as command line arguments.
For instance, to run training with PPO on a CartPole environment, you can use the following command directly:
```bash
rllib train --algo PPO --env CartPole-v1 --config {}
```
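For a rough sketch of what a non-empty configuration could look like on the command line, you could pass a JSON string like the following (the keys and values here are purely illustrative, not tuned settings):

```bash
rllib train --algo PPO --env CartPole-v1 --config '{"train_batch_size": 4000, "lr": 0.0003}'
```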
The advantage is that you don't need to create a configuration file, but the downside is that your CLI command can get quite long and complicated.
The `--config` option is used to provide a JSON string that contains the configuration that you would otherwise put in your YAML or JSON config files.

The `train` command has many more options that we don't discuss in detail here, but you can use the following dropdown to read more about their definitions:

:::{dropdown} Click here to see all `train` command line options
:animate: fade-in-slide-down

The `train` command is fed the following Typer variables, most of which are optional:

```{literalinclude} ../../../rllib/common.py
:language: python
:start-after: __cli_common_start__
:end-before: __cli_common_end__
```

```{literalinclude} ../../../rllib/common.py
:language: python
:start-after: __cli_train_start__
:end-before: __cli_train_end__
```

As you might infer, the set of options roughly splits into two categories, namely those that configure the training run, and those that configure your Ray cluster.
:::

Probably the best way to get familiar with `rllib train` is to run `rllib train --help` and play with your own experiments.

## Evaluating your experiments

Successful training runs generate _checkpoints_ whose performance you can evaluate with the `rllib evaluate` command.
By default, checkpoints are generated in `~/ray_results/<experiment-name>/checkpoints/...`, and you can provide the full path to a checkpoint as the argument to the evaluate command.

```{note}
As you've seen earlier, every successful `rllib example run` generates an `rllib evaluate` command automatically for you from a checkpoint.
The same holds true for `rllib train` runs.
```

If you've completed an `rllib train` run locally, you could use one of its checkpoints to evaluate its performance, but for testing purposes you can also evaluate an untrained algorithm by simply omitting the checkpoint argument:
```text
rllib evaluate --algo PPO --env CartPole-v1

Episode #10: reward: 19.1
Episode #20: reward: 30.5
Episode #30: reward: 19.8
Episode #40: reward: 31.2
Episode #50: reward: 23.0
```
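To evaluate a trained algorithm instead, pass the checkpoint path that your training run printed at the end, just like the auto-generated command you saw earlier (the placeholder below stands for your own checkpoint directory):

```bash
rllib evaluate <path-to-your-checkpoint> --algo PPO
```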
Note that the `evaluate` command needs to know about the algorithm and environment you want to evaluate.
The overall structure of the command itself is slightly simpler than its `train` counterpart, as there are simply fewer options to configure.

:::{dropdown} See all `evaluate` command line options
:animate: fade-in-slide-down

```{literalinclude} ../../../rllib/common.py
:language: python
:start-after: __cli_eval_start__
:end-before: __cli_eval_end__
```
:::

## Where to go from here?

Now that you know the three basic commands of the RLlib CLI, namely `train`, `evaluate`, and `example`, you have all the tools you need to run simple RLlib experiments from the command line.

If you want to learn more about the capabilities of RLlib, the natural next steps are its [Python API](rllib-training-api) and how to [configure algorithms](rllib-algo-configuration).

[^typer]: The RLlib CLI is built with [Typer](https://typer.tiangolo.com/) and uses [Rich](https://rich.readthedocs.io/en/stable/introduction.html) text formatting for convenience.

[^formats]: Many of the pre-defined examples are written in YAML, but you can also use JSON or Python files. The CLI will automatically detect the format of the file you're using.

[^tune]: These stopping conditions are internally picked up by Ray Tune, which is used by RLlib under the hood.