{ "cells": [ { "attachments": {}, "cell_type": "markdown", "id": "ecad719c", "metadata": {}, "source": [ "# Using Weights & Biases with Tune\n", "\n", "(tune-wandb-ref)=\n", "\n", "[Weights & Biases](https://www.wandb.ai/) (Wandb) is a tool for experiment\n", "tracking, model optimization, and dataset versioning. It is very popular\n", "in the machine learning and data science community for its superb visualization\n", "tools.\n", "\n", "```{image} /images/wandb_logo_full.png\n", ":align: center\n", ":alt: Weights & Biases\n", ":height: 80px\n", ":target: https://www.wandb.ai/\n", "```\n", "\n", "Ray Tune currently offers two lightweight integrations for Weights & Biases.\n", "One is the {ref}`WandbLoggerCallback <air-wandb-logger>`, which automatically logs\n", "metrics reported to Tune to the Wandb API.\n", "\n", "The other is the {ref}`setup_wandb() <air-wandb-setup>` function, which can be\n", "used with the function API. It automatically\n", "initializes the Wandb API with Tune's training information. You can then use the\n", "Wandb API as you normally would, e.g. using `wandb.log()` to log your training\n", "process.\n", "\n", "```{contents}\n", ":backlinks: none\n", ":local: true\n", "```\n", "\n", "## Running a Weights & Biases Example\n", "\n", "In the following example we're going to use both of the above methods, namely the `WandbLoggerCallback` and\n", "the `setup_wandb` function, to log metrics.\n", "\n", "As the very first step, make sure you're logged in to wandb on all machines you're running your training on:\n", "\n", "    wandb login\n", "\n", "We can then start with a few necessary imports:" ] }, { "cell_type": "code", "execution_count": 1, "id": "100bcf8a", "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "import ray\n", "from ray import train, tune\n", "from ray.air.integrations.wandb import WandbLoggerCallback, setup_wandb\n" ] }, { "attachments": {}, "cell_type": "markdown", "id": "9346c0f6", "metadata": {}, "source": [ "Next, let's define a simple `train_function` (a Tune `Trainable`) that reports a random loss to Tune.\n", "The objective function itself is not important for this example, since we primarily want to focus on\n", "the Weights & Biases integration."
] }, { "cell_type": "code", "execution_count": 2, "id": "e8b4fc4d", "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "def train_function(config):\n", "    for i in range(30):\n", "        loss = config[\"mean\"] + config[\"sd\"] * np.random.randn()\n", "        train.report({\"loss\": loss})\n" ] }, { "attachments": {}, "cell_type": "markdown", "id": "831eed42", "metadata": {}, "source": [ "You can define a\n", "simple grid-search Tune run using the `WandbLoggerCallback` as follows:" ] }, { "cell_type": "code", "execution_count": 3, "id": "52988599", "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "def tune_with_callback():\n", "    \"\"\"Example for using a WandbLoggerCallback with the function API\"\"\"\n", "    tuner = tune.Tuner(\n", "        train_function,\n", "        tune_config=tune.TuneConfig(\n", "            metric=\"loss\",\n", "            mode=\"min\",\n", "        ),\n", "        run_config=train.RunConfig(\n", "            callbacks=[WandbLoggerCallback(project=\"Wandb_example\")]\n", "        ),\n", "        param_space={\n", "            \"mean\": tune.grid_search([1, 2, 3, 4, 5]),\n", "            \"sd\": tune.uniform(0.2, 0.8),\n", "        },\n", "    )\n", "    tuner.fit()\n" ] }, { "attachments": {}, "cell_type": "markdown", "id": "e24c05fa", "metadata": {}, "source": [ "To use the `setup_wandb` utility, simply call it in your objective function.\n", "Note that we also use `wandb.log(...)` to log the `loss` to Weights & Biases as a dictionary.\n", "Otherwise, this version of our objective is identical to the original."
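, "\n", "\n", "As a side note, `setup_wandb` also accepts keyword arguments beyond the Tune-specific ones, which it forwards to `wandb.init()`.\n", "The following is only a hypothetical sketch of such a variant; `group` is a standard `wandb.init()` argument, and `train_function_grouped` is a made-up name:\n", "\n", "```python\n", "def train_function_grouped(config):\n", "    # Assumption: extra kwargs such as group= are forwarded to wandb.init().\n", "    wandb = setup_wandb(config, project=\"Wandb_example\", group=\"grid-search\")\n", "    ...  # report and log metrics as in train_function_wandb below\n", "```\n"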
] }, { "cell_type": "code", "execution_count": 4, "id": "5e30d5e7", "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "def train_function_wandb(config):\n", "    wandb = setup_wandb(config, project=\"Wandb_example\")\n", "\n", "    for i in range(30):\n", "        loss = config[\"mean\"] + config[\"sd\"] * np.random.randn()\n", "        train.report({\"loss\": loss})\n", "        wandb.log(dict(loss=loss))\n" ] }, { "attachments": {}, "cell_type": "markdown", "id": "04040bcb", "metadata": {}, "source": [ "With `train_function_wandb` defined, your Tune experiment will set up `wandb` in each trial once it starts!" ] }, { "cell_type": "code", "execution_count": 5, "id": "d4fbd368", "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "def tune_with_setup():\n", "    \"\"\"Example for using the setup_wandb utility with the function API\"\"\"\n", "    tuner = tune.Tuner(\n", "        train_function_wandb,\n", "        tune_config=tune.TuneConfig(\n", "            metric=\"loss\",\n", "            mode=\"min\",\n", "        ),\n", "        param_space={\n", "            \"mean\": tune.grid_search([1, 2, 3, 4, 5]),\n", "            \"sd\": tune.uniform(0.2, 0.8),\n", "        },\n", "    )\n", "    tuner.fit()\n" ] }, { "attachments": {}, "cell_type": "markdown", "id": "f9521481", "metadata": {}, "source": [ "Finally, you can also define a class-based Tune `Trainable` by using `setup_wandb` in the `setup()` method and storing the run object as an attribute. Please note that with the class trainable, you have to pass the trial ID, trial name, and group separately:" ] }, { "cell_type": "code", "execution_count": 6, "id": "d27a7a35", "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "class WandbTrainable(tune.Trainable):\n", "    def setup(self, config):\n", "        self.wandb = setup_wandb(\n", "            config,\n", "            trial_id=self.trial_id,\n", "            trial_name=self.trial_name,\n", "            group=\"Example\",\n", "            project=\"Wandb_example\",\n", "        )\n", "\n", "    def step(self):\n", "        for i in range(30):\n", "            loss = self.config[\"mean\"] + self.config[\"sd\"] * np.random.randn()\n", "            self.wandb.log({\"loss\": loss})\n", "        return {\"loss\": loss, \"done\": True}\n", "\n", "    def save_checkpoint(self, checkpoint_dir: str):\n", "        pass\n", "\n", "    def load_checkpoint(self, checkpoint_dir: str):\n", "        pass\n" ] }, { "attachments": {}, "cell_type": "markdown", "id": "fa189bb2", "metadata": {}, "source": [ "Running Tune with this `WandbTrainable` works exactly the same as with the function API.\n", "The `tune_trainable` function below differs from `tune_with_setup` above only in the first argument we pass to\n", "`Tuner()`:" ] }, { "cell_type": "code", "execution_count": 7, "id": "6e546cc2", "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "def tune_trainable():\n", "    \"\"\"Example for using setup_wandb with the class API\"\"\"\n", "    tuner = tune.Tuner(\n", "        WandbTrainable,\n", "        tune_config=tune.TuneConfig(\n", "            metric=\"loss\",\n", "            mode=\"min\",\n", "        ),\n", "        param_space={\n", "            \"mean\": tune.grid_search([1, 2, 3, 4, 5]),\n", "            \"sd\": tune.uniform(0.2, 0.8),\n", "        },\n", "    )\n", "    results = tuner.fit()\n", "\n", "    return results.get_best_result().config\n" ] }, { "attachments": {}, "cell_type": "markdown", "id": "0b736172", "metadata": {}, "source": [ "Since you may not have an API key for Wandb, we can _mock_ the Wandb logger and test all three of our training\n", "functions as follows.\n",
"If you are logged in to wandb, you can set `mock_api = False` to actually upload your results to Weights & Biases." ] }, { "cell_type": "code", "execution_count": 8, "id": "e0e7f481", "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "2022-11-02 16:02:45,355\tINFO worker.py:1534 -- Started a local Ray instance. View the dashboard at \u001b[1m\u001b[32mhttp://127.0.0.1:8266 \u001b[39m\u001b[22m\n", "2022-11-02 16:02:46,513\tINFO wandb.py:282 -- Already logged into W&B.\n" ] }, { "data": { "text/html": [ "
\n", "
\n", "
\n", "

Tune Status

\n", " \n", "\n", "\n", "\n", "\n", "\n", "
Current time:2022-11-02 16:03:13
Running for: 00:00:27.28
Memory: 10.8/16.0 GiB
\n", "
\n", "
\n", "
\n", "

System Info

\n", " Using FIFO scheduling algorithm.
Resources requested: 0/16 CPUs, 0/0 GPUs, 0.0/3.44 GiB heap, 0.0/1.72 GiB objects\n", "
\n", " \n", "
\n", "
\n", "
\n", "

Trial Status

\n", " \n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Trial name status loc mean sd iter total time (s) loss
train_function_7676d_00000TERMINATED127.0.0.1:14578 10.411212 30 0.2361370.828527
train_function_7676d_00001TERMINATED127.0.0.1:14591 20.756339 30 5.57185 3.13156
train_function_7676d_00002TERMINATED127.0.0.1:14593 30.436643 30 5.50237 3.26679
train_function_7676d_00003TERMINATED127.0.0.1:14595 40.295929 30 5.60986 3.70388
train_function_7676d_00004TERMINATED127.0.0.1:14596 50.335292 30 5.61385 4.74294
\n", "
\n", "
\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "

Trial Progress

\n", " \n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Trial name date done episodes_total experiment_id experiment_tag hostname iterations_since_restore lossnode_ip pid time_since_restore time_this_iter_s time_total_s timestamp timesteps_since_restoretimesteps_total training_iterationtrial_id warmup_time
train_function_7676d_000002022-11-02_16-02-53True a9f242fa70184d9dadd8952b16fb0ecc0_mean=1,sd=0.4112Kais-MBP.local.meter 300.828527127.0.0.114578 0.236137 0.00381589 0.236137 1667430173 0 307676d_00000 0.00366998
train_function_7676d_000012022-11-02_16-03-03True f57118365bcb4c229fe41c5911f05ad61_mean=2,sd=0.7563Kais-MBP.local.meter 303.13156 127.0.0.114591 5.57185 0.00627518 5.57185 1667430183 0 307676d_00001 0.0027349
train_function_7676d_000022022-11-02_16-03-03True 394021d4515d4616bae7126668f73b2b2_mean=3,sd=0.4366Kais-MBP.local.meter 303.26679 127.0.0.114593 5.50237 0.00494576 5.50237 1667430183 0 307676d_00002 0.00286222
train_function_7676d_000032022-11-02_16-03-03True a575e79c9d95485fa37deaa86267aea43_mean=4,sd=0.2959Kais-MBP.local.meter 303.70388 127.0.0.114595 5.60986 0.00689816 5.60986 1667430183 0 307676d_00003 0.00299597
train_function_7676d_000042022-11-02_16-03-03True 91ce57dcdbb54536b1874666b711350d4_mean=5,sd=0.3353Kais-MBP.local.meter 304.74294 127.0.0.114596 5.61385 0.00672579 5.61385 1667430183 0 307676d_00004 0.00323987
\n", "
\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stderr", "output_type": "stream", "text": [ "2022-11-02 16:03:13,913\tINFO tune.py:788 -- Total run time: 28.53 seconds (27.28 seconds for the tuning loop).\n" ] }, { "data": { "text/html": [ "
\n", "
\n", "
\n", "

Tune Status

\n", " \n", "\n", "\n", "\n", "\n", "\n", "
Current time:2022-11-02 16:03:22
Running for: 00:00:08.49
Memory: 9.9/16.0 GiB
\n", "
\n", "
\n", "
\n", "

System Info

\n", " Using FIFO scheduling algorithm.
Resources requested: 0/16 CPUs, 0/0 GPUs, 0.0/3.44 GiB heap, 0.0/1.72 GiB objects\n", "
\n", " \n", "
\n", "
\n", "
\n", "

Trial Status

\n", " \n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Trial name status loc mean sd iter total time (s) loss
train_function_wandb_877eb_00000TERMINATED127.0.0.1:14647 10.738281 30 1.613190.555153
train_function_wandb_877eb_00001TERMINATED127.0.0.1:14660 20.321178 30 1.724472.52109
train_function_wandb_877eb_00002TERMINATED127.0.0.1:14661 30.202487 30 1.8159 2.45412
train_function_wandb_877eb_00003TERMINATED127.0.0.1:14662 40.515434 30 1.715 4.51413
train_function_wandb_877eb_00004TERMINATED127.0.0.1:14663 50.216098 30 1.728275.2814
\n", "
\n", "
\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[2m\u001b[36m(train_function_wandb pid=14647)\u001b[0m 2022-11-02 16:03:17,149\tINFO wandb.py:282 -- Already logged into W&B.\n" ] }, { "data": { "text/html": [ "
\n", "

Trial Progress

\n", " \n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Trial name date done episodes_total experiment_id experiment_tag hostname iterations_since_restore lossnode_ip pid time_since_restore time_this_iter_s time_total_s timestamp timesteps_since_restoretimesteps_total training_iterationtrial_id warmup_time
train_function_wandb_877eb_000002022-11-02_16-03-18True 7b250c9f31ab484dad1a1fd29823afdf0_mean=1,sd=0.7383Kais-MBP.local.meter 300.555153127.0.0.114647 1.61319 0.00232315 1.61319 1667430198 0 30877eb_00000 0.00391102
train_function_wandb_877eb_000012022-11-02_16-03-22True 5172868368074557a3044ea3a91466731_mean=2,sd=0.3212Kais-MBP.local.meter 302.52109 127.0.0.114660 1.72447 0.0152011 1.72447 1667430202 0 30877eb_00001 0.00901699
train_function_wandb_877eb_000022022-11-02_16-03-22True b13d9bccb1964b4b95e1a858a3ea64c72_mean=3,sd=0.2025Kais-MBP.local.meter 302.45412 127.0.0.114661 1.8159 0.00437403 1.8159 1667430202 0 30877eb_00002 0.00844812
train_function_wandb_877eb_000032022-11-02_16-03-22True 869d7ec7a3544a8387985103e626818f3_mean=4,sd=0.5154Kais-MBP.local.meter 304.51413 127.0.0.114662 1.715 0.00247812 1.715 1667430202 0 30877eb_00003 0.00282907
train_function_wandb_877eb_000042022-11-02_16-03-22True 84d3112d66f64325bc469e44b8447ef54_mean=5,sd=0.2161Kais-MBP.local.meter 305.2814 127.0.0.114663 1.72827 0.00517201 1.72827 1667430202 0 30877eb_00004 0.00272107
\n", "
\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[2m\u001b[36m(train_function_wandb pid=14660)\u001b[0m 2022-11-02 16:03:20,600\tINFO wandb.py:282 -- Already logged into W&B.\n", "\u001b[2m\u001b[36m(train_function_wandb pid=14661)\u001b[0m 2022-11-02 16:03:20,600\tINFO wandb.py:282 -- Already logged into W&B.\n", "\u001b[2m\u001b[36m(train_function_wandb pid=14663)\u001b[0m 2022-11-02 16:03:20,628\tINFO wandb.py:282 -- Already logged into W&B.\n", "\u001b[2m\u001b[36m(train_function_wandb pid=14662)\u001b[0m 2022-11-02 16:03:20,723\tINFO wandb.py:282 -- Already logged into W&B.\n", "2022-11-02 16:03:22,565\tINFO tune.py:788 -- Total run time: 8.60 seconds (8.48 seconds for the tuning loop).\n" ] }, { "data": { "text/html": [ "
\n", "
\n", "
\n", "

Tune Status

\n", " \n", "\n", "\n", "\n", "\n", "\n", "
Current time:2022-11-02 16:03:31
Running for: 00:00:09.28
Memory: 9.9/16.0 GiB
\n", "
\n", "
\n", "
\n", "

System Info

\n", " Using FIFO scheduling algorithm.
Resources requested: 0/16 CPUs, 0/0 GPUs, 0.0/3.44 GiB heap, 0.0/1.72 GiB objects\n", "
\n", " \n", "
\n", "
\n", "
\n", "

Trial Status

\n", " \n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Trial name status loc mean sd iter total time (s) loss
WandbTrainable_8ca33_00000TERMINATED127.0.0.1:14718 10.397894 1 0.0001871590.742345
WandbTrainable_8ca33_00001TERMINATED127.0.0.1:14737 20.386883 1 0.0001518732.5709
WandbTrainable_8ca33_00002TERMINATED127.0.0.1:14738 30.290693 1 0.00014019 2.99601
WandbTrainable_8ca33_00003TERMINATED127.0.0.1:14739 40.33333 1 0.00015831 3.91276
WandbTrainable_8ca33_00004TERMINATED127.0.0.1:14740 50.645479 1 0.0001509195.47779
\n", "
\n", "
\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[2m\u001b[36m(WandbTrainable pid=14718)\u001b[0m 2022-11-02 16:03:25,742\tINFO wandb.py:282 -- Already logged into W&B.\n" ] }, { "data": { "text/html": [ "
\n", "

Trial Progress

\n", " \n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Trial name date done episodes_total experiment_id hostname iterations_since_restore lossnode_ip pid time_since_restore time_this_iter_s time_total_s timestamp timesteps_since_restoretimesteps_total training_iterationtrial_id warmup_time
WandbTrainable_8ca33_000002022-11-02_16-03-27True 3adb4d0ae0d74d1c9ddd07924b5653b0Kais-MBP.local.meter 10.742345127.0.0.114718 0.000187159 0.000187159 0.000187159 1667430207 0 18ca33_00000 1.31382
WandbTrainable_8ca33_000012022-11-02_16-03-31True f1511cfd51f94b3d9cf192181ccc08a9Kais-MBP.local.meter 12.5709 127.0.0.114737 0.000151873 0.000151873 0.000151873 1667430211 0 18ca33_00001 1.31668
WandbTrainable_8ca33_000022022-11-02_16-03-31True a7528ec6adf74de0b73aa98ebedab66dKais-MBP.local.meter 12.99601 127.0.0.114738 0.00014019 0.00014019 0.00014019 1667430211 0 18ca33_00002 1.32008
WandbTrainable_8ca33_000032022-11-02_16-03-31True b7af756ca586449ba2d4c44141b53b06Kais-MBP.local.meter 13.91276 127.0.0.114739 0.00015831 0.00015831 0.00015831 1667430211 0 18ca33_00003 1.31879
WandbTrainable_8ca33_000042022-11-02_16-03-31True 196624f42bcc45c18a26778573a43a2cKais-MBP.local.meter 15.47779 127.0.0.114740 0.000150919 0.000150919 0.000150919 1667430211 0 18ca33_00004 1.31945
\n", "
\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[2m\u001b[36m(WandbTrainable pid=14739)\u001b[0m 2022-11-02 16:03:30,360\tINFO wandb.py:282 -- Already logged into W&B.\n", "\u001b[2m\u001b[36m(WandbTrainable pid=14740)\u001b[0m 2022-11-02 16:03:30,393\tINFO wandb.py:282 -- Already logged into W&B.\n", "\u001b[2m\u001b[36m(WandbTrainable pid=14737)\u001b[0m 2022-11-02 16:03:30,454\tINFO wandb.py:282 -- Already logged into W&B.\n", "\u001b[2m\u001b[36m(WandbTrainable pid=14738)\u001b[0m 2022-11-02 16:03:30,510\tINFO wandb.py:282 -- Already logged into W&B.\n", "2022-11-02 16:03:31,985\tINFO tune.py:788 -- Total run time: 9.40 seconds (9.27 seconds for the tuning loop).\n" ] }, { "data": { "text/plain": [ "{'mean': 1, 'sd': 0.3978937765393781, 'wandb': {'project': 'Wandb_example'}}" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import os\n", "\n", "mock_api = True\n", "\n", "if mock_api:\n", "    os.environ.setdefault(\"WANDB_MODE\", \"disabled\")\n", "    os.environ.setdefault(\"WANDB_API_KEY\", \"abcd\")\n", "    ray.init(\n", "        runtime_env={\"env_vars\": {\"WANDB_MODE\": \"disabled\", \"WANDB_API_KEY\": \"abcd\"}}\n", "    )\n", "\n", "tune_with_callback()\n", "tune_with_setup()\n", "tune_trainable()\n" ] }, { "attachments": {}, "cell_type": "markdown", "id": "2f6e9138", "metadata": {}, "source": [ "This completes our Tune and Wandb walk-through.\n", "In the following sections, you can find more details on the API of the Tune-Wandb integration.\n", "\n", "## Tune Wandb API Reference\n", "\n", "### WandbLoggerCallback\n", "\n", "(air-wandb-logger)=\n", "\n", "```{eval-rst}\n", ".. autoclass:: ray.air.integrations.wandb.WandbLoggerCallback\n", "   :noindex:\n", "```\n", "\n", "### setup_wandb\n", "\n", "(air-wandb-setup)=\n", "\n", "```{eval-rst}\n", ".. autofunction:: ray.air.integrations.wandb.setup_wandb\n", "   :noindex:\n", "```" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.7" }, "orphan": true }, "nbformat": 4, "nbformat_minor": 5 }