Ray Dashboard#

Ray provides a web-based dashboard for monitoring and debugging Ray applications. The dashboard offers a visual representation of the system state, allowing users to track application performance and troubleshoot issues.


Getting Started#

To use the dashboard, first install Ray with the proper dependencies:

pip install -U "ray[default]"

You can access the dashboard through a URL printed when Ray is initialized (the default URL is http://localhost:8265).

If you prefer to explicitly set the port on which the dashboard will run, you can pass the --dashboard-port argument with ray start in the command line, or you can pass the keyword argument dashboard_port in your call to ray.init().
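For example, to pin the dashboard to a specific port when starting the head node (the port value here is illustrative; 8265 is the default):

```shell
# Start a Ray head node with the dashboard on an explicit port.
ray start --head --dashboard-port=8265
```

Equivalently, from Python you can pass the same value as ray.init(dashboard_port=8265).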

INFO worker.py:1487 -- Connected to Ray cluster. View the dashboard at

The dashboard is also available when using the cluster launcher.


There are six different views that the Dashboard provides: Node view, Jobs view, Actors view, Logs view, Metrics view, and Event view.


Node View#

The Node view lets you see resource utilization information on a per-node and per-worker basis. This also shows the assignment of GPU resources to specific actors or tasks.

In addition, the Node view lets you see logs for a node or a worker.

Finally, you can see the task that each worker is currently performing.


Jobs View#

The Jobs view lets you monitor the different jobs that ran on your Ray cluster. A job is a Ray workload that was initiated by an entry point. Typically, jobs are initiated by directly calling ray.init or a Ray library from a Python script, or by using the job submission API.
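As a sketch of the job submission API path, an entry point can be submitted from the command line (the script name below is a placeholder, and http://localhost:8265 is the default dashboard address):

```shell
# Submit a script as a Ray job through the dashboard's
# job submission endpoint.
ray job submit --address http://localhost:8265 -- python my_script.py
```

Jobs submitted this way appear in the Jobs view with both a Job ID and a Submission ID.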


Actors View#

The Actors view lets you see information about the actors that have existed on the Ray cluster.

You can view the logs for an actor, and you can see which job created the actor. Information for up to 1000 dead actors is stored. This value can be overridden with the RAY_DASHBOARD_MAX_ACTORS_TO_CACHE environment variable when starting Ray.
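For example, to raise the cache limit when starting the cluster (the value shown is illustrative):

```shell
# Keep metadata for up to 5000 dead actors instead of the default 1000.
RAY_DASHBOARD_MAX_ACTORS_TO_CACHE=5000 ray start --head
```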


Logs View#

The Logs view lets you view all the Ray logs in your cluster, organized by node and log file name. Many log links on other pages link to this view and filter the list so that the relevant logs appear.


The log viewer provides search functionality to help you find the log messages you are looking for.


Metrics View#

The Metrics view lets you view visualizations of the time-series metrics emitted by Ray. It requires that Prometheus and Grafana are running for your cluster. Instructions about that can be found here.

You can select the time range of the metrics in the top right corner. The graphs refresh automatically every 15 seconds.

There is also a convenient button to open the Grafana UI from the dashboard. The Grafana UI provides additional customizability of the charts.


Event View#


The event view feature is experimental.

The Event view lets you see a list of events associated with a specific type (e.g., autoscaler or job) in chronological order. The equivalent information is also accessible via the CLI command ray list cluster-events (Ray state APIs).

There are two types of events available.


Advanced Usage#

Viewing built-in dashboard API metrics#

The dashboard is powered by a server that serves both the UI code and the data about the cluster via API endpoints. There are basic Prometheus metrics emitted for each of these API endpoints:

ray_dashboard_api_requests_count_requests_total: Collects the total count of requests. This is tagged by endpoint, method, and http_status.

ray_dashboard_api_requests_duration_seconds_bucket: Collects the duration of requests. This is tagged by endpoint and method.

For example, you can view the p95 duration of all requests with this query:

histogram_quantile(0.95, sum(rate(ray_dashboard_api_requests_duration_seconds_bucket[5m])) by (le))

These metrics can be queried via the Prometheus or Grafana UI. Instructions on how to set these tools up can be found here.

Debugging ObjectStoreFullError and Memory Leaks#

You can view object store usage information in the Node view. Use it to debug memory leaks, especially ObjectStoreFullError.

One common cause of these memory errors is objects that never go out of scope. To find them, use the ray memory command.
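For instance, run it against a live cluster to see which call sites hold references that keep objects pinned in the object store:

```shell
# Print a cluster-wide table of object references, including the
# call site that created each reference (requires a running cluster).
ray memory
```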

For details about the information contained in the table, please see the ray memory command documentation.

Running Behind a Reverse Proxy#

The dashboard should work out-of-the-box when accessed via a reverse proxy. API requests don’t need to be proxied individually.

Always access the dashboard with a trailing / at the end of the URL. For example, if your proxy is set up to handle requests to /ray/dashboard, view the dashboard at www.my-website.com/ray/dashboard/.

The dashboard now sends HTTP requests with relative URL paths. Browsers will handle these requests as expected when the window.location.href ends in a trailing /.

This is a peculiarity of how many browsers handle requests with relative URLs, despite what MDN defines as the expected behavior.

Make your dashboard visible without a trailing / by including a rule in your reverse proxy that redirects the user’s browser to /, i.e. /ray/dashboard -> /ray/dashboard/.

Below is an example with a traefik TOML file that accomplishes this:

[http]
  [http.routers]
    [http.routers.to-dashboard]
      rule = "PathPrefix(`/ray/dashboard`)"
      middlewares = ["test-redirectregex", "strip"]
      service = "dashboard"
  [http.middlewares]
    [http.middlewares.test-redirectregex.redirectRegex]
      regex = "^(.*)/ray/dashboard$"
      replacement = "${1}/ray/dashboard/"
    [http.middlewares.strip.stripPrefix]
      prefixes = ["/ray/dashboard"]
  [http.services]
    [http.services.dashboard.loadBalancer]
      [[http.services.dashboard.loadBalancer.servers]]
        url = "http://localhost:8265"


Node View#

Node/Worker Hierarchy: The dashboard visualizes the hierarchical relationship of machines (nodes) and workers (processes). Each host consists of many workers, and you can see them by clicking the + button. The first node is always expanded by default.


You can hide it again by clicking the - button.


You can also click the node ID to open a node detail page where you can see more information.

Node View Reference#




State

Whether the node or worker is alive or dead.

ID

The ID of the node or the workerId for the worker.

Host / Cmd line

If it is a node, it shows host information. If it is a worker, it shows the name of the task that is being run.

IP / PID

If it is a node, it shows the IP address of the node. If it is a worker, it shows the PID of the worker process.

CPU Usage

CPU usage of each node and worker.

RAM

RAM usage of each node and worker.

GPU

GPU usage of the node.

GRAM

GPU memory usage of the node.

Object Store Memory

Amount of memory used by the object store for this node.

Disk

Disk usage of the node.

Sent

Network bytes sent for each node and worker.

Received

Network bytes received for each node and worker.

Logs

Log messages at each node and worker. You can see log files relevant to a node or worker by clicking this link.

Stack Trace

Get the Python stack trace for the specified worker. Refer to Python CPU Profiling in the Dashboard for more information.

CPU Flame Graph

Get a CPU flame graph for the specified worker. Refer to Python CPU Profiling in the Dashboard for more information.

Jobs View#

Jobs View Reference#



Job ID

The ID of the job. This is the primary id that associates tasks and actors to this job.

Submission ID

An alternate ID that can be provided by a user or generated for all ray job submissions. It’s useful if you would like to associate your job with an ID that is provided by some external system.


Status

Describes the state of a job. One of:
  • PENDING: The job has not started yet, likely waiting for the runtime_env to be set up.
  • RUNNING: The job is currently running.
  • STOPPED: The job was intentionally stopped by the user.
  • SUCCEEDED: The job finished successfully.
  • FAILED: The job failed.

Logs

A link to the logs for this job.

StartTime

The time the job was started.

EndTime

The time the job finished.

DriverPid

The PID of the driver process that started the job.


Actor View Reference#



Actor ID

The ID of the actor.

Restart Times

Number of times this actor has been restarted.


Name

The name of an actor. This can be user defined.

Class

The class of the actor.

Function

The current function the actor is running.

Job ID

The job in which this actor was created.

Pid

ID of the worker process on which the actor is running.

IP

Node IP address where the actor is located.

Port

The port for the actor.

State

Either one of “ALIVE” or “DEAD”.

Logs

A link to the logs that are relevant to this actor.

Stack Trace

Get the Python stack trace for the specified actor. Refer to Python CPU Profiling in the Dashboard for more information.

CPU Flame Graph

Get a CPU flame graph for the specified actor. Refer to Python CPU Profiling in the Dashboard for more information.


The top-level page for the Logs view shows the list of nodes in the cluster. After clicking into a node, you can see a list of all its log files.

Details of the different log files can be found here: Logging.