Debugging Hangs#

View stack traces in Ray Dashboard#

The Ray dashboard lets you profile Ray Driver or Worker processes, by clicking on the “CPU profiling” or “Stack Trace” actions for active Worker processes, Tasks, Actors, and Job’s driver process.

../../../_images/profile.png

Clicking “Stack Trace” will return the current stack trace sample using py-spy. By default, only the Python stack trace is shown. To show native code frames, set the URL parameter native=1 (only supported on Linux).

../../../_images/stack.png

Note

You may run into permission errors when using py-spy in the docker containers. To fix the issue:

Note

The following errors are conditional and not signals of failures for your Python programs:

  • If you see “No such file or direction”, check if your worker process has exited.

  • If you see “No stack counts found”, check if your worker process was sleeping and not active in the last 5s.

Use ray stack CLI command#

Once py-spy is installed (it is automatically installed if “Ray Dashboard” component is included when installing Ray), you can run ray stack to dump the stack traces of all Ray Worker processes on the current node.

This document discusses some common problems that people run into when using Ray as well as some known problems. If you encounter other problems, please let us know.