Profiling for Ray Developers
This guide helps contributors to the Ray project analyze Ray performance.
Getting a stack trace of Ray C++ processes
You can use the following GDB command to view the current stack trace of any running Ray process (e.g., raylet). This can be useful for debugging 100% CPU utilization or infinite loops (simply run the command a few times to see what the process is stuck on).
sudo gdb -batch -ex "thread apply all bt" -p <pid>
Note that you can find the pid of the raylet with pgrep raylet.
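For convenience, you can combine the two commands. A minimal sketch, assuming exactly one raylet process is running on the machine:
# Attach to the raylet by name instead of looking up the pid manually.
sudo gdb -batch -ex "thread apply all bt" -p "$(pgrep raylet)"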
Installation
These instructions are for Ubuntu only. Attempts to get pprof to correctly symbolize on Mac OS have failed.
sudo apt-get install google-perftools libgoogle-perftools-dev
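To sanity-check the installation, you can verify that the profiler library and the bundled pprof tool are where the later steps expect them. Paths may differ on your distribution:
# Adjust the library path as needed for your system.
ls -l /usr/lib/x86_64-linux-gnu/libprofiler.so
google-pprof --help | head -n 5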
Launching the binary to profile
If you want to launch Ray in profiling mode, define the following variables:
export PERFTOOLS_PATH=/usr/lib/x86_64-linux-gnu/libprofiler.so
export PERFTOOLS_LOGFILE=/tmp/pprof.out
The file /tmp/pprof.out will be empty until you let the binary run the target workload for a while and then kill it via ray stop or by letting the driver exit.
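Putting this together, a typical session might look like the following sketch, assuming the variables above are exported in the same shell that starts Ray:
export PERFTOOLS_PATH=/usr/lib/x86_64-linux-gnu/libprofiler.so
export PERFTOOLS_LOGFILE=/tmp/pprof.out
ray start --head
# ... run the target workload against this cluster for a while ...
ray stop  # stopping Ray flushes the profile to /tmp/pprof.out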
Memory Profiling
If you want to run memory profiling on Ray core components, you can use jemalloc (https://github.com/jemalloc/jemalloc). Ray reads environment variables that tell it to LD_PRELOAD jemalloc onto specific core components.
You can find the component name in ray_constants.py. For example, if you'd like to profile gcs_server, search for PROCESS_TYPE_GCS_SERVER in ray_constants.py; its value is gcs_server.
Provide the following three environment variables for memory profiling:
RAY_JEMALLOC_LIB_PATH: The path to the jemalloc shared library (.so file).
RAY_JEMALLOC_CONF: The MALLOC_CONF configuration of jemalloc (comma-separated).
RAY_JEMALLOC_PROFILE: Comma-separated Ray components to profile with jemalloc, e.g., "raylet,gcs_server". Note that the components must match the process types in ray_constants.py, so "RAYLET,GCS_SERVER" won't work.
# Install jemalloc
wget https://github.com/jemalloc/jemalloc/releases/download/5.2.1/jemalloc-5.2.1.tar.bz2
tar -xf jemalloc-5.2.1.tar.bz2
cd jemalloc-5.2.1
./configure --enable-prof --enable-prof-libunwind
make
# set jemalloc configs through MALLOC_CONF env variable
# read http://jemalloc.net/jemalloc.3.html#opt.lg_prof_interval
# for all jemalloc configs
# Ray start will profile the GCS server component.
RAY_JEMALLOC_CONF=prof:true,lg_prof_interval:33,lg_prof_sample:17,prof_final:true,prof_leak:true \
RAY_JEMALLOC_LIB_PATH=~/jemalloc-5.2.1/lib/libjemalloc.so \
RAY_JEMALLOC_PROFILE=gcs_server \
ray start --head
# You should be able to see the following logs.
2021-10-20 19:45:08,175 INFO services.py:622 -- Jemalloc profiling will be used for gcs_server. env vars: {'LD_PRELOAD': '/Users/sangbincho/jemalloc-5.2.1/lib/libjemalloc.so', 'MALLOC_CONF': 'prof:true,lg_prof_interval:33,lg_prof_sample:17,prof_final:true,prof_leak:true'}
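With prof_final:true, jemalloc dumps a final heap profile (named like jeprof.<pid>.0.f.heap) into the process's working directory when the process exits. You can inspect the dump with the jeprof tool built alongside jemalloc. A sketch, where the gcs_server binary path and heap file name are placeholders for your actual build and dump:
# Hypothetical paths; substitute your actual binary and heap dump file.
~/jemalloc-5.2.1/bin/jeprof --text \
    ray/python/ray/core/src/ray/gcs/gcs_server \
    jeprof.12345.0.f.heap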
Visualizing the CPU profile
The output of pprof can be visualized in many ways. Here we output it as a zoomable .svg image displaying the call graph annotated with hot paths.
# Use the appropriate path.
RAYLET=ray/python/ray/core/src/ray/raylet/raylet
google-pprof -svg $RAYLET /tmp/pprof.out > /tmp/pprof.svg
# Then open the .svg file with Chrome.
# If you realize the call graph is too large, use -focus=<some function> to zoom
# into subtrees.
google-pprof -focus=epoll_wait -svg $RAYLET /tmp/pprof.out > /tmp/pprof.svg
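If you want a quick textual summary instead of an SVG, pprof also supports plain-text output. A small sketch, reusing the $RAYLET path from above:
# Print the top functions by CPU samples in plain text.
google-pprof -text $RAYLET /tmp/pprof.out | head -n 20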
Here's a snapshot of an example svg output, taken from the official documentation:
[Image: example pprof call-graph SVG]
Running Microbenchmarks
To run a set of single-node Ray microbenchmarks, use:
ray microbenchmark
You can find the microbenchmark results for Ray releases in the GitHub release logs.
References
The pprof documentation: https://github.com/google/pprof
The gperftools repository (https://github.com/gperftools/gperftools), including libprofiler, tcmalloc, and other goodies.