State CLI#
State#
This section contains commands to access the live state of Ray resources (actor, task, object, etc.).
Note
APIs are alpha. This feature requires a full installation of Ray using pip install "ray[default]"
. This feature also requires the dashboard component to be available. The dashboard component needs to be included when starting the ray cluster, which is the default behavior for ray start
and ray.init()
. For more in-depth debugging, you could check the dashboard log at <RAY_LOG_DIR>/dashboard.log
, which is usually /tmp/ray/session_latest/logs/dashboard.log
.
State CLI allows users to access the state of various resources (e.g., actor, task, object).
ray summary tasks#
Summarize the task state of the cluster.
By default, the output contains the information grouped by task function names.
The output schema is
TaskSummaries
.
- Raises:
RayStateApiException
if the CLI is failed to query the data.
ray summary tasks [OPTIONS]
Options
- --timeout <timeout>#
Timeout in seconds for the API requests. Default is 30
- --address <address>#
The address of Ray API server. If not provided, it will be configured automatically from querying the GCS server.
ray summary actors#
Summarize the actor state of the cluster.
By default, the output contains the information grouped by actor class names.
The output schema is
ray.util.state.common.ActorSummaries
.
- Raises:
RayStateApiException
if the CLI is failed to query the data.
ray summary actors [OPTIONS]
Options
- --timeout <timeout>#
Timeout in seconds for the API requests. Default is 30
- --address <address>#
The address of Ray API server. If not provided, it will be configured automatically from querying the GCS server.
ray summary objects#
Summarize the object state of the cluster.
The API is recommended when debugging memory leaks.
See Debugging with Ray Memory for more details.
(Note that this command is almost equivalent to ray memory
, but it returns
easier-to-understand output).
By default, the output contains the information grouped by
object callsite. Note that the callsite is not collected and
all data will be aggregated as “disable” callsite if the env var
RAY_record_ref_creation_sites
is not configured. To enable the
callsite collection, set the following environment variable when
starting Ray.
Example:
` RAY_record_ref_creation_sites=1 ray start --head `
` RAY_record_ref_creation_sites=1 ray_script.py `
The output schema is
ray.util.state.common.ObjectSummaries
.
- Raises:
RayStateApiException
if the CLI is failed to query the data.
ray summary objects [OPTIONS]
Options
- --timeout <timeout>#
Timeout in seconds for the API requests. Default is 30
- --address <address>#
The address of Ray API server. If not provided, it will be configured automatically from querying the GCS server.
ray list#
List all states of a given resource.
Normally, summary APIs are recommended before listing all resources.
The output schema is defined at State API Schema section.
For example, the output schema of ray list tasks
is
TaskState
.
Usage:
List all actor information from the cluster.
` ray list actors `
List 50 actors from the cluster. The sorting order cannot be controlled.
` ray list actors --limit 50 `
List 10 actors with state PENDING.
` ray list actors --limit 10 --filter "state=PENDING" `
List actors with yaml format.
` ray list actors --format yaml `
List actors with details. When –detail is specified, it might query more data sources to obtain data in details.
` ray list actors --detail `
The API queries one or more components from the cluster to obtain the data. The returned state snapshot could be stale, and it is not guaranteed to return the live data.
The API can return partial or missing output upon the following scenarios.
When the API queries more than 1 component, if some of them fail, the API will return the partial result (with a suppressible warning).
When the API returns too many entries, the API will truncate the output. Currently, truncated data cannot be selected by users.
- Args:
resource: The type of the resource to query.
- Raises:
RayStateApiException
if the CLI is failed to query the data.
- Changes:
changed in version 2.7: –filter values are case-insensitive.
ray list [OPTIONS] {actors|jobs|placement-
groups|nodes|workers|tasks|objects|runtime-envs|cluster-events}
Options
- --format <format>#
- Options:
default | json | yaml | table
- -f, --filter <filter>#
A key, predicate, and value to filter the result. E.g., –filter ‘key=value’ or –filter ‘key!=value’. You can specify multiple –filter options. In this case all predicates are concatenated as AND. For example, –filter key=value –filter key2=value means (key==val) AND (key2==val2), String filter values are case-insensitive.
- --limit <limit>#
Maximum number of entries to return. 100 by default.
- --detail#
If the flag is set, the output will contain data in more details. Note that the API could query more sources to obtain information in a greater detail.
- --timeout <timeout>#
Timeout in seconds for the API requests. Default is 30
- --address <address>#
The address of Ray API server. If not provided, it will be configured automatically from querying the GCS server.
Arguments
- RESOURCE#
Required argument
ray get#
Get a state of a given resource by ID.
We currently DO NOT support get by id for jobs and runtime-envs
The output schema is defined at State API Schema section.
For example, the output schema of ray get tasks <task-id>
is
TaskState
.
Usage:
Get an actor with actor id <actor-id>
` ray get actors <actor-id> `
Get a placement group information with <placement-group-id>
` ray get placement-groups <placement-group-id> `
The API queries one or more components from the cluster to obtain the data. The returned state snapshot could be stale, and it is not guaranteed to return the live data.
- Args:
resource: The type of the resource to query. id: The id of the resource.
- Raises:
RayStateApiException
if the CLI is failed to query the data.
ray get [OPTIONS] {actors|placement-
groups|nodes|workers|tasks|objects|cluster-events} [ID]
Options
- --address <address>#
The address of Ray API server. If not provided, it will be configured automatically from querying the GCS server.
- --timeout <timeout>#
Timeout in seconds for the API requests. Default is 30
Arguments
- RESOURCE#
Required argument
- ID#
Optional argument
Log#
This section contains commands to access logs from Ray clusters.
Note
APIs are alpha. This feature requires a full installation of Ray using pip install "ray[default]"
.
Log CLI allows users to access the log from the cluster. Note that only the logs from alive nodes are available through this API.
ray logs#
Get logs based on filename (cluster) or resource identifiers (actor)
Example:
Get all the log files available on a node (ray address could be obtained from
ray start --head
orray.init()
).
` ray logs cluster `
[ray logs cluster] Print the last 500 lines of raylet.out on a head node.
` ray logs cluster raylet.out --tail 500 `
Or simply, using
ray logs
as an alias forray logs cluster
:
` ray logs raylet.out --tail 500 `
Print the last 500 lines of raylet.out on a worker node id A.
` ray logs raylet.out --tail 500 —-node-id A `
[ray logs actor] Follow the log file with an actor id ABC.
` ray logs actor --id ABC --follow `
[ray logs task] Get the std err generated by a task.
Note: If a task is from a concurrent actor (i.e. an async actor or a threaded actor), the log of the tasks are expected to be interleaved. Please use
ray logs actor --id <actor_id>
for the entire actor log.
` ray logs task --id <TASK_ID> --err `
ray logs [OPTIONS] COMMAND [ARGS]...
Commands
- actor
Get/List logs associated with an actor.
- cluster
Get/List logs that matches the GLOB_FILTER…
- job
Get logs associated with a submission job.
- task
Get logs associated with a task.
- worker
Get logs associated with a worker process.