Ray State API

Note

APIs are alpha. This feature requires a full installation of Ray using pip install "ray[default]".

State CLI

State CLI allows users to access the state of various resources (e.g., actor, task, object).

ray summary tasks

Summarize the task state of the cluster.

By default, the output contains the information grouped by task function names.

The output schema is ray.experimental.state.common.TaskSummaries.

Raises:
RayStateApiException

if the CLI fails to query the data.

PublicAPI (alpha): This API is in alpha and may change before becoming stable.

ray summary tasks [OPTIONS]

Options

--timeout <timeout>

Timeout in seconds for the API requests. Default is 30.

--address <address>

The address of the Ray API server. If not provided, it is configured automatically by querying the GCS server.

ray summary actors

Summarize the actor state of the cluster.

By default, the output contains the information grouped by actor class names.

The output schema is ray.experimental.state.common.ActorSummaries.

Raises:
RayStateApiException

if the CLI fails to query the data.

PublicAPI (alpha): This API is in alpha and may change before becoming stable.

ray summary actors [OPTIONS]

Options

--timeout <timeout>

Timeout in seconds for the API requests. Default is 30.

--address <address>

The address of the Ray API server. If not provided, it is configured automatically by querying the GCS server.

ray summary objects

Summarize the object state of the cluster.

The API is recommended when debugging memory leaks. See Debugging with Ray Memory for more details. (Note that this command is almost equivalent to ray memory, but it returns easier-to-understand output).

By default, the output contains the information grouped by object callsite. Note that callsites are not collected, and all data is aggregated under the “disable” callsite, unless the environment variable RAY_record_ref_creation_sites is set. To enable callsite collection, set the following environment variable when starting Ray.

Example:

` RAY_record_ref_creation_sites=1 ray start --head `

` RAY_record_ref_creation_sites=1 python ray_script.py `
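When the driver script is launched from Python rather than from a shell, the same variable can be passed through the child process environment. The following is a minimal sketch; the helper names `env_with_callsites` and `run_script` and the script path are hypothetical and not part of Ray.

```python
import os
import subprocess
import sys
from typing import Dict, Mapping, Optional


def env_with_callsites(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with callsite recording enabled."""
    env = dict(os.environ if base is None else base)
    # Same variable as in the shell examples above.
    env["RAY_record_ref_creation_sites"] = "1"
    return env


def run_script(path: str) -> int:
    """Run a driver script (hypothetical path) with callsite recording enabled."""
    completed = subprocess.run([sys.executable, path], env=env_with_callsites())
    return completed.returncode
```

Copying the environment instead of mutating `os.environ` keeps the setting scoped to the child process.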

The output schema is ray.experimental.state.common.ObjectSummaries.

Raises:
RayStateApiException

if the CLI fails to query the data.

PublicAPI (alpha): This API is in alpha and may change before becoming stable.

ray summary objects [OPTIONS]

Options

--timeout <timeout>

Timeout in seconds for the API requests. Default is 30.

--address <address>

The address of the Ray API server. If not provided, it is configured automatically by querying the GCS server.

ray list

List all states of a given resource.

Normally, summary APIs are recommended before listing all resources.

The output schema is defined in the State APIs Schema section.

For example, the output schema of ray list tasks is ray.experimental.state.common.TaskState.

Usage:

List all actor information from the cluster.

` ray list actors `

List 50 actors from the cluster. The sorting order cannot be controlled.

` ray list actors --limit 50 `

List 10 actors with state PENDING.

` ray list actors --limit 10 --filter "state=PENDING" `

List actors with yaml format.

` ray list actors --format yaml `

List actors with details. When --detail is specified, the API might query more data sources to obtain the detailed data.

` ray list actors --detail `

The API queries one or more components from the cluster to obtain the data. The returned state snapshot could be stale, and it is not guaranteed to return the live data.

The API can return partial or missing output in the following scenarios.

  • When the API queries more than 1 component, if some of them fail, the API will return the partial result (with a suppressible warning).

  • When the API returns too many entries, the API will truncate the output. Currently, truncated data cannot be selected by users.

Args:

resource: The type of the resource to query.

Raises:
RayStateApiException

if the CLI fails to query the data.

PublicAPI (alpha): This API is in alpha and may change before becoming stable.

ray list [OPTIONS] [actors|jobs|placement-groups|nodes|workers|tasks|objects|runtime-envs]

Options

--format <format>

Options: default | json | yaml | table

-f, --filter <filter>

A key, predicate, and value to filter the result. E.g., --filter 'key=value' or --filter 'key!=value'. You can specify multiple --filter options. In this case, all predicates are combined with AND. For example, --filter key=val --filter key2=val2 means (key==val) AND (key2==val2).

--limit <limit>

Maximum number of entries to return. 100 by default.

--detail

If the flag is set, the output contains more detailed data. Note that the API could query more sources to obtain the additional detail.

--timeout <timeout>

Timeout in seconds for the API requests. Default is 30.

--address <address>

The address of the Ray API server. If not provided, it is configured automatically by querying the GCS server.

Arguments

RESOURCE

Required argument
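The AND semantics of multiple --filter options described above can be illustrated on plain dictionaries. This is only a client-side sketch of the behavior, not Ray's implementation; the `matches` helper and the sample entries are hypothetical, and comparing values as strings is an assumption.

```python
from typing import Any, Dict, List, Tuple

# (key, predicate, value); the predicate is "=" or "!=".
Filter = Tuple[str, str, Any]


def matches(entry: Dict[str, Any], filters: List[Filter]) -> bool:
    """Return True only if the entry satisfies every filter (logical AND)."""
    for key, predicate, value in filters:
        # Compare as strings, since CLI values arrive as text (an assumption).
        equal = str(entry.get(key)) == str(value)
        if predicate == "=":
            ok = equal
        elif predicate == "!=":
            ok = not equal
        else:
            raise ValueError(f"unsupported predicate: {predicate}")
        if not ok:
            return False
    return True


# Hypothetical entries shaped like `ray list actors` output.
actors = [
    {"actor_id": "a1", "state": "ALIVE", "class_name": "Worker"},
    {"actor_id": "a2", "state": "PENDING_CREATION", "class_name": "Worker"},
    {"actor_id": "a3", "state": "ALIVE", "class_name": "Trainer"},
]
filters = [("state", "=", "ALIVE"), ("class_name", "!=", "Trainer")]
selected = [a for a in actors if matches(a, filters)]
print([a["actor_id"] for a in selected])  # prints ['a1']
```

An empty filter list matches every entry, which mirrors running the command without --filter.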

ray get

Get a state of a given resource by ID.

Getting by id is currently NOT supported for jobs and runtime-envs.

The output schema is defined in the State APIs Schema section.

For example, the output schema of ray get tasks is ray.experimental.state.common.TaskState.

Usage:

Get an actor with actor id <actor-id>

` ray get actors <actor-id> `

Get placement group information with <placement-group-id>

` ray get placement-groups <placement-group-id> `

The API queries one or more components from the cluster to obtain the data. The returned state snapshot could be stale, and it is not guaranteed to return the live data.

Args:

resource: The type of the resource to query.

id: The id of the resource.

Raises:
RayStateApiException

if the CLI fails to query the data.

PublicAPI (alpha): This API is in alpha and may change before becoming stable.

ray get [OPTIONS] [actors|placement-groups|nodes|workers|tasks|objects] ID

Options

--address <address>

The address of the Ray API server. If not provided, it is configured automatically by querying the GCS server.

--timeout <timeout>

Timeout in seconds for the API requests. Default is 30.

Arguments

RESOURCE

Required argument

ID

Required argument

Log CLI

Log CLI allows users to access the log from the cluster. Note that only the logs from alive nodes are available through this API.

ray logs

Print the log file that matches the GLOB_FILTER.

By default, it prints a list of log files that match the filter. If there’s only 1 match, it will print the log file. By default, it prints the head node logs.

Usage:

Print the last 500 lines of raylet.out on a head node.

` ray logs raylet.out --tail 500 `

Print the last 500 lines of raylet.out on the worker node with id A.

` ray logs raylet.out --tail 500 --node-id A `

Follow the log file with an actor id ABC.

` ray logs --actor-id ABC --follow `

Get the actor log from pid 123, ip ABC. Note that this goes well with the driver log of Ray which prints (ip=ABC, pid=123, class_name) logs.

` ray logs --node-ip=ABC --pid=123 `

Download gcs_server.out to the local machine as gcs_server.txt.

` ray logs gcs_server.out --tail -1 > gcs_server.txt `

PublicAPI (alpha): This API is in alpha and may change before becoming stable.

ray logs [OPTIONS] [GLOB_FILTER]

Options

-ip, --node-ip <node_ip>

Filters the logs by this ip address.

-id, --node-id <node_id>

Filters the logs by this NodeID.

-pid, --pid <pid>

Retrieves the logs from the process with this pid.

-a, --actor-id <actor_id>

Retrieves the logs corresponding to this ActorID.

-t, --task-id <task_id>

Retrieves the logs corresponding to this TaskID.

-f, --follow

Streams the log file as it is updated instead of just tailing.

--tail <tail>

Number of lines to tail from log. -1 indicates fetching the whole file.

--timeout <timeout>

Timeout in seconds for the API requests. Default is 30. If --follow is specified, this option is ignored.

--address <address>

The address of the Ray API server. If not provided, it is configured automatically by querying the GCS server.

Arguments

GLOB_FILTER

Optional argument

State Python SDK

State APIs are also exported as functions.

Summary APIs

ray.experimental.state.api.summarize_actors(address: Optional[str] = None, timeout: int = 30, raise_on_missing_output: bool = True, _explain: bool = False) -> Dict

Summarize the actors in cluster.

Parameters
  • address – Ray bootstrap address, could be auto, localhost:6379. If None, it will be resolved automatically from an initialized ray.

  • timeout – Max timeout for requests made when getting the states.

  • raise_on_missing_output – When True, exceptions will be raised if there is missing data due to truncation/data source unavailable.

  • _explain – Print the API information such as API latency or failed query information.

Returns

Dictionarified ActorSummaries

Raises

RayStateApiException – if the API fails to query the data.

ray.experimental.state.api.summarize_objects(address: Optional[str] = None, timeout: int = 30, raise_on_missing_output: bool = True, _explain: bool = False) -> Dict

Summarize the objects in cluster.

Parameters
  • address – Ray bootstrap address, could be auto, localhost:6379. If None, it will be resolved automatically from an initialized ray.

  • timeout – Max timeout for requests made when getting the states.

  • raise_on_missing_output – When True, exceptions will be raised if there is missing data due to truncation/data source unavailable.

  • _explain – Print the API information such as API latency or failed query information.

Returns

Dictionarified ObjectSummaries

Raises

RayStateApiException – if the API fails to query the data.

ray.experimental.state.api.summarize_tasks(address: Optional[str] = None, timeout: int = 30, raise_on_missing_output: bool = True, _explain: bool = False) -> Dict

Summarize the tasks in cluster.

Parameters
  • address – Ray bootstrap address, could be auto, localhost:6379. If None, it will be resolved automatically from an initialized ray.

  • timeout – Max timeout for requests made when getting the states.

  • raise_on_missing_output – When True, exceptions will be raised if there is missing data due to truncation/data source unavailable.

  • _explain – Print the API information such as API latency or failed query information.

Returns

Dictionarified TaskSummaries

Raises

RayStateApiException – if the API fails to query the data.
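The three summarize functions share the same parameters, so one wrapper can serve all of them. The sketch below tolerates partial data by passing raise_on_missing_output=False and returns None when the query fails. The `summarize_or_none` helper is hypothetical, and it catches Exception broadly only because the import path of RayStateApiException is not given in this section.

```python
from typing import Any, Callable, Dict, Optional


def summarize_or_none(
    summarize: Callable[..., Dict[str, Any]], timeout: int = 30
) -> Optional[Dict[str, Any]]:
    """Call a summarize_* function; return None instead of raising."""
    try:
        # raise_on_missing_output=False accepts truncated or partial data.
        return summarize(timeout=timeout, raise_on_missing_output=False)
    except Exception as exc:  # RayStateApiException in practice
        print(f"summary unavailable: {exc}")
        return None


def demo() -> None:
    """Not called here: needs ray[default] and a running cluster."""
    import ray
    from ray.experimental.state.api import summarize_tasks

    ray.init(address="auto")
    print(summarize_or_none(summarize_tasks, timeout=10))
```

Passing the function as an argument keeps the wrapper usable with summarize_actors and summarize_objects as well.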

List APIs

ray.experimental.state.api.list_actors(address: Optional[str] = None, filters: Optional[List[Tuple[str, str, Union[str, bool, int, float]]]] = None, limit: int = 100, timeout: int = 30, detail: bool = False, raise_on_missing_output: bool = True, _explain: bool = False) -> List[Dict]

List actors in the cluster.

Parameters
  • address – Ray bootstrap address, could be auto, localhost:6379. If None, it will be resolved automatically from an initialized ray.

  • filters – List of tuples of filter key, predicate (=, or !=), and the filter value. E.g., ("id", "=", "abcd")

  • limit – Max number of entries returned by the state backend.

  • timeout – Max timeout value for the state APIs requests made.

  • detail – When True, more detailed info (specified in ActorState) will be queried and returned. See ActorState.

  • raise_on_missing_output – When True, exceptions will be raised if there is missing data due to truncation/data source unavailable.

  • _explain – Print the API information such as API latency or failed query information.

Returns

List of dictionarified ActorState.

Raises

RayStateApiException – if the API fails to query the data.

ray.experimental.state.api.list_placement_groups(address: Optional[str] = None, filters: Optional[List[Tuple[str, str, Union[str, bool, int, float]]]] = None, limit: int = 100, timeout: int = 30, detail: bool = False, raise_on_missing_output: bool = True, _explain: bool = False) -> List[Dict]

List placement groups in the cluster.

Parameters
  • address – Ray bootstrap address, could be auto, localhost:6379. If None, it will be resolved automatically from an initialized ray.

  • filters – List of tuples of filter key, predicate (=, or !=), and the filter value. E.g., ("state", "=", "abcd")

  • limit – Max number of entries returned by the state backend.

  • timeout – Max timeout value for the state APIs requests made.

  • detail – When True, more detailed info (specified in PlacementGroupState) will be queried and returned. See PlacementGroupState.

  • raise_on_missing_output – When True, exceptions will be raised if there is missing data due to truncation/data source unavailable.

  • _explain – Print the API information such as API latency or failed query information.

Returns

List of dictionarified PlacementGroupState.

Raises

RayStateApiException – if the API fails to query the data.

ray.experimental.state.api.list_nodes(address: Optional[str] = None, filters: Optional[List[Tuple[str, str, Union[str, bool, int, float]]]] = None, limit: int = 100, timeout: int = 30, detail: bool = False, raise_on_missing_output: bool = True, _explain: bool = False) -> List[Dict]

List nodes in the cluster.

Parameters
  • address – Ray bootstrap address, could be auto, localhost:6379. If None, it will be resolved automatically from an initialized ray.

  • filters – List of tuples of filter key, predicate (=, or !=), and the filter value. E.g., ("node_name", "=", "abcd")

  • limit – Max number of entries returned by the state backend.

  • timeout – Max timeout value for the state APIs requests made.

  • detail – When True, more detailed info (specified in NodeState) will be queried and returned. See NodeState.

  • raise_on_missing_output – When True, exceptions will be raised if there is missing data due to truncation/data source unavailable.

  • _explain – Print the API information such as API latency or failed query information.

Returns

List of dictionarified NodeState.

Raises

RayStateApiException – if the API fails to query the data.

ray.experimental.state.api.list_jobs(address: Optional[str] = None, filters: Optional[List[Tuple[str, str, Union[str, bool, int, float]]]] = None, limit: int = 100, timeout: int = 30, detail: bool = False, raise_on_missing_output: bool = True, _explain: bool = False) -> List[Dict]

List jobs submitted to the cluster by Ray job submission.

Parameters
  • address – Ray bootstrap address, could be auto, localhost:6379. If None, it will be resolved automatically from an initialized ray.

  • filters – List of tuples of filter key, predicate (=, or !=), and the filter value. E.g., ("status", "=", "abcd")

  • limit – Max number of entries returned by the state backend.

  • timeout – Max timeout value for the state APIs requests made.

  • detail – When True, more detailed info (specified in JobState) will be queried and returned. See JobState.

  • raise_on_missing_output – When True, exceptions will be raised if there is missing data due to truncation/data source unavailable.

  • _explain – Print the API information such as API latency or failed query information.

Returns

List of dictionarified JobState.

Raises

RayStateApiException – if the API fails to query the data.

ray.experimental.state.api.list_workers(address: Optional[str] = None, filters: Optional[List[Tuple[str, str, Union[str, bool, int, float]]]] = None, limit: int = 100, timeout: int = 30, detail: bool = False, raise_on_missing_output: bool = True, _explain: bool = False) -> List[Dict]

List workers in the cluster.

Parameters
  • address – Ray bootstrap address, could be auto, localhost:6379. If None, it will be resolved automatically from an initialized ray.

  • filters – List of tuples of filter key, predicate (=, or !=), and the filter value. E.g., ("is_alive", "=", "True")

  • limit – Max number of entries returned by the state backend.

  • timeout – Max timeout value for the state APIs requests made.

  • detail – When True, more detailed info (specified in WorkerState) will be queried and returned. See WorkerState.

  • raise_on_missing_output – When True, exceptions will be raised if there is missing data due to truncation/data source unavailable.

  • _explain – Print the API information such as API latency or failed query information.

Returns

List of dictionarified WorkerState.

Raises

RayStateApiException – if the API fails to query the data.

ray.experimental.state.api.list_tasks(address: Optional[str] = None, filters: Optional[List[Tuple[str, str, Union[str, bool, int, float]]]] = None, limit: int = 100, timeout: int = 30, detail: bool = False, raise_on_missing_output: bool = True, _explain: bool = False) -> List[Dict]

List tasks in the cluster.

Parameters
  • address – Ray bootstrap address, could be auto, localhost:6379. If None, it will be resolved automatically from an initialized ray.

  • filters – List of tuples of filter key, predicate (=, or !=), and the filter value. E.g., ("scheduling_state", "=", "RUNNING")

  • limit – Max number of entries returned by the state backend.

  • timeout – Max timeout value for the state APIs requests made.

  • detail – When True, more detailed info (specified in TaskState) will be queried and returned. See TaskState.

  • raise_on_missing_output – When True, exceptions will be raised if there is missing data due to truncation/data source unavailable.

  • _explain – Print the API information such as API latency or failed query information.

Returns

List of dictionarified TaskState.

Raises

RayStateApiException – if the API fails to query the data.

ray.experimental.state.api.list_objects(address: Optional[str] = None, filters: Optional[List[Tuple[str, str, Union[str, bool, int, float]]]] = None, limit: int = 100, timeout: int = 30, detail: bool = False, raise_on_missing_output: bool = True, _explain: bool = False) -> List[Dict]

List objects in the cluster.

Parameters
  • address – Ray bootstrap address, could be auto, localhost:6379. If None, it will be resolved automatically from an initialized ray.

  • filters – List of tuples of filter key, predicate (=, or !=), and the filter value. E.g., ("ip", "=", "0.0.0.0")

  • limit – Max number of entries returned by the state backend.

  • timeout – Max timeout value for the state APIs requests made.

  • detail – When True, more detailed info (specified in ObjectState) will be queried and returned. See ObjectState.

  • raise_on_missing_output – When True, exceptions will be raised if there is missing data due to truncation/data source unavailable.

  • _explain – Print the API information such as API latency or failed query information.

Returns

List of dictionarified ObjectState.

Raises

RayStateApiException – if the API fails to query the data.

ray.experimental.state.api.list_runtime_envs(address: Optional[str] = None, filters: Optional[List[Tuple[str, str, Union[str, bool, int, float]]]] = None, limit: int = 100, timeout: int = 30, detail: bool = False, raise_on_missing_output: bool = True, _explain: bool = False) -> List[Dict]

List runtime environments in the cluster.

Parameters
  • address – Ray bootstrap address, could be auto, localhost:6379. If None, it will be resolved automatically from an initialized ray.

  • filters – List of tuples of filter key, predicate (=, or !=), and the filter value. E.g., ("node_id", "=", "abcdef")

  • limit – Max number of entries returned by the state backend.

  • timeout – Max timeout value for the state APIs requests made.

  • detail – When True, more detailed info (specified in RuntimeEnvState) will be queried and returned. See RuntimeEnvState.

  • raise_on_missing_output – When True, exceptions will be raised if there is missing data due to truncation/data source unavailable.

  • _explain – Print the API information such as API latency or failed query information.

Returns

List of dictionarified RuntimeEnvState.

Raises

RayStateApiException – if the API fails to query the data.
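The list functions take filters as (key, predicate, value) tuples, while the CLI takes 'key=value' or 'key!=value' strings. A possible way to convert between the two forms is sketched below; the `parse_filter` helper is hypothetical and not part of Ray.

```python
from typing import Tuple

FilterTuple = Tuple[str, str, str]


def parse_filter(expr: str) -> FilterTuple:
    """Turn 'key=value' or 'key!=value' into (key, predicate, value)."""
    idx = expr.find("=")
    if idx <= 0:
        raise ValueError(f"no key or predicate in filter: {expr!r}")
    # A "!" right before the first "=" means the predicate is "!=".
    if expr[idx - 1] == "!":
        key, predicate = expr[: idx - 1], "!="
    else:
        key, predicate = expr[:idx], "="
    key, value = key.strip(), expr[idx + 1 :].strip()
    if not key:
        raise ValueError(f"empty key in filter: {expr!r}")
    return key, predicate, value


def demo() -> None:
    """Not called here: needs ray[default] and a running cluster."""
    from ray.experimental.state.api import list_actors

    filters = [parse_filter(f) for f in ("state=ALIVE", "class_name!=Trainer")]
    for actor in list_actors(filters=filters, limit=10):
        print(actor["actor_id"], actor["state"])
```

Splitting on the first "=" rather than on every occurrence lets values that themselves contain "=" pass through unchanged.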

Get APIs

ray.experimental.state.api.get_actor(id: str, address: Optional[str] = None, timeout: int = 30, _explain: bool = False) -> Optional[Dict]

Get an actor by id.

Parameters
  • id – Id of the actor

  • address – Ray bootstrap address, could be auto, localhost:6379. If None, it will be resolved automatically from an initialized ray.

  • timeout – Max timeout value for the state API requests made.

  • _explain – Print the API information such as API latency or failed query information.

Returns

None if the actor is not found, or dictionarified ActorState.

Raises

RayStateApiException – if the API fails to query the data.

ray.experimental.state.api.get_placement_group(id: str, address: Optional[str] = None, timeout: int = 30, _explain: bool = False) -> Optional[Dict]

Get a placement group by id.

Parameters
  • id – Id of the placement group

  • address – Ray bootstrap address, could be auto, localhost:6379. If None, it will be resolved automatically from an initialized ray.

  • timeout – Max timeout value for the state APIs requests made.

  • _explain – Print the API information such as API latency or failed query information.

Returns

None if the placement group is not found, or dictionarified PlacementGroupState.

Raises

RayStateApiException – if the API fails to query the data.

ray.experimental.state.api.get_node(id: str, address: Optional[str] = None, timeout: int = 30, _explain: bool = False) -> Optional[Dict]

Get a node by id.

Parameters
  • id – Id of the node.

  • address – Ray bootstrap address, could be auto, localhost:6379. If None, it will be resolved automatically from an initialized ray.

  • timeout – Max timeout value for the state APIs requests made.

  • _explain – Print the API information such as API latency or failed query information.

Returns

None if the node is not found, or dictionarified NodeState.

Raises

RayStateApiException – if the API fails to query the data.

ray.experimental.state.api.get_worker(id: str, address: Optional[str] = None, timeout: int = 30, _explain: bool = False) -> Optional[Dict]

Get a worker by id.

Parameters
  • id – Id of the worker

  • address – Ray bootstrap address, could be auto, localhost:6379. If None, it will be resolved automatically from an initialized ray.

  • timeout – Max timeout value for the state APIs requests made.

  • _explain – Print the API information such as API latency or failed query information.

Returns

None if the worker is not found, or dictionarified WorkerState.

Raises

RayStateApiException – if the API fails to query the data.

ray.experimental.state.api.get_task(id: str, address: Optional[str] = None, timeout: int = 30, _explain: bool = False) -> Optional[Dict]

Get a task by id.

Parameters
  • id – Id of the task

  • address – Ray bootstrap address, could be auto, localhost:6379. If None, it will be resolved automatically from an initialized ray.

  • timeout – Max timeout value for the state APIs requests made.

  • _explain – Print the API information such as API latency or failed query information.

Returns

None if the task is not found, or dictionarified TaskState.

Raises

RayStateApiException – if the API fails to query the data.

ray.experimental.state.api.get_objects(id: str, address: Optional[str] = None, timeout: int = 30, _explain: bool = False) -> List[Dict]

Get objects by id.

There could be more than 1 entry returned since an object could be referenced at different places.

Parameters
  • id – Id of the object

  • address – Ray bootstrap address, could be auto, localhost:6379. If None, it will be resolved automatically from an initialized ray.

  • timeout – Max timeout value for the state APIs requests made.

  • _explain – Print the API information such as API latency or failed query information.

Returns

List of dictionarified ObjectState.

Raises

RayStateApiException – if the API fails to query the data.
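Because most get functions return None when the id is unknown, callers need to handle that case before reading fields. The sketch below shows one way to do so for actors; the `describe_actor` helper is hypothetical, and the keys it reads (class_name, state) are taken from the ActorState schema.

```python
from typing import Any, Callable, Dict, Optional


def describe_actor(
    get_actor: Callable[..., Optional[Dict[str, Any]]], actor_id: str
) -> str:
    """Return a one-line description, covering the not-found case."""
    actor = get_actor(id=actor_id, timeout=10)
    if actor is None:
        return f"actor {actor_id}: not found"
    return f"actor {actor_id}: {actor['class_name']} is {actor['state']}"


def demo(actor_id: str) -> None:
    """Not called here: needs ray[default] and a running cluster."""
    from ray.experimental.state.api import get_actor

    print(describe_actor(get_actor, actor_id))
```

get_objects differs from the other get functions: it returns a list, so an empty list plays the role of None there.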

Log APIs

ray.experimental.state.api.list_logs(address: Optional[str] = None, node_id: Optional[str] = None, node_ip: Optional[str] = None, glob_filter: Optional[str] = None, timeout: int = 30) -> Dict[str, List[str]]

List the available log files.

Parameters
  • address – Ray bootstrap address, could be auto, localhost:6379. If not specified, it will be retrieved from the initialized ray cluster.

  • node_id – Id of the node containing the logs.

  • node_ip – Ip of the node containing the logs. (At least one of node_id and node_ip has to be supplied when identifying a node).

  • glob_filter – Name of the file (relative to the ray log directory) to be retrieved. E.g. glob_filter="*worker*" for all worker logs.

  • timeout – Max timeout for requests made when getting the logs.

Returns

A dictionary where the keys are log groups (e.g. gcs, raylet, worker), and the values are lists of log filenames.

Raises

RayStateApiException – if the API fails to query the data.

ray.experimental.state.api.get_log(address: Optional[str] = None, node_id: Optional[str] = None, node_ip: Optional[str] = None, filename: Optional[str] = None, actor_id: Optional[str] = None, task_id: Optional[str] = None, pid: Optional[int] = None, follow: bool = False, tail: int = 1000, timeout: int = 30, _interval: Optional[float] = None) -> Generator[str, None, None]

Retrieve a log file by file name or by an entity id (pid, actor id, task id).

Examples

>>> import ray
>>> from ray.experimental.state.api import get_log
>>> # Connect to an existing Ray instance, if there is one.
>>> ray.init("auto")
>>> # The node IP can be retrieved from list_nodes() or ray.nodes().
>>> node_ip = "172.31.47.143"
>>> filename = "gcs_server.out"
>>> for line in get_log(filename=filename, node_ip=node_ip):
...     print(line)

Parameters
  • address – Ray bootstrap address, could be auto, localhost:6379. If not specified, it will be retrieved from the initialized ray cluster.

  • node_id – Id of the node containing the logs.

  • node_ip – Ip of the node containing the logs. (At least one of node_id and node_ip has to be supplied when identifying a node).

  • filename – Name of the file (relative to the ray log directory) to be retrieved.

  • actor_id – Id of the actor if getting logs from an actor.

  • task_id – Id of the task if getting logs generated by a task.

  • pid – PID of the worker if getting logs generated by a worker.

  • follow – When set to True, logs will be streamed and followed.

  • tail – Number of lines to get from the end of the log file. Set to -1 for getting the entire log.

  • timeout – Max timeout for requests made when getting the logs.

  • _interval – The interval in secs to print new logs when follow=True.

Returns

A generator of log lines. Its SendType and ReturnType are None.

Raises

RayStateApiException – if the API fails to query the data.
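Since get_log returns a generator, its output can be filtered lazily without loading the whole file. The sketch below searches the stream for a substring; the `grep_lines` helper is hypothetical, and it splits each yielded item on newlines as a precaution in case an item carries more than one line.

```python
from itertools import islice
from typing import Iterable, Iterator


def grep_lines(chunks: Iterable[str], needle: str, max_hits: int = 20) -> Iterator[str]:
    """Lazily yield up to max_hits lines that contain needle."""
    hits = (
        line
        for chunk in chunks
        for line in chunk.splitlines()
        if needle in line
    )
    return islice(hits, max_hits)


def demo() -> None:
    """Not called here: needs ray[default] and a running cluster."""
    from ray.experimental.state.api import get_log

    # File name and node IP are the ones used in the example above.
    stream = get_log(filename="gcs_server.out", node_ip="172.31.47.143", tail=1000)
    for line in grep_lines(stream, "ERROR"):
        print(line)
```

Capping the number of hits matters mostly with follow=True, where the stream does not end on its own.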

State APIs Schema

ActorState

class ray.experimental.state.common.ActorState(actor_id: str, class_name: str, state: typing_extensions.Literal[DEPENDENCIES_UNREADY, PENDING_CREATION, ALIVE, RESTARTING, DEAD], name: Optional[str], pid: int, serialized_runtime_env: str, resource_mapping: dict, death_cause: Optional[dict], is_detached: bool)

Actor State

The columns below can be used for the --filter option.

pid

name

class_name

state

actor_id

The columns below are available only when the get API is used, --detail is specified through the CLI, or detail=True is given to the Python APIs.

resource_mapping

is_detached

serialized_runtime_env

death_cause

actor_id: str

The id of the actor.

class_name: str

The class name of the actor.

state: typing_extensions.Literal[DEPENDENCIES_UNREADY, PENDING_CREATION, ALIVE, RESTARTING, DEAD]

The state of the actor.

  • DEPENDENCIES_UNREADY: The actor is waiting for its dependencies to be ready. E.g., a new actor is waiting for an object ref that is created by another remote task.

  • PENDING_CREATION: The actor’s dependencies are ready, but it is not created yet. This could be because there are not enough resources, there are too many actor entries in the scheduler queue, or the actor creation is slow (e.g., slow runtime environment creation or slow worker startup).

  • ALIVE: The actor is created, and it is alive.

  • RESTARTING: The actor is dead, and it is restarting. It is equivalent to PENDING_CREATION, but means the actor was dead more than once.

  • DEAD: The actor is permanently dead.

name: Optional[str]

The name of the actor given by the name argument.

pid: int

The pid of the actor. 0 if it is not created yet.

serialized_runtime_env: str

The runtime environment information of the actor.

resource_mapping: dict

The resource requirement of the actor.

death_cause: Optional[dict]

Actor’s death information in detail. None if the actor is not dead yet.

is_detached: bool

True if the actor is detached. False otherwise.
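The ActorState fields lend themselves to simple client-side aggregation. The sketch below counts actors per state and picks out actors that have no process yet (pid 0, as described above). It works on plain dictionaries shaped like list_actors output; the helper names and the sample data are hypothetical.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List


def count_by_state(actors: Iterable[Dict[str, Any]]) -> Counter:
    """Count actors per value of the state field."""
    return Counter(actor["state"] for actor in actors)


def unstarted_actor_ids(actors: Iterable[Dict[str, Any]]) -> List[str]:
    """Return ids of actors whose pid is 0, i.e. not created yet."""
    return [actor["actor_id"] for actor in actors if actor["pid"] == 0]


# Hypothetical sample shaped like the non-detail ActorState columns.
sample = [
    {"actor_id": "a1", "class_name": "Worker", "state": "ALIVE", "pid": 4242},
    {"actor_id": "a2", "class_name": "Worker", "state": "PENDING_CREATION", "pid": 0},
    {"actor_id": "a3", "class_name": "Trainer", "state": "ALIVE", "pid": 4250},
]
print(dict(count_by_state(sample)))
print(unstarted_actor_ids(sample))
```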

TaskState

class ray.experimental.state.common.TaskState(task_id: str, name: str, scheduling_state: typing_extensions.Literal[NIL, WAITING_FOR_DEPENDENCIES, SCHEDULED, FINISHED, WAITING_FOR_EXECUTION, RUNNING], type: typing_extensions.Literal[NORMAL_TASK, ACTOR_CREATION_TASK, ACTOR_TASK, DRIVER_TASK], func_or_class_name: str, language: str, required_resources: dict, runtime_env_info: str)

Task State

The columns below can be used for the --filter option.

task_id

name

type

func_or_class_name

language

scheduling_state

The columns below are available only when the get API is used, --detail is specified through the CLI, or detail=True is given to the Python APIs.

runtime_env_info

language

required_resources

task_id: str

The id of the task.

name: str

The name of the task if it is given by the name argument.

scheduling_state: typing_extensions.Literal[NIL, WAITING_FOR_DEPENDENCIES, SCHEDULED, FINISHED, WAITING_FOR_EXECUTION, RUNNING]

The state of the task.

  • NIL: We don’t have a status for this task because we are not the owner or the task metadata has already been deleted.

  • WAITING_FOR_DEPENDENCIES: The task is waiting for its dependencies to be created.

  • SCHEDULED: All dependencies have been created and the task is scheduled to execute. The task could be waiting for resources, runtime environment creation, or dependencies being fetched to the local node.

  • FINISHED: The task finished successfully.

  • WAITING_FOR_EXECUTION: The task is scheduled properly and waiting for execution. It includes time to deliver the task to the remote worker + queueing time from the execution side.

  • RUNNING: The task is running.

type: typing_extensions.Literal[NORMAL_TASK, ACTOR_CREATION_TASK, ACTOR_TASK, DRIVER_TASK]

The type of the task.

  • NORMAL_TASK: Tasks created by func.remote()

  • ACTOR_CREATION_TASK: Actors created by class.remote()

  • ACTOR_TASK: Actor tasks submitted by actor.method.remote()

  • DRIVER_TASK: Driver (A script that calls ray.init).

func_or_class_name: str

The name of the task. It is the name of the function if the type is a normal task or an actor task. It is the name of the class if the type is an actor creation task.

language: str

The language of the task. E.g., Python, Java, or Cpp.

required_resources: dict

The required resources to execute the task.

runtime_env_info: str

The runtime environment information for the task.
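The TaskState fields can be combined into a per-function breakdown similar in spirit to what ray summary tasks reports. The sketch below is a client-side illustration on plain dictionaries, not the way Ray computes TaskSummaries; the helper name and the sample data are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Any, Dict, Iterable


def summarize_by_function(tasks: Iterable[Dict[str, Any]]) -> Dict[str, Dict[str, int]]:
    """Count tasks per func_or_class_name and scheduling_state."""
    summary: Dict[str, Counter] = defaultdict(Counter)
    for task in tasks:
        summary[task["func_or_class_name"]][task["scheduling_state"]] += 1
    return {name: dict(counts) for name, counts in summary.items()}


# Hypothetical sample shaped like list_tasks output.
sample_tasks = [
    {"task_id": "t1", "func_or_class_name": "train", "scheduling_state": "RUNNING"},
    {"task_id": "t2", "func_or_class_name": "train", "scheduling_state": "RUNNING"},
    {"task_id": "t3", "func_or_class_name": "train", "scheduling_state": "FINISHED"},
    {"task_id": "t4", "func_or_class_name": "load", "scheduling_state": "SCHEDULED"},
]
print(summarize_by_function(sample_tasks))
```

Note that list_tasks returns at most `limit` entries, so counts built this way can undercount on large clusters.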

NodeState

class ray.experimental.state.common.NodeState(node_id: str, node_ip: str, state: typing_extensions.Literal[ALIVE, DEAD], node_name: str, resources_total: dict)

Node State

The columns below can be used for the --filter option.

node_ip

node_name

node_id

state

node_id: str

The id of the node.

node_ip: str

The ip address of the node.

state: typing_extensions.Literal[ALIVE, DEAD]

The state of the node.

  • ALIVE: The node is alive.

  • DEAD: The node is dead.

node_name: str

The name of the node if it is given by the name argument.

resources_total: dict

The total resources of the node.
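
As an illustration of how the state and resources_total fields fit together, the sketch below sums the resources of all ALIVE nodes. It assumes resources_total maps a resource name to a numeric amount; the function name total_alive_resources and the sample records are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def total_alive_resources(nodes: List[dict]) -> Dict[str, float]:
    """Sum resources_total over nodes whose state is ALIVE."""
    totals: Dict[str, float] = defaultdict(float)
    for node in nodes:
        if node["state"] != "ALIVE":
            continue  # dead nodes contribute no usable resources
        for name, amount in node["resources_total"].items():
            totals[name] += amount
    return dict(totals)


# Hypothetical sample data shaped like NodeState records.
nodes = [
    {"node_id": "a1", "state": "ALIVE", "resources_total": {"CPU": 8.0, "GPU": 1.0}},
    {"node_id": "b2", "state": "DEAD", "resources_total": {"CPU": 16.0}},
    {"node_id": "c3", "state": "ALIVE", "resources_total": {"CPU": 4.0}},
]

cluster_totals = total_alive_resources(nodes)
print(cluster_totals)
```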

PlacementGroupState

class ray.experimental.state.common.PlacementGroupState(placement_group_id: str, name: str, state: typing_extensions.Literal[PENDING, CREATED, REMOVED, RESCHEDULING], bundles: dict, is_detached: bool, stats: dict)[source]

PlacementGroup State

Below columns can be used for the --filter option.

is_detached

name

placement_group_id

state

Below columns are available only when get API is used, --detail is specified through CLI, or detail=True is given to Python APIs.

bundles

is_detached

stats

placement_group_id: str

The id of the placement group.

name: str

The name of the placement group if it is given by the name argument.

state: typing_extensions.Literal[PENDING, CREATED, REMOVED, RESCHEDULING]

The state of the placement group.

  • PENDING: The placement group creation is pending scheduling. This can happen because there are not enough resources or because some creation stage has failed (e.g., committing placement groups failed because the node is dead).

  • CREATED: The placement group is created.

  • REMOVED: The placement group is removed.

  • RESCHEDULING: The placement group is being rescheduled because some of its bundles were on nodes that died.

bundles: dict

The bundle specification of the placement group.

is_detached: bool

True if the placement group is detached. False otherwise.

stats: dict

The scheduling stats of the placement group.

WorkerState

class ray.experimental.state.common.WorkerState(worker_id: str, is_alive: bool, worker_type: typing_extensions.Literal[WORKER, DRIVER, SPILL_WORKER, RESTORE_WORKER], exit_type: Optional[typing_extensions.Literal[SYSTEM_ERROR, INTENDED_SYSTEM_EXIT, USER_ERROR, INTENDED_USER_EXIT]], node_id: str, ip: str, pid: str, exit_detail: Optional[str])[source]

Worker State

Below columns can be used for the --filter option.

exit_type

pid

worker_id

ip

worker_type

node_id

is_alive

Below columns are available only when get API is used, --detail is specified through CLI, or detail=True is given to Python APIs.

exit_detail

worker_id: str

The id of the worker.

is_alive: bool

Whether or not the worker is alive.

worker_type: typing_extensions.Literal[WORKER, DRIVER, SPILL_WORKER, RESTORE_WORKER]

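The type of the worker.

  • WORKER: The regular Ray worker process that executes tasks or instantiates an actor.

  • DRIVER: The driver (Python script that calls ray.init).

  • SPILL_WORKER: The worker that spills objects.

  • RESTORE_WORKER: The worker that restores objects.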

exit_type: Optional[typing_extensions.Literal[SYSTEM_ERROR, INTENDED_SYSTEM_EXIT, USER_ERROR, INTENDED_USER_EXIT]]

The exit type of the worker if the worker is dead.

  • SYSTEM_ERROR: The worker exited because of a system-level failure (e.g., a worker crash).

  • INTENDED_SYSTEM_EXIT: System-level exit that is intended. E.g., workers are killed because they have been idle for a long time.

  • USER_ERROR: The worker exited because of a user error. E.g., exceptions raised during actor initialization.

  • INTENDED_USER_EXIT: Intended exit from users (e.g., users exit workers with exit code 0, or the exit is initiated by a Ray API such as ray.kill).
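
The four exit types fall into two groups: intended exits and errors. The sketch below picks out dead workers whose exit was not intended, working on plain dicts shaped like WorkerState records. The grouping into UNEXPECTED_EXIT_TYPES, the function name, and the sample data are assumptions made for illustration.

```python
from typing import List

# Exit types treated as failures in this sketch (an assumption, not a Ray constant).
UNEXPECTED_EXIT_TYPES = {"SYSTEM_ERROR", "USER_ERROR"}


def unexpected_exits(workers: List[dict]) -> List[dict]:
    """Return dead workers whose exit_type indicates an error."""
    return [
        w
        for w in workers
        if not w["is_alive"] and w.get("exit_type") in UNEXPECTED_EXIT_TYPES
    ]


# Hypothetical sample data shaped like WorkerState records.
workers = [
    {"worker_id": "w1", "is_alive": True, "exit_type": None},
    {"worker_id": "w2", "is_alive": False, "exit_type": "INTENDED_SYSTEM_EXIT"},
    {"worker_id": "w3", "is_alive": False, "exit_type": "USER_ERROR"},
    {"worker_id": "w4", "is_alive": False, "exit_type": "SYSTEM_ERROR"},
]

failed_ids = [w["worker_id"] for w in unexpected_exits(workers)]
print(failed_ids)
```

For the workers this returns, the exit_detail column (available with --detail or detail=True) is the natural next place to look.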

node_id: str

The node id of the worker.

ip: str

The ip address of the worker.

pid: str

The pid of the worker.

exit_detail: Optional[str]

The exit detail of the worker if the worker is dead.

ObjectState

class ray.experimental.state.common.ObjectState(object_id: str, object_size: int, task_status: typing_extensions.Literal[NIL, WAITING_FOR_DEPENDENCIES, SCHEDULED, FINISHED, WAITING_FOR_EXECUTION, RUNNING], reference_type: typing_extensions.Literal[ACTOR_HANDLE, PINNED_IN_MEMORY, LOCAL_REFERENCE, USED_BY_PENDING_TASK, CAPTURED_IN_OBJECT, UNKNOWN_STATUS], call_site: str, type: typing_extensions.Literal[WORKER, DRIVER, SPILL_WORKER, RESTORE_WORKER], pid: int, ip: str)[source]

Object State

Below columns can be used for the --filter option.

pid

task_status

type

object_size

reference_type

ip

object_id

call_site

object_id: str

The id of the object.

object_size: int

The size of the object in MB.

task_status: typing_extensions.Literal[NIL, WAITING_FOR_DEPENDENCIES, SCHEDULED, FINISHED, WAITING_FOR_EXECUTION, RUNNING]

The status of the task that creates the object.

  • NIL: We don’t have a status for this task because we are not the owner or the task metadata has already been deleted.

  • WAITING_FOR_DEPENDENCIES: The task is waiting for its dependencies to be created.

  • SCHEDULED: All dependencies have been created and the task is scheduled to execute. The task may remain in this state while it waits for resources, runtime environment creation, or fetching of dependencies to the local node.

  • FINISHED: The task finished successfully.

  • WAITING_FOR_EXECUTION: The task is scheduled properly and waiting for execution. This includes the time to deliver the task to the remote worker plus the queueing time on the execution side.

  • RUNNING: The task is running.

reference_type: typing_extensions.Literal[ACTOR_HANDLE, PINNED_IN_MEMORY, LOCAL_REFERENCE, USED_BY_PENDING_TASK, CAPTURED_IN_OBJECT, UNKNOWN_STATUS]

The reference type of the object. See Debugging with Ray Memory for more details.

  • ACTOR_HANDLE: The reference is an actor handle.

  • PINNED_IN_MEMORY: The object is pinned in memory, meaning there’s in-flight ray.get on this reference.

  • LOCAL_REFERENCE: There’s a local reference (e.g., Python reference) to this object reference. The object won’t be GC’ed until all of them are gone.

  • USED_BY_PENDING_TASK: The object reference is passed to other tasks. E.g., a = ray.put() -> task.remote(a). In this case, a is used by the pending task named task.

  • CAPTURED_IN_OBJECT: The object is serialized by other objects. E.g., a = ray.put(1) -> b = ray.put([a]). a is serialized within a list.

  • UNKNOWN_STATUS: The object ref status is unknown.

call_site: str

The callsite of the object.

type: typing_extensions.Literal[WORKER, DRIVER, SPILL_WORKER, RESTORE_WORKER]

The worker type that creates the object.

  • WORKER: The regular Ray worker process that executes tasks or instantiates an actor.

  • DRIVER: The driver (Python script that calls ray.init).

  • SPILL_WORKER: The worker that spills objects.

  • RESTORE_WORKER: The worker that restores objects.

pid: int

The pid of the owner.

ip: str

The ip address of the owner.

RuntimeEnvState

class ray.experimental.state.common.RuntimeEnvState(runtime_env: str, success: bool, creation_time_ms: Optional[float], node_id: str, ref_cnt: int, error: Optional[str])[source]

Runtime Environment State

Below columns can be used for the --filter option.

error

node_id

runtime_env

success

Below columns are available only when get API is used, --detail is specified through CLI, or detail=True is given to Python APIs.

ref_cnt

error

runtime_env: str

The runtime environment spec.

success: bool

Whether or not the runtime env creation has succeeded.

creation_time_ms: Optional[float]

The latency of creating the runtime environment. Available if the runtime env is successfully created.

node_id: str

The node id of this runtime environment.

ref_cnt: int

The number of actors and tasks that use this runtime environment.

error: Optional[str]

The error message if the runtime environment creation has failed. Available only if the runtime env failed to be created.
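
The success, creation_time_ms, and error fields are mutually dependent: a latency is only present on success and an error message only on failure. The sketch below summarizes a list of records with that in mind. It operates on plain dicts shaped like RuntimeEnvState; the function name and the sample values (including the error text) are hypothetical.

```python
from statistics import mean
from typing import List


def summarize_runtime_envs(envs: List[dict]) -> dict:
    """Count failures, collect their errors, and average successful creation times."""
    failed = [e for e in envs if not e["success"]]
    # creation_time_ms is Optional, so skip records where it is missing.
    times = [
        e["creation_time_ms"]
        for e in envs
        if e["success"] and e["creation_time_ms"] is not None
    ]
    return {
        "num_failed": len(failed),
        "errors": [e["error"] for e in failed],
        "mean_creation_time_ms": mean(times) if times else None,
    }


# Hypothetical sample data shaped like RuntimeEnvState records.
envs = [
    {"runtime_env": '{"pip": ["requests"]}', "success": True,
     "creation_time_ms": 1200.0, "error": None},
    {"runtime_env": '{"pip": ["no-such-package"]}', "success": False,
     "creation_time_ms": None, "error": "pip install failed"},
    {"runtime_env": '{"env_vars": {"A": "1"}}', "success": True,
     "creation_time_ms": 300.0, "error": None},
]

report = summarize_runtime_envs(envs)
print(report)
```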

JobState

class ray.experimental.state.common.JobState(status: ray.dashboard.modules.job.common.JobStatus, entrypoint: str, message: Optional[str] = None, error_type: Optional[str] = None, start_time: Optional[int] = None, end_time: Optional[int] = None, metadata: Optional[Dict[str, str]] = None, runtime_env: Optional[Dict[str, Any]] = None)[source]

The state of the job that’s submitted by Ray’s Job APIs

Below columns can be used for the --filter option.

status

entrypoint

error_type

classmethod list_columns() List[str][source]

Return a list of columns.

classmethod filterable_columns() Set[str][source]

Return the set of filterable columns.
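
One way to provide list_columns() and filterable_columns() on a schema class is to tag each field with metadata and read it back through introspection. The sketch below shows that pattern with Python dataclasses. It is not a description of how JobState is implemented; the class ExampleState and the helper column are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional, Set


def column(filterable: bool = False, **kwargs):
    """Hypothetical helper: a dataclass field tagged with a 'filterable' flag."""
    return field(metadata={"filterable": filterable}, **kwargs)


@dataclass
class ExampleState:
    # Fields mirror the filterable columns listed for JobState above.
    status: str = column(filterable=True)
    entrypoint: str = column(filterable=True)
    message: Optional[str] = column(default=None)

    @classmethod
    def list_columns(cls) -> List[str]:
        """Return every column name in declaration order."""
        return [f.name for f in fields(cls)]

    @classmethod
    def filterable_columns(cls) -> Set[str]:
        """Return the names of columns tagged as filterable."""
        return {f.name for f in fields(cls) if f.metadata.get("filterable")}


print(ExampleState.list_columns())
print(sorted(ExampleState.filterable_columns()))
```

Keeping the flag next to the field definition means the column lists cannot drift out of sync with the schema.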

StateSummary

class ray.experimental.state.common.StateSummary(node_id_to_summary: Dict[str, Union[ray.experimental.state.common.TaskSummaries, ray.experimental.state.common.ActorSummaries, ray.experimental.state.common.ObjectSummaries]])[source]
node_id_to_summary: Dict[str, Union[ray.experimental.state.common.TaskSummaries, ray.experimental.state.common.ActorSummaries, ray.experimental.state.common.ObjectSummaries]]

Node ID -> summary per node. If the data does not need to be organized per node, it contains a single key, “cluster”.
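
Code that consumes node_id_to_summary therefore has to handle both shapes. A minimal sketch, assuming the mapping is available as a plain dict and that the single-key case always uses the literal key "cluster"; the function name and the placeholder summary values are hypothetical.

```python
from typing import Any, Dict


def describe_scope(node_id_to_summary: Dict[str, Any]) -> str:
    """Report whether a summary mapping is cluster-wide or organized per node."""
    if set(node_id_to_summary) == {"cluster"}:
        return "cluster"
    return "per_node"


# Placeholder summaries; real values would be TaskSummaries, ActorSummaries, or ObjectSummaries.
cluster_wide = {"cluster": {"total_tasks": 10}}
per_node = {"node-a": {"total_tasks": 4}, "node-b": {"total_tasks": 6}}

print(describe_scope(cluster_wide))
print(describe_scope(per_node))
```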

TaskSummary

class ray.experimental.state.common.TaskSummaries(summary: Dict[str, ray.experimental.state.common.TaskSummaryPerFuncOrClassName], total_tasks: int, total_actor_tasks: int, total_actor_scheduled: int, summary_by: str = 'func_name')[source]
total_tasks: int

Total Ray tasks.

total_actor_tasks: int

Total actor tasks.

total_actor_scheduled: int

Total scheduled actors.

class ray.experimental.state.common.TaskSummaryPerFuncOrClassName(func_or_class_name: str, type: str, state_counts: Dict[typing_extensions.Literal['NIL', 'WAITING_FOR_DEPENDENCIES', 'SCHEDULED', 'FINISHED', 'WAITING_FOR_EXECUTION', 'RUNNING'], int] = <factory>)[source]
func_or_class_name: str

The function or class name of this task.

type: str

The type of the task. Equivalent to protobuf TaskType.

state_counts: Dict[typing_extensions.Literal[NIL, WAITING_FOR_DEPENDENCIES, SCHEDULED, FINISHED, WAITING_FOR_EXECUTION, RUNNING], int]

State name to the count dict. State name is equivalent to the protobuf TaskStatus.
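
To make the shape of state_counts concrete, the sketch below builds a per-function mapping of state name to count from a list of task records. It is an illustration on plain dicts, not the aggregation code Ray itself runs; the function name summarize_tasks and the sample tasks are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def summarize_tasks(tasks: List[dict]) -> Dict[str, Dict[str, int]]:
    """Group tasks by func_or_class_name and count them per scheduling_state."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for task in tasks:
        counts[task["func_or_class_name"]][task["scheduling_state"]] += 1
    # Convert Counters to plain dicts so the result mirrors state_counts.
    return {name: dict(counter) for name, counter in counts.items()}


# Hypothetical sample data shaped like TaskState records.
tasks = [
    {"func_or_class_name": "preprocess", "scheduling_state": "FINISHED"},
    {"func_or_class_name": "preprocess", "scheduling_state": "RUNNING"},
    {"func_or_class_name": "preprocess", "scheduling_state": "FINISHED"},
    {"func_or_class_name": "train", "scheduling_state": "WAITING_FOR_DEPENDENCIES"},
]

summary = summarize_tasks(tasks)
print(summary)
```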

ActorSummary

class ray.experimental.state.common.ActorSummaries(summary: Dict[str, ray.experimental.state.common.ActorSummaryPerClass], total_actors: int, summary_by: str = 'class')[source]
summary: Dict[str, ray.experimental.state.common.ActorSummaryPerClass]

Group key (actor class name) -> summary

total_actors: int

Total number of actors

class ray.experimental.state.common.ActorSummaryPerClass(class_name: str, state_counts: Dict[typing_extensions.Literal['DEPENDENCIES_UNREADY', 'PENDING_CREATION', 'ALIVE', 'RESTARTING', 'DEAD'], int] = <factory>)[source]
class_name: str

The class name of the actor.

state_counts: Dict[typing_extensions.Literal[DEPENDENCIES_UNREADY, PENDING_CREATION, ALIVE, RESTARTING, DEAD], int]

State name to the count dict. State name is equivalent to the protobuf ActorState.

ObjectSummary

class ray.experimental.state.common.ObjectSummaries(summary: Dict[str, ray.experimental.state.common.ObjectSummaryPerKey], total_objects: int, total_size_mb: float, callsite_enabled: bool, summary_by: str = 'callsite')[source]
summary: Dict[str, ray.experimental.state.common.ObjectSummaryPerKey]

Group key (object callsite by default) -> summary

total_objects: int

Total number of referenced objects in the cluster.

total_size_mb: float

Total size of referenced objects in the cluster in MB.

callsite_enabled: bool

Whether or not the callsite collection is enabled.

class ray.experimental.state.common.ObjectSummaryPerKey(total_objects: int, total_size_mb: float, total_num_workers: int, total_num_nodes: int, task_state_counts: Dict[typing_extensions.Literal['NIL', 'WAITING_FOR_DEPENDENCIES', 'SCHEDULED', 'FINISHED', 'WAITING_FOR_EXECUTION', 'RUNNING'], int] = <factory>, ref_type_counts: Dict[typing_extensions.Literal['ACTOR_HANDLE', 'PINNED_IN_MEMORY', 'LOCAL_REFERENCE', 'USED_BY_PENDING_TASK', 'CAPTURED_IN_OBJECT', 'UNKNOWN_STATUS'], int] = <factory>)[source]
total_objects: int

Total number of objects of the type.

total_size_mb: float

Total size in MB.

total_num_workers: int

Total number of workers that reference the type of objects.

total_num_nodes: int

Total number of nodes that reference the type of objects.

task_state_counts: Dict[typing_extensions.Literal[NIL, WAITING_FOR_DEPENDENCIES, SCHEDULED, FINISHED, WAITING_FOR_EXECUTION, RUNNING], int]

State name to the count dict. State names are the task_status values of ObjectState.

ref_type_counts: Dict[typing_extensions.Literal[ACTOR_HANDLE, PINNED_IN_MEMORY, LOCAL_REFERENCE, USED_BY_PENDING_TASK, CAPTURED_IN_OBJECT, UNKNOWN_STATUS], int]

Reference type to the count dict. Reference type names are the reference_type values of ObjectState.
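
The fields of ObjectSummaryPerKey can be read as an aggregation over ObjectState records that share a group key. The sketch below reproduces that aggregation on plain dicts, grouped by call_site. It assumes object_size is already expressed in MB as documented above, that a worker is identified by its (ip, pid) pair, and that a node is identified by its ip; the function name and the sample objects are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def summarize_by_callsite(objects: List[dict]) -> Dict[str, dict]:
    """Aggregate object records per call_site into ObjectSummaryPerKey-like dicts."""
    groups: Dict[str, List[dict]] = {}
    for obj in objects:
        groups.setdefault(obj["call_site"], []).append(obj)

    summary = {}
    for call_site, members in groups.items():
        summary[call_site] = {
            "total_objects": len(members),
            "total_size_mb": sum(o["object_size"] for o in members),
            # Assumption: (ip, pid) identifies a worker and ip identifies a node.
            "total_num_workers": len({(o["ip"], o["pid"]) for o in members}),
            "total_num_nodes": len({o["ip"] for o in members}),
            "task_state_counts": dict(Counter(o["task_status"] for o in members)),
            "ref_type_counts": dict(Counter(o["reference_type"] for o in members)),
        }
    return summary


# Hypothetical sample data shaped like ObjectState records.
objects = [
    {"call_site": "train.py:10", "object_size": 2, "task_status": "FINISHED",
     "reference_type": "LOCAL_REFERENCE", "ip": "10.0.0.1", "pid": 100},
    {"call_site": "train.py:10", "object_size": 3, "task_status": "RUNNING",
     "reference_type": "LOCAL_REFERENCE", "ip": "10.0.0.2", "pid": 200},
    {"call_site": "load.py:5", "object_size": 1, "task_status": "FINISHED",
     "reference_type": "PINNED_IN_MEMORY", "ip": "10.0.0.1", "pid": 100},
]

summary = summarize_by_callsite(objects)
print(summary["train.py:10"])
```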

State APIs Exceptions

class ray.experimental.state.exception.RayStateApiException(err_msg, *args)[source]