ray.init

ray.init(address: str | None = None, *, num_cpus: int | None = None, num_gpus: int | None = None, resources: Dict[str, float] | None = None, labels: Dict[str, str] | None = None, object_store_memory: int | None = None, local_mode: bool = False, ignore_reinit_error: bool = False, include_dashboard: bool | None = None, dashboard_host: str = '127.0.0.1', dashboard_port: int | None = None, job_config: ray.job_config.JobConfig = None, configure_logging: bool = True, logging_level: int = 'info', logging_format: str | None = None, logging_config: LoggingConfig | None = None, log_to_driver: bool | None = None, namespace: str | None = None, runtime_env: Dict[str, Any] | RuntimeEnv | None = None, enable_resource_isolation: bool = False, system_reserved_cpu: float | None = None, system_reserved_memory: int | None = None, **kwargs) → BaseContext

Connect to an existing Ray cluster or start one and connect to it.

This method handles two cases: either a Ray cluster already exists and this driver simply attaches to it, or all of the processes associated with a Ray cluster are started and the driver attaches to the newly started cluster. Note: This method overwrites the SIGTERM handler of the driver process.

In most cases, it is enough to just call this method with no arguments. This will autodetect an existing Ray cluster or start a new Ray instance if no existing cluster is found:

ray.init()

To explicitly connect to an existing local cluster, use this as follows. A ConnectionError is raised if no existing local cluster is found.

ray.init(address="auto")
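When a driver should prefer an existing cluster but still run without one, the ConnectionError can be caught and handled explicitly. The following is a minimal sketch, not part of the Ray API: connect_or_start is a hypothetical helper, and it takes the init function as an argument so that the fallback logic can be exercised without a running cluster.

```python
def connect_or_start(init):
    """Attach to an existing local cluster, or start a new local instance.

    `init` is expected to behave like ray.init. This helper is hypothetical
    and only illustrates the documented behavior of the address argument.
    """
    try:
        # address="auto" raises ConnectionError when no cluster is found.
        return init(address="auto")
    except ConnectionError:
        # address="local" starts a new local Ray instance.
        return init(address="local")
```

In a real driver this would be called as connect_or_start(ray.init) after importing ray. Calling ray.init() with no arguments gives a similar fallback implicitly; the explicit form is mainly useful when the missing cluster should be logged or handled differently.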

To connect to an existing remote cluster, use this as follows (substituting in the appropriate address). Note the addition of “ray://” at the beginning of the address. This requires ray[client].

ray.init(address="ray://123.45.67.89:10001")

More details for starting and connecting to a remote cluster can be found here: https://docs.ray.io/en/master/cluster/getting-started.html

You can also define an environment variable called RAY_ADDRESS in the same format as the address parameter to connect to an existing cluster with ray.init() or ray.init(address="auto").

Parameters:

address – The address of the Ray cluster to connect to. The provided address is resolved as follows:
    1. If a concrete address (e.g., localhost:<port>) is provided, try to connect to it. Concrete addresses can be prefixed with “ray://” to connect to a remote cluster. For example, passing in the address “ray://123.45.67.89:50005” will connect to the cluster at the given address.
    2. If no address is provided, try to find an existing Ray instance to connect to. This is done by first checking the environment variable RAY_ADDRESS. If this is not defined, check the address of the latest cluster started (found in /tmp/ray/ray_current_cluster) if available. If this is also empty, then start a new local Ray instance.
    3. If the provided address is “auto”, then follow the same process as above. However, if there is no existing cluster found, this will throw a ConnectionError instead of starting a new local Ray instance.
    4. If the provided address is “local”, start a new local Ray instance, even if there is already an existing local Ray instance.
num_cpus – Number of CPUs the user wishes to assign to each raylet. By default, this is set based on virtual cores.
num_gpus – Number of GPUs the user wishes to assign to each raylet. By default, this is set based on detected GPUs.
resources – A dictionary mapping the names of custom resources to the quantities for them available.
labels – [Experimental] The key-value labels of the node.
object_store_memory – The amount of memory (in bytes) to start the object store with. By default, this is 30% of available system memory capped by the shm size and 200G but can be set higher.
local_mode – Deprecated: consider using the Ray Distributed Debugger instead.
ignore_reinit_error – If true, Ray suppresses errors from calling ray.init() a second time. Ray won’t be restarted.
include_dashboard – Boolean flag indicating whether or not to start the Ray dashboard, which displays the status of the Ray cluster. If this argument is None, then the UI will be started if the relevant dependencies are present.
dashboard_host – The host to bind the dashboard server to. Can either be localhost (127.0.0.1) or 0.0.0.0 (available from all interfaces). By default, this is set to localhost to prevent access from external machines.
dashboard_port (int, None) – The port to bind the dashboard server to. Defaults to 8265 and Ray will automatically find a free port if 8265 is not available.
job_config (ray.job_config.JobConfig) – The job configuration.
configure_logging – True (default) if configuration of logging is allowed here. Otherwise, the user may want to configure it separately.
logging_level – Logging level for the “ray” logger of the driver process, defaults to logging.INFO. Ignored unless “configure_logging” is true.
logging_format – Logging format for the “ray” logger of the driver process, defaults to a string containing a timestamp, filename, line number, and message. See the source file ray_constants.py for details. Ignored unless “configure_logging” is true.
logging_config – [Experimental] Logging configuration will be applied to the root loggers for both the driver process and all worker processes belonging to the current job. See LoggingConfig for details.
log_to_driver – If true, the output from all of the worker processes on all nodes will be directed to the driver.
namespace – A namespace is a logical grouping of jobs and named actors.
runtime_env – The runtime environment to use for this job (see Runtime environments for details).
object_spilling_directory – The path to spill objects to. The same path will be used as the object store fallback directory as well.
enable_resource_isolation – Enable resource isolation through cgroupv2 by reserving memory and CPU resources for Ray system processes. This requires cgroupv2 (not cgroupv1) to be enabled with read and write permissions for the raylet. The cgroup memory and cpu controllers must also be enabled.
system_reserved_cpu – The number of CPU cores to reserve for Ray system processes. Cores can be fractional, e.g., 0.5 means half a CPU core. By default, the min of 20% and 1 core will be reserved. Must be >= 0.5 cores and < total number of available cores. This option only works if enable_resource_isolation is True.
system_reserved_memory – The amount of memory (in bytes) to reserve for ray system processes. By default, the min of 10% and 25GB plus object_store_memory will be reserved. Must be >= 100MB and system_reserved_memory + object_store_bytes < total available memory. This option only works if enable_resource_isolation is True.
_cgroup_path – The path for the cgroup the raylet should use to enforce resource isolation. By default, the cgroup used for resource isolation will be /sys/fs/cgroup. The raylet must have read/write permissions to this path. The cgroup memory and cpu controllers must be enabled for this cgroup. This option only works if enable_resource_isolation is True.
_enable_object_reconstruction – If True, when an object stored in the distributed plasma store is lost due to node failure, Ray will attempt to reconstruct the object by re-executing the task that created the object. Arguments to the task will be recursively reconstructed. If False, then ray.ObjectLostError will be thrown.
_plasma_directory – Override the plasma mmap file directory.
_node_ip_address – The IP address of the node that we are on.
_driver_object_store_memory – Deprecated.
_memory – Amount of reservable memory resource in bytes rounded down to the nearest integer.
_redis_username – Prevents external clients without the username from connecting to Redis if provided.
_redis_password – Prevents external clients without the password from connecting to Redis if provided.
_temp_dir – If provided, specifies the root temporary directory for the Ray process. Must be an absolute path. Defaults to an OS-specific conventional location, e.g., “/tmp/ray”.
_metrics_export_port – Port number through which Ray exposes system metrics via a Prometheus endpoint. It is currently under active development, and the API is subject to change.
_system_config – Configuration for overriding RayConfig defaults. For testing purposes ONLY.
_tracing_startup_hook – If provided, turns on and sets up tracing for Ray. Must be the name of a function that takes no arguments and sets up a Tracer Provider, Remote Span Processors, and (optional) additional instruments. See more at docs.ray.io/tracing.html. It is currently under active development, and the API is subject to change.
_node_name – User-provided node name or identifier. Defaults to the node IP address.
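
The resolution order described for the address parameter can be modeled as a small pure function. This is a simplified sketch of the documented rules, not Ray's actual implementation: resolve_address, its return values ("connect" and "start_local"), and the latest_cluster argument (standing in for the address recorded in /tmp/ray/ray_current_cluster) are all hypothetical names introduced for illustration.

```python
import os
from typing import Mapping, Optional, Tuple


def resolve_address(
    address: Optional[str],
    environ: Mapping[str, str] = os.environ,
    latest_cluster: Optional[str] = None,
) -> Tuple[str, Optional[str]]:
    """Model the documented resolution order; returns (action, address).

    Hypothetical helper: `action` is "connect" or "start_local", and
    `latest_cluster` stands in for the latest started cluster's address.
    """
    if address == "local":
        # Rule 4: always start a new local instance.
        return ("start_local", None)
    if address not in (None, "auto"):
        # Rule 1: a concrete address is used as given.
        return ("connect", address)
    # Rules 2 and 3: check RAY_ADDRESS first, then the latest started cluster.
    found = environ.get("RAY_ADDRESS") or latest_cluster
    if found:
        return ("connect", found)
    if address == "auto":
        # Rule 3: "auto" never starts a new instance.
        raise ConnectionError("no existing Ray cluster found")
    # Rule 2: nothing found, so start a new local instance.
    return ("start_local", None)
```

For example, resolve_address(None, {"RAY_ADDRESS": "1.2.3.4:6379"}) yields a "connect" action for that address, while resolve_address("auto", {}) raises ConnectionError. The model deliberately ignores edge cases such as RAY_ADDRESS itself being set to "auto" or "local".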

Returns:

If the provided address includes a protocol, for example by prepending “ray://” to the address to get “ray://1.2.3.4:10001”, then a ClientContext is returned with information such as settings, server versions for ray and python, and the dashboard_url. Otherwise, a RayContext is returned with ray and python versions, and address information about the started processes.
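
Code that handles the return value may want to know in advance which context type to expect. The sketch below assumes that "includes a protocol" means the address contains a "://" separator, as in the “ray://1.2.3.4:10001” example; expected_context_type is a hypothetical helper, not a Ray function, and it returns only the type name as a string.

```python
from typing import Optional


def expected_context_type(address: Optional[str]) -> str:
    """Return the name of the context type described in the Returns text.

    Assumption: an address "includes a protocol" when it contains "://".
    """
    if address is not None and "://" in address:
        return "ClientContext"
    return "RayContext"
```

Under this assumption, an address such as "ray://1.2.3.4:10001" maps to ClientContext, while None, "auto", and a plain host:port address map to RayContext.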

Raises:

Exception – An exception is raised if an inappropriate combination of arguments is passed in.