.. _ray_glossary:

Ray Glossary
============

On this page you find a list of important terminology used throughout the Ray
documentation, sorted alphabetically.

.. glossary::

    Action space
        Property of an RL environment. The shape(s) and datatype(s) that actions within
        an RL environment are allowed to have.
        Examples: An RL environment, in which an agent can move up, down, left,
        or right might have an action space of ``Discrete(4)`` (integer values
        of 0, 1, 2, or 3).
        An RL environment, in which an agent can apply a torque between
        -1.0 and 1.0 to a joint, the action space might be
        ``Box(-1.0, 1.0, (1,), float32)`` (single float values between -1.0 and 1.0).

    Actor
        A Ray actor is a remote instance of a class, which is
        essentially a stateful service. :ref:`Learn more about Ray actors<actor-guide>`.

    Actor task
        An invocation of a Ray actor method. Sometimes we just call it a task.

    Ray Agent
        Daemon process running on each Ray node. It has several functionalities like
        collecting metrics on the local node and installing runtime environments.

    Agent
        An acting entity inside an RL environment. One RL environment might contain
        one (single-agent RL) or more (multi-agent RL) acting agents. Different agents
        within the same environment might have different observation- and action-spaces,
        different reward functions, and act at different time-steps.

    Algorithm
        A class that holds the who/when/where/how for training one or more RL agent(s).
        The user interacts with an Algorithm instance directly to train their agents
        (it is the top-most user facing API of RLlib).

    Asynchronous execution
        An execution model where a later task can begin executing in parallel,
        without waiting for an earlier task to finish.
        Ray tasks and actor tasks are all executed asynchronously.

    Asynchronous sampling
        Sampling is the process of rolling out (playing) episodes within an RL
        environment and thereby collecting the training data (observations, actions
        and rewards). In an asynchronous sampling setup, Ray actors run sampling in the
        background and send collected samples back to a main driver script. The driver
        checks for such “ready” data frequently and then triggers central model
        learning updates. Hence, sampling and learning happen at the same time.
        Note that because of this, the policy/ies used for creating the samples
        (action computations) might be slightly behind the centrally learned policy
        model(s), even in an on-policy Algorithm.

    Autoscaler
        A Ray component that scales up and down the Ray cluster by adding and removing
        Ray nodes according to the resources requested by applications running on
        the cluster.

    Autoscaling
        The process of scaling up and down the Ray cluster automatically.

    Backend
        A class containing the initialization and teardown logic for a specific deep
        learning framework (e.g., Torch, TensorFlow), used to set up distributed
        data-parallel training for :ref:`Ray Train’s built-in trainers<train-api>`.

    Batch format
        The way Ray Data represents batches of data. The batch format is independent
        from how Ray Data stores the underlying blocks, so you can use any batch format
        regardless of the internal block representation.

        Set ``batch_format`` in methods like
        :meth:`Dataset.iter_batches() <ray.data.Dataset.iter_batches>` and
        :meth:`Dataset.map_batches() <ray.data.Dataset.map_batches>` to specify the
        batch type.

        .. doctest::

            >>> import ray
            >>> dataset = ray.data.range(15)
            >>> next(iter(dataset.iter_batches(batch_format="numpy", batch_size=5)))
            {'id': array([0, 1, 2, 3, 4])}
            >>> next(iter(dataset.iter_batches(batch_format="pandas", batch_size=5)))
               id
            0   0
            1   1
            2   2
            3   3
            4   4
            >>> next(iter(dataset.iter_batches(batch_format="pyarrow", batch_size=5)))
            pyarrow.Table
            id: int64
            ----
            id: [[0],[1],...,[3],[4]]

        To learn more about batch formats, read
        :ref:`Configuring batch formats <configure_batch_format>`.

    Batch size
        A batch size in the context of model training is the number of data points used
        to compute and apply one gradient update to the model weights.

    Block
        A processing unit of data. A :class:`~ray.data.Dataset` consists of a
        collection of blocks.

        Under the hood, Ray Data partitions rows into a set of distributed data blocks.
        This allows it to perform operations in parallel.

        Unlike a batch, which is a user-facing object, a block is an internal abstraction.

    Placement Group Bundle
        A collection of resources that must be reserved on a single Ray node.
        :ref:`Learn more<ray-placement-group-doc-ref>`.

    Checkpoint
        A Ray Train Checkpoint is a common interface for accessing data and models across
        different Ray components and libraries. A Checkpoint can have its data
        represented as a directory on local (on-disk) storage, as a directory on an
        external storage (e.g., cloud storage), and as an in-memory dictionary.
        :class:`Learn more <ray.train.Checkpoint>`.

        .. TODO: How does this relate to RLlib checkpoints etc.? Be clear here

    Ray Client
        The Ray Client is an API that connects a Python script to a remote Ray cluster.
        Effectively, it allows you to leverage a remote Ray cluster just like you would
        with Ray running on your local machine.
        :ref:`Learn more<ray-client-ref>`.

    Ray Cluster
        A Ray cluster is a set of worker nodes connected to a common Ray head node.
        Ray clusters can be fixed-size, or they can autoscale up and down according to
        the resources requested by applications running on the cluster.

    .. TODO: Add "Concurrency" here, or try to avoid this in docs.

    Connector
        A connector performs transformations on data that comes out of a dataset or an
        RL environment and is about to be passed to a model. Connectors are flexible
        components and can be swapped out such that models are easily reusable and do
        not have to be retrained for different data transformations.

    Tune Config
        This is the set of hyperparameters corresponding to a Tune trial.
        Sampling from a hyperparameter search space will produce a config.

    .. TODO: DAG

    Ray Dashboard
        Ray’s built-in dashboard is a web interface that provides metrics, charts,
        and other features that help Ray users to understand and debug Ray applications.

    .. TODO: Data Shuffling

    Dataset (object)
        A class that produces a sequence of distributed data blocks.

        :class:`~ray.data.Dataset` exposes methods to read, transform, and consume data at scale.

        To learn more about Datasets and the operations they support, read the :ref:`Datasets API Reference <data-api>`.

    Deployment
        A deployment is a group of actors that can handle traffic in Ray Serve.
        Deployments are defined as a single class with a number of options, including
        the number of “replicas” of the deployment, each of which will map to a Ray
        actor at runtime. Requests to a deployment are load balanced across its replicas.

    Ingress Deployment
        In Ray Serve, the “ingress” deployment is the one that receives and responds to
        inbound user traffic. It handles HTTP parsing and response formatting. In the case
        of model composition, it would also fan out requests to other deployments to do
        things like preprocessing and a forward pass of an ML model.

    Driver
        "Driver" is the name of the process running the main script that starts all
        other processes. For Python, this is usually the script you start with
        ``python ...``.

    Tune Driver
        The Tune driver is the main event loop that’s happening on the node that
        launched the Tune experiment. This event loop schedules trials given the
        cluster resources, executes training on remote Trainable actors, and processes
        results and checkpoints from those actors.

    Distributed Data-Parallel
        A distributed data-parallel (DDP) training job scales machine learning training
        to happen on multiple nodes, where each node processes one shard of the full
        dataset. Every worker holds a copy of the model weights, and a common strategy
        for updating weights is a “mirrored strategy”, where each worker will hold the
        exact same weights at all times, and computed gradients are averaged then
        applied across all workers.

        With N worker nodes and a dataset of size D, each worker is responsible for
        only ``D / N`` datapoints. If each worker node computes the gradient on a batch
        of size ``B``, then the effective batch size of the DDP training is ``N * B``.

    .. TODO: Entrypoint

    Environment
        The world or simulation, in which one or more reinforcement learning agents
        have to learn to behave optimally with respect to a given reward function. An
        environment consists of an observation space, a reward function, an action
        space, a state transition function, and a distribution over initial states
        (after a reset).

        Episodes consisting of one or more time-steps are played through an
        environment in order to generate and collect samples for learning.
        These samples contain one 4-tuple of
        ``[observation, action, reward, next observation]`` per timestep.

    Episode
        A series of subsequent RL environment timesteps, each of which is a
        4-tuple: ``[observation, action, reward, next observation]``.
        Episodes can end with the terminated- or truncated-flags being True.
        An episode generally spans multiple time-steps for one or more agents.
        The Episode is an important concept in RL as "optimal agent behavior" is
        defined as choosing actions that maximize the sum of individual rewards
        over the course of an episode.

    Trial Executor
        An internal :ref:`Ray Tune component<raytrialexecutor-docstring>` that manages
        the resource management and execution of each trial’s corresponding remote
        Trainable actor. The trial executor’s responsibilities include launching
        training, checkpointing, and restoring remote tasks.

    Experiment
        A Ray Tune or Ray Train experiment is a collection of one or more training jobs
        that may correspond to different hyperparameter configurations. These
        experiments are launched via the
        :ref:`Tuner API<tune-run-ref>` and the :ref:`Trainer API<train-api>`.

    .. TODO: Event

    Fault tolerance
        Fault tolerance in Ray Train and Tune consists of experiment-level and trial-level
        restoration. Experiment-level restoration refers to resuming all trials,
        in the event that an experiment is interrupted in the middle of training due
        to a cluster-level failure. Trial-level restoration refers to resuming
        individual trials, in the event that a trial encounters a runtime
        error such as OOM.

        .. TODO: more on fault tolerance in Core

    Framework
        The deep-learning framework used for the model(s), loss(es), and optimizer(s)
        inside an RLlib Algorithm. RLlib currently supports PyTorch and TensorFlow.

    GCS / Global Control Service
        Centralized metadata server for a Ray cluster. It runs on the Ray head node
        and has functions like managing node membership and actor directory.
        It’s also known as the Global Control Store.

    Head node
        A node that runs extra cluster-level processes like GCS and API server in
        addition to those processes running on a worker node. A Ray cluster only has
        one head node.

    HPO
        Hyperparameter optimization (HPO) is the process of choosing a set of optimal
        hyperparameters for a learning algorithm. A hyperparameter can be a parameter
        whose value is used to control the learning process (e.g., learning rate),
        define the model architecture (e.g, number of hidden layers), or influence data
        pre-processing. In the case of Ray Train, hyperparameters can also include
        compute processing scale-out parameters such as the number of distributed
        training workers.

    .. TODO: Inference

    Job
        A Ray job is a packaged Ray application that can be executed on a
        (remote) Ray cluster. :ref:`Learn more<jobs-overview>`.

    Lineage
        For Ray objects, this is the set of tasks that was originally executed to
        produce the object. If an object’s value is lost due to node failure,
        Ray may attempt to recover the value by re-executing the object’s lineage.

    .. TODO: Logs

    .. TODO: Metrics

    Model
        A function approximator with trainable parameters (e.g. a neural network) that
        can be trained by an algorithm on available data or collected data from an RL
        environment. The parameters are usually initialized at random (unlearned state).
        During the training process, checkpoints of the model can be created such that -
        after the learning process is shut down or crashes - training can resume from
        the latest weights rather than having to re-learn from scratch.
        After the training process is completed, models can be deployed into production
        for inference using Ray Serve.

    Multi-agent
        Denotes an RL environment setup, in which several (more than one) agents act
        in the same environment and learn either the same or different optimal
        behaviors. The relationship between the different agents in a multi-agent setup
        might be adversarial (playing against each other), cooperative (trying to reach
        a common goal) or neutral (the agents don’t really care about other agents’
        actions). The NN model architectures that can be used for multi-agent training
        range from "independent" (each agent trains its own separate model), over
        "partially shared" (i.e. some agents might share their value function, because
        they have a common goal), to "identical" (all agents train on the same model).

    Namespace
        A namespace is a logical grouping of jobs and named actors. When an actor is
        named, its name must be unique within the namespace.
        When a namespace is not specified, Ray will place your job in an anonymous
        namespace.

    Node
        A Ray node is a physical or virtual machine that is part of a Ray cluster.
        See also :term:`Head node`.

    Object
        An application value. These are values that are returned by a task or
        created through ``ray.put``.

    Object ownership
        Ownership is the concept used to decide where metadata for a certain
        ``ObjectRef`` (and the task that creates the value) should be stored.
        If a worker calls ``foo.remote()`` or ``ray.put()``, it owns the metadata for
        the returned ``ObjectRef``, e.g., ref count and location information. If an
        object’s owner dies and another worker tries to get the value,
        it will receive an ``OwnerDiedError`` exception.

    Object reference
        A pointer to an application value, which can be stored anywhere in the cluster.
        Can be created by calling ``foo.remote()`` or ``ray.put()``.
        If using ``foo.remote()``, then the returned ``ObjectRef`` is also a future.

    Object store
        A distributed in-memory data store for storing Ray objects.

    Object spilling
        Objects in the object store are spilled to external storage once the capacity
        of the object store is used up. This enables out-of-core data processing for
        memory-intensive distributed applications. It comes with a performance penalty
        since data needs to be written to disk.

    .. TODO: Observability

    Observation
        The full or partial state of an RL environment, which an agent sees
        (has access to) at each timestep. A fully observable environment produces
        observations that contain all the information to sufficiently infer the current
        underlying state of the environment. Such states are also called “Markovian”.
        Examples for environments with Markovian observations are chess or 2D games,
        in which the player can see with each frame the entirety of the game’s state.
        A partially observable (or non-Markovian) environment produces observations
        that do not contain sufficient information to infer the exact underlying state.
        An example here would be a robot with a camera on its head facing forward.
        The robot walks around in a maze, but from a single camera frame might not know
        what’s currently behind it.

    Offline data
        Data collected in an RL environment up-front and stored in some data format
        (e.g. JSON). Offline data can be used to train an RL agent. The data might have
        been generated by a non-RL/ML system, such as a simple decision making script.
        Also, when training from offline data, the RL algorithm will not be able to
        explore new actions in new situations as all interactions with the environment
        already happened in the past (were recorded prior to training).

    Offline RL
        A sub-field of reinforcement learning (RL), in which specialized offline
        RL Algorithms learn how to compute optimal actions for an agent inside an
        environment without the ability to interact live with that environment.
        Instead, the data used for training has already been collected up-front
        (maybe even by a non-RL/ML system). This is very similar to a supervised
        learning setup. Examples for offline RL algorithms are MARWIL, CQL, and CRR.

    Off-Policy
        A type of RL Algorithm. In an off-policy algorithm, the policy used to compute
        the actions inside an RL environment (to generate the training data) might be
        different from the one that is being optimized. Examples for off-policy
        Algorithms are DQN, SAC, and DDPG.

    On-Policy
        A type of RL Algorithm. In an on-policy algorithm, the policy used to compute
        the actions inside an RL environment (to generate the training data) must be the
        exact same (matching NN weights at all times) as the one that's being
        optimized. Examples for on-policy Algorithms are PPO, APPO, and IMPALA.

    OOM (Out of Memory)
        Ray may run out of memory if the application is using too much memory on a
        single node. In this case the :ref:`Ray OOM killer<oom-questions>` will kick
        in and kill worker processes to free up memory.

    Placement group
        Placement groups allow users to atomically reserve groups of resources across
        multiple nodes (i.e., gang scheduling). They can be then used to schedule Ray
        tasks and actors packed as close as possible for locality (PACK), or spread
        apart (SPREAD). Placement groups are generally used for gang-scheduling actors,
        but also support tasks.
        :ref:`Learn more<ray-placement-group-doc-ref>`.

    Policy
        A (neural network) model that maps an RL environment observation of some agent
        to its next action inside an RL environment.

    .. TODO: Policy evaluation

    Predictor
        :class:`An interface for performing inference<ray.train.predictor.Predictor>` (prediction)
        on input data with a trained model.

    Preprocessor
        :ref:`An interface used to preprocess a Dataset<preprocessor-ref>` for
        training and inference (prediction). Preprocessors
        can be stateful, as they can be fitted on the training dataset before being
        used to transform the training and evaluation datasets.

    .. TODO: Process

    Ray application
        A collection of Ray tasks, actors, and objects that originate from the
        same script.

    .. TODO: Ray Timeline

    Raylet
        A system process that runs on each Ray node. It’s responsible for scheduling
        and object management.

    Replica
        A replica is a single actor that handles requests to a given Serve deployment.
        A deployment may consist of many replicas, either statically-configured via
        ``num_replicas`` or dynamically configured using auto-scaling.

    Resource (logical and physical)
        Ray resources are logical resources (e.g. CPU, GPU) used by tasks and actors.
        It doesn't necessarily map 1-to-1 to physical resources of machines on which
        Ray cluster runs. :ref:`Learn more<core-resources>`.

    Reward
        A single floating point value that each agent within an RL environment receives
        after each action taken. An agent is defined to be acting optimally inside the
        RL environment when the sum over all received rewards within an episode is
        maximized.

        Note that rewards might be delayed (not immediately telling the agent, whether
        an action was good or bad) or sparse (often have a value of zero) making it
        harder for the agent to learn.

    Rollout
        The process of advancing through an episode in an RL environment (with one or
        more RL agents) by taking sequential actions. During rollouts, the algorithm
        should collect the environment produced 4-tuples [observations, actions,
        rewards, next observations] in order to (later or simultaneously) learn how to
        behave more optimally from this data.

    Rollout Worker
        Component within a RLlib Algorithm responsible for advancing and collecting
        observations and rewards in an RL environment. Actions for the different
        agent(s) within the environment are computed by the Algorithms’ policy models.
        A distributed algorithm might have several replicas of Rollout Workers running
        as Ray actors in order to scale the data collection process for faster RL
        training.

        .. START ROLLOUT WORKER

        RolloutWorkers are used as ``@ray.remote`` actors to collect and return samples
        from environments or offline files in parallel. An RLlib
        :py:class:`~ray.rllib.algorithms.algorithm.Algorithm` usually has
        ``num_workers`` :py:class:`~ray.rllib.env.env_runner.EnvRunner` instances plus a
        single "local" :py:class:`~ray.rllib.env.env_runner.EnvRunner` (not ``@ray.remote``) in
        its :py:class:`~ray.rllib.env.env_runner_group.EnvRunnerGroup` under ``self.workers``.

        Depending on its evaluation config settings, an additional
        :py:class:`~ray.rllib.env.env_runner_group.EnvRunnerGroup` with
        :py:class:`~ray.rllib.env.env_runner.EnvRunner` instances for evaluation may be present in the
        :py:class:`~ray.rllib.algorithms.algorithm.Algorithm`
        under ``self.evaluation_workers``.

        .. END ROLLOUT WORKER

    .. TODO: Runtime

    Runtime environment
        A runtime environment defines dependencies such as files, packages, environment
        variables needed for a Python script to run. It is installed dynamically on the
        cluster at runtime, and can be specified for a Ray job, or for specific actors
        and tasks. :ref:`Learn more<handling_dependencies>`.

    Remote Function
        See :term:`Task`.

    Remote Class
        See :term:`Actor`.

    (Ray) Scheduler
        A Ray component that assigns execution units (Task/Actor) to Ray nodes.

    Search Space
        The definition of the possible values for hyperparameters. Can be composed out
        of constants, discrete values, distributions of functions. This is also
        referred to as the “parameter space” (``param_space`` in the ``Tuner``).

    Search algorithm
        Search algorithms suggest new hyperparameter configurations to be evaluated
        by Tune. The default search algorithm is random search, where each new
        configuration is independent from the previous one. More sophisticated search
        algorithms such as ones using Bayesian optimization will fit a model to predict
        the hyperparameter configuration that will produce the best model, while also
        exploring the space of possible hyperparameters. Many popular search algorithms
        are built into Tune, most of which are integrations with other libraries.

    Serve application
        An application is the unit of upgrade in a Serve cluster.

        An application consists of one or more deployments. One of these deployments
        is considered the “ingress” deployment, which is where all inbound
        traffic is handled.

        Applications can be called via HTTP at their configured ``route_prefix``.

    DeploymentHandle
        DeploymentHandle is the Python API for making requests to Serve deployments. A
        handle is defined by passing one bound Serve deployment to the constructor of
        another. Then at runtime that reference can be used to make requests. This is
        used to combine multiple deployments for model composition.

    Session
        - A Ray Train/Tune session: Tune session at the experiment execution layer
          and Train session at the Data Parallel training layer
          if running data-parallel distributed training with Ray Train.

          The session allows access to metadata, such as which trial is being run,
          information about the total number of workers, as well as the rank of the
          current worker. The session is also the interface through which an individual
          Trainable can interact with the Tune experiment as a whole. This includes uses
          such as reporting an individual trial’s metrics, saving/loading checkpoints,
          and retrieving the corresponding dataset shards for each Train worker.

        - A Ray cluster: in some cases the session also means a :term:`Ray Cluster`.
          For example, logs of a Ray cluster are stored under ``session_xxx/logs/``.

    Spillback
        A task caller schedules a task by first sending a resource request to the
        preferred raylet for that request. If the preferred raylet chooses not to grant
        the resources locally, it may also “Spillback” and respond to the caller with
        the address of a remote raylet at which the caller should retry the resource
        request.

    State
        State of the environment an RL agent interacts with.

    Synchronous execution
        Two tasks A and B are executed synchronously if A must finish before B can
        start. For example, if you call ``ray.get`` immediately after launching a remote
        task with ``task.remote()``, you’ll be running with synchronous execution,
        since this will wait for the task to finish before the program continues.

    Synchronous sampling
        Sampling workers work in synchronous steps. All of them must finish collecting
        a new batch of samples before training can proceed to the next iteration.

    Task
        A remote function invocation. This is a single function invocation that
        executes on a process different from the caller, and potentially on a different
        machine. A task can be stateless (a ``@ray.remote`` function) or stateful (a
        method of a ``@ray.remote`` class - see Actor below). A task is executed
        asynchronously with the caller: the ``.remote()`` call immediately returns
        one or more ``ObjectRefs`` (futures) that can be used to retrieve the
        return value(s). See :term:`Actor task`.

    Trainable
        A :ref:`Trainable<trainable-docs>` is the interface that Ray Tune uses to
        perform custom training
        logic. User-defined Trainables take in a configuration as an input and can
        run user-defined training code as well as custom metric reporting and
        checkpointing.

        There are many types of trainables. Most commonly used is the function
        trainable API, which is simply a Python function that contains model training
        logic and metric reporting. Tune also exposes a class trainable API, which
        allows you to implement training, checkpointing, and restoring as different
        methods.

        Ray Tune associates each trial with its own Trainable – the Trainable is the
        one actually doing training. The Trainable is a remote actor that can be placed
        on any node in a Ray cluster.

    Trainer
        A Trainer is the top-level API to configure a single distributed training job.
        :ref:`There are built-in Trainers for different frameworks<train-api>`,
        like PyTorch, Tensorflow, and XGBoost. Each trainer shares a common interface
        and otherwise defines framework-specific configurations and entrypoints. The
        main job of a trainer is to coordinate N distributed training workers and set
        up the communication backends necessary for these workers to communicate
        (e.g., for sharing computed gradients).

    Trainer configuration
        A Trainer can be configured in various ways. Some
        configurations are shared across all trainers, like the RunConfig, which
        configures things like the experiment storage, and ScalingConfig, which
        configures the number of training workers as well as resources needed per
        worker. Other configurations are specific to the trainer framework.

    Training iteration
        A partial training pass of input data up to pre-defined yield point
        (e.g., time or data consumed) for checkpointing of long running training jobs.
        A full training epoch can consist of multiple training iterations.
        .. TODO: RLlib

    Training epoch
        A full training pass of the input dataset. Typically, model training iterates
        through the full dataset in batches of size B, where gradients are calculated
        on each batch and then applied as an update to the model weights. Training
        jobs can consist of multiple epochs by training through the same dataset
        multiple times.

    Training step
        An RLlib-specific method of the Algorithm class which includes the core logic
        of an RL algorithm. Commonly includes gathering of experiences (either through
        sampling or from offline data), optimization steps, redistribution of learnt
        model weights. The particularities of this method are specific to algorithms
        and configurations.

    Transition
        A tuple of (observation, action, reward, next observation). A transition
        represents one step of an agent in an environment.

    Trial
        One training run within a Ray Tune experiment. If you run multiple trials,
        each trial usually corresponds to a different config (a set of hyperparameters).

    Trial scheduler
        When running a Ray Tune job, the scheduler will decide how to allocate
        resources to trials. In the most common case, this resource is time - the trial
        scheduler decides which trials to run at what time. Certain built-in schedulers
        like Asynchronous Hyperband (ASHA) perform early stopping of under-performing
        trials, while others like Population Based Training (PBT) will make
        under-performing trials copy the hyperparameter config and model weights of
        top performing trials and continue training.

    Tuner
        The Tuner is the top level Ray Tune API used to configure and run an
        experiment with many trials.

    .. TODO: Tunable

    .. TODO: (Ray) Workflow

    .. TODO: WorkerGroup

    .. TODO: Worker heap

    .. TODO: Worker node / worker node pod

    Worker process / worker
        The process that runs user defined tasks and actors.