
Advanced Topics

This section covers extended topics on how to use Ray.

  • Tips for first-time users
    • Tip 1: Delay ray.get() (sketched after this list)
    • Tip 2: Avoid tiny tasks
    • Tip 3: Avoid passing the same object repeatedly to remote tasks
    • Tip 4: Pipeline data processing
  • Ray Gotchas
    • Environment variables are not passed from the driver to workers
    • Filenames work sometimes and not at other times
    • Placement groups are not composable
  • Starting Ray
    • What is the Ray runtime?
    • Starting Ray on a single machine
    • Starting Ray via the CLI (ray start)
    • Launching a Ray cluster (ray up)
    • What’s next?
  • Debugging and Profiling
    • Observing Ray Work
    • Visualizing Tasks in the Ray Timeline
    • Profiling Using Python’s cProfile
      • Profiling Ray Actors with cProfile
    • Understanding ObjectLostErrors
    • Crashes
    • No Speedup
    • Outdated Function Definitions
  • Using Namespaces
    • Specifying a namespace for named actors (sketched after this list)
    • Anonymous namespaces
    • Getting the current namespace
  • Cross-Language Programming
    • Set up the driver
    • Python calling Java
    • Java calling Python
    • Cross-language data serialization
    • Cross-language exception stacks
  • Working with Jupyter Notebooks & JupyterLab
    • Setting Up Notebook
  • Building Computation Graphs with Ray DAG API
    • Ray DAG with functions (sketched after this list)
    • Ray DAG with classes and class methods
    • Ray DAG with custom InputNode
    • More Resources
  • Miscellaneous Topics
    • Dynamic Remote Parameters
    • Accelerator Types
    • Overloaded Functions
    • Inspecting Cluster State
      • Node Information
      • Resource Information
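
A quick taste of the first page above: Tip 1 says to submit every remote task before calling ray.get(), so the tasks run in parallel instead of one at a time. The sketch below is illustrative; slow_square and its one-second sleep are stand-ins for real work.

    import time
    import ray

    ray.init()

    @ray.remote
    def slow_square(x):
        time.sleep(1)  # illustrative stand-in for real work
        return x * x

    # Anti-pattern: ray.get() inside the loop blocks on each task in
    # turn, so four tasks take roughly four seconds in total.
    # results = [ray.get(slow_square.remote(i)) for i in range(4)]

    # Better: submit every task first, then block once on all of them;
    # with four free CPUs this takes roughly one second.
    refs = [slow_square.remote(i) for i in range(4)]
    print(ray.get(refs))  # [0, 1, 4, 9]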
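
The namespaces page pairs naturally with named actors. Here is a minimal sketch, assuming an example namespace "my_app" and an example actor name "global_counter": any driver or worker that connects with the same namespace can look the actor up by name.

    import ray

    ray.init(namespace="my_app")  # "my_app" is an example namespace

    @ray.remote
    class Counter:
        def __init__(self):
            self.value = 0

        def increment(self):
            self.value += 1
            return self.value

    # A named, detached actor survives after this driver exits.
    Counter.options(name="global_counter", lifetime="detached").remote()

    # Any process in the "my_app" namespace can retrieve it by name.
    counter = ray.get_actor("global_counter")
    print(ray.get(counter.increment.remote()))  # 1

Because the actor is detached, a later driver in the same namespace can repeat the ray.get_actor() lookup and keep incrementing the same counter.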
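
Finally, a minimal sketch of the Ray DAG API with functions: .bind() records a call instead of executing it, and nothing runs until .execute() submits the whole graph. The task bodies here are placeholders.

    import ray

    @ray.remote
    def add(a, b):
        return a + b

    @ray.remote
    def double(x):
        return 2 * x

    # .bind() builds the graph lazily instead of launching tasks.
    dag = double.bind(add.bind(1, 2))

    # .execute() submits the whole DAG and returns an ObjectRef.
    print(ray.get(dag.execute()))  # 6

The same .bind() pattern extends to actor classes and methods, and to parameterized graphs via a custom InputNode, both covered on the pages above.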
