Ray with Cluster Managers

Note

If you’re using AWS, Azure, or GCP, you can use the Ray Cluster Launcher to simplify the cluster setup process.
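
For example, a minimal launcher config and its lifecycle commands might look like the following sketch (the cluster name, worker cap, and region are placeholder values to adjust for your account):

    # cluster.yaml -- a minimal Ray Cluster Launcher config for AWS.
    cluster_name: minimal   # placeholder name
    max_workers: 2          # placeholder cap on worker nodes
    provider:
        type: aws
        region: us-west-2   # placeholder region

    # Create or update the cluster, then tear it down when finished.
    ray up cluster.yaml
    ray down cluster.yaml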

  • Deploying on Kubernetes
    • Overview
    • Quick Guide
    • The Ray Kubernetes Operator
    • Installing the Ray Operator with Helm
    • Observability
    • Running Ray programs with Ray Jobs Submission (a CLI sketch follows this list)
    • Running Ray programs with Ray Client
    • Cleanup
    • Next steps
    • Questions or Issues?
  • Deploying with KubeRay
    • Using the autoscaler
    • Uninstalling the KubeRay operator
    • Further details on Ray autoscaler support
    • Developing the KubeRay integration (advanced)
  • Deploying on YARN
    • Skein Configuration
    • Packaging Dependencies
    • Ray Setup in YARN
    • Running a Job
    • Cleaning Up
    • Questions or Issues?
  • Deploying on Slurm
    • Walkthrough using Ray with SLURM (a batch-script sketch follows this list)
    • Python-interface SLURM scripts
    • Examples and templates
  • Deploying on LSF
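
As a taste of the Kubernetes guide’s Ray Jobs Submission step, submitting a local script to a cluster running on Kubernetes might look like this sketch (the head-node service name and script name are hypothetical, and the Ray Dashboard is assumed to listen on its default port 8265):

    # Forward the head node's Dashboard port to localhost
    # (service name is hypothetical; use your cluster's head service).
    kubectl port-forward service/example-cluster-ray-head 8265:8265

    # Submit my_script.py (hypothetical) as a Ray Job over the forwarded port.
    ray job submit --address http://localhost:8265 --working-dir . -- python my_script.py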

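Similarly, the SLURM walkthrough boils down to starting one Ray head node and one Ray worker per remaining allocated node inside a batch script. A minimal sketch, assuming a 3-node allocation (the node count, port, and my_ray_script.py are placeholders):

    #!/bin/bash
    #SBATCH --nodes=3
    #SBATCH --ntasks-per-node=1

    # Resolve the allocated hostnames; the first becomes the Ray head.
    nodes=($(scontrol show hostnames "$SLURM_JOB_NODELIST"))
    head_node=${nodes[0]}
    head_ip=$(srun --nodes=1 --ntasks=1 -w "$head_node" hostname --ip-address)

    # Start the head node, then one worker per remaining node.
    srun --nodes=1 --ntasks=1 -w "$head_node" \
        ray start --head --node-ip-address="$head_ip" --port=6379 --block &
    sleep 10
    for ((i = 1; i < ${#nodes[@]}; i++)); do
        srun --nodes=1 --ntasks=1 -w "${nodes[$i]}" \
            ray start --address="$head_ip:6379" --block &
    done

    # The driver script connects to the head node started above.
    python my_ray_script.py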