Tune Search Algorithms (tune.search)#

Tune's Search Algorithms are wrappers around open-source optimization libraries for efficient hyperparameter selection. Each library has its own way of defining the search space; refer to its documentation for details. In most cases, Tune automatically converts search spaces passed to Tuner to the library's format.

You can utilize these search algorithms as follows:

from ray import tune
from ray.tune.search.optuna import OptunaSearch

def train_fn(config):
    # This objective function is just for demonstration purposes
    tune.report({"loss": config["param"]})

tuner = tune.Tuner(
    train_fn,
    tune_config=tune.TuneConfig(
        search_alg=OptunaSearch(),
        num_samples=100,
        metric="loss",
        mode="min",
    ),
    param_space={"param": tune.uniform(0, 1)},
)
results = tuner.fit()

Saving and Restoring Tune Search Algorithms#

Certain search algorithms implement save/restore, allowing you to reuse a searcher that has been fitted on the results of previous tuning runs.

from ray.tune.search.hyperopt import HyperOptSearch

search_alg = HyperOptSearch()

tuner_1 = tune.Tuner(
    train_fn,
    tune_config=tune.TuneConfig(search_alg=search_alg)
)
results_1 = tuner_1.fit()

search_alg.save("./my-checkpoint.pkl")

# Restore the saved state onto another search algorithm,
# in a new tuning script

search_alg2 = HyperOptSearch()
search_alg2.restore("./my-checkpoint.pkl")

tuner_2 = tune.Tuner(
    train_fn,
    tune_config=tune.TuneConfig(search_alg=search_alg2)
)
results_2 = tuner_2.fit()

Tune automatically saves searcher state inside the current experiment folder during tuning. See Result logdir: ... in the output logs for this location.

Note that if you have two Tune runs with the same experiment folder, the previous state checkpoint will be overwritten. You can avoid this by making sure RunConfig(name=...) is set to a unique identifier:

import os

search_alg = HyperOptSearch()
tuner_1 = tune.Tuner(
    train_fn,
    tune_config=tune.TuneConfig(
        num_samples=5,
        search_alg=search_alg,
    ),
    run_config=tune.RunConfig(
        name="my-experiment-1",
        storage_path="~/my_results",
    )
)
results = tuner_1.fit()

search_alg2 = HyperOptSearch()
search_alg2.restore_from_dir(
    os.path.join("~/my_results", "my-experiment-1")
)

Random search and grid search (tune.search.basic_variant.BasicVariantGenerator)#

The default and most basic way to do a hyperparameter search is via random and grid search. Ray Tune does this through the BasicVariantGenerator class, which generates trial variants from a search space definition.

BasicVariantGenerator is used by default if no search algorithm is passed to Tuner.
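
For example, the following minimal sketch (using a toy train_fn defined only for illustration) relies on the default BasicVariantGenerator simply by not passing a search_alg; grid_search parameters are swept exhaustively, while the remaining parameters are sampled randomly:

from ray import tune

def train_fn(config):
    # Toy objective for demonstration purposes.
    tune.report({"loss": config["a"] * config["b"]})

tuner = tune.Tuner(
    train_fn,
    tune_config=tune.TuneConfig(
        num_samples=4,  # Each grid point is repeated with 4 random samples of "b".
        metric="loss",
        mode="min",
    ),
    param_space={
        "a": tune.grid_search([1, 2, 3]),  # Grid search over discrete values.
        "b": tune.uniform(0, 1),           # Random search over a continuous range.
    },
)
results = tuner.fit()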

basic_variant.BasicVariantGenerator: Uses Tune's variant generation for resolving variables.

Ax (tune.search.ax.AxSearch)#

ax.AxSearch: Uses Ax to optimize hyperparameters.

Bayesian Optimization (tune.search.bayesopt.BayesOptSearch)#

bayesopt.BayesOptSearch: Uses bayesian-optimization/BayesianOptimization to optimize hyperparameters.

BOHB (tune.search.bohb.TuneBOHB)#

BOHB (Bayesian Optimization HyperBand) is an algorithm that terminates bad trials early and uses Bayesian Optimization to improve the hyperparameter search. It is available from the HpBandSter library.

Importantly, BOHB is intended to be paired with a specific scheduler class: HyperBandForBOHB.

In order to use this search algorithm, you will need to install HpBandSter and ConfigSpace:

$ pip install hpbandster ConfigSpace

See the BOHB paper for more details.
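
As a minimal sketch (reusing the train_fn from the first example above), TuneBOHB can be paired with the HyperBandForBOHB scheduler like this:

from ray import tune
from ray.tune.schedulers import HyperBandForBOHB
from ray.tune.search.bohb import TuneBOHB

algo = TuneBOHB()
scheduler = HyperBandForBOHB(
    time_attr="training_iteration",
    max_t=100,  # Maximum iterations per trial before it is stopped.
)
tuner = tune.Tuner(
    train_fn,
    tune_config=tune.TuneConfig(
        search_alg=algo,
        scheduler=scheduler,
        metric="loss",
        mode="min",
        num_samples=10,
    ),
    param_space={"param": tune.uniform(0, 1)},
)
results = tuner.fit()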

bohb.TuneBOHB: BOHB suggestion component.

HEBO (tune.search.hebo.HEBOSearch)#

hebo.HEBOSearch: Uses HEBO (Heteroscedastic Evolutionary Bayesian Optimization) to optimize hyperparameters.

HyperOpt (tune.search.hyperopt.HyperOptSearch)#

hyperopt.HyperOptSearch: A wrapper around HyperOpt to provide trial suggestions.

Nevergrad (tune.search.nevergrad.NevergradSearch)#

nevergrad.NevergradSearch: Uses Nevergrad to optimize hyperparameters.

Optuna (tune.search.optuna.OptunaSearch)#

optuna.OptunaSearch: A wrapper around Optuna to provide trial suggestions.

ZOOpt (tune.search.zoopt.ZOOptSearch)#

zoopt.ZOOptSearch: A wrapper around ZOOpt to provide trial suggestions.

Repeated Evaluations (tune.search.Repeater)#

Use ray.tune.search.Repeater to average over multiple evaluations of the same hyperparameter configuration. This is useful when the evaluated training procedure has high variance (e.g., in reinforcement learning).

Repeater takes in a search_alg and a repeat parameter. The search_alg suggests new configurations to try, and the Repeater runs repeat trials of each configuration. It then averages the search_alg.metric from the final results of the repeated trials.

Warning

It is recommended not to use Repeater with a TrialScheduler. Early termination can negatively affect the average reported metric.
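
A minimal sketch (reusing the train_fn from the first example above) that wraps HyperOptSearch so every suggested configuration is evaluated three times:

from ray import tune
from ray.tune.search import Repeater
from ray.tune.search.hyperopt import HyperOptSearch

# Each configuration suggested by HyperOpt is run 3 times and the
# reported metric is averaged over the repeats.
search_alg = Repeater(HyperOptSearch(), repeat=3)

tuner = tune.Tuner(
    train_fn,
    tune_config=tune.TuneConfig(
        search_alg=search_alg,
        metric="loss",
        mode="min",
        num_samples=12,  # Counts individual trials, including repeats.
    ),
    param_space={"param": tune.uniform(0, 1)},
)
results = tuner.fit()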

Repeater: A wrapper algorithm for repeating trials of the same parameters.

ConcurrencyLimiter (tune.search.ConcurrencyLimiter)#

Use ray.tune.search.ConcurrencyLimiter to limit the number of trials that run concurrently when using a search algorithm. This is useful when a given optimization algorithm does not parallelize well (for example, a naive Bayesian Optimization).
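
A minimal sketch (reusing the train_fn from the first example above, and assuming the bayesian-optimization package is installed) that caps BayesOptSearch at four concurrent trials:

from ray import tune
from ray.tune.search import ConcurrencyLimiter
from ray.tune.search.bayesopt import BayesOptSearch

# At most 4 trials are suggested and run at the same time.
search_alg = ConcurrencyLimiter(BayesOptSearch(), max_concurrent=4)

tuner = tune.Tuner(
    train_fn,
    tune_config=tune.TuneConfig(
        search_alg=search_alg,
        metric="loss",
        mode="min",
        num_samples=20,
    ),
    param_space={"param": tune.uniform(0, 1)},
)
results = tuner.fit()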

ConcurrencyLimiter: A wrapper algorithm for limiting the number of concurrent trials.

Custom Search Algorithms (tune.search.Searcher)#

If you are interested in implementing or contributing a new Search Algorithm, provide the following interface:

Searcher: Abstract class for wrapping suggestion algorithms.

Searcher.suggest: Queries the algorithm to retrieve the next set of parameters.

Searcher.save: Saves the state of this search algorithm to a path.

Searcher.restore: Restores the state of this search algorithm from a path.

Searcher.on_trial_result: Optional notification of intermediate results during training.

Searcher.on_trial_complete: Notification for the completion of a trial.

If contributing, make sure to add test cases and an entry in the function described below.
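
As a rough, illustrative sketch of this interface (the class, its random sampling strategy, and its state layout are assumptions for demonstration, not an existing Tune searcher), a custom searcher might look like this:

import pickle
import random

from ray.tune.search import Searcher

class RandomGuessSearcher(Searcher):
    """Toy searcher that samples a single parameter uniformly at random."""

    def __init__(self, metric="loss", mode="min"):
        super().__init__(metric=metric, mode=mode)
        self._live_trials = {}  # trial_id -> suggested config
        self._results = []      # (config, metric value) of completed trials

    def suggest(self, trial_id):
        config = {"param": random.uniform(0, 1)}
        self._live_trials[trial_id] = config
        return config

    def on_trial_complete(self, trial_id, result=None, error=False):
        config = self._live_trials.pop(trial_id, None)
        if result is not None and config is not None:
            self._results.append((config, result.get(self.metric)))

    def save(self, checkpoint_path):
        with open(checkpoint_path, "wb") as f:
            pickle.dump((self._live_trials, self._results), f)

    def restore(self, checkpoint_path):
        with open(checkpoint_path, "rb") as f:
            self._live_trials, self._results = pickle.load(f)

An instance of such a class can then be passed as search_alg to tune.Tuner, optionally wrapped in a ConcurrencyLimiter.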

Shim Instantiation (tune.create_searcher)#

There is also a shim function that constructs the search algorithm based on the provided string. This can be useful if the search algorithm you want to use changes often (e.g., specifying the search algorithm via a CLI option or config file).
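
A minimal sketch (reusing the train_fn from the first example above, and assuming HyperOpt is installed) of instantiating a searcher from a string:

from ray import tune

# The string could come from a CLI flag or a config file entry.
search_alg = tune.create_searcher("hyperopt", metric="loss", mode="min")

tuner = tune.Tuner(
    train_fn,
    tune_config=tune.TuneConfig(search_alg=search_alg, num_samples=10),
    param_space={"param": tune.uniform(0, 1)},
)
results = tuner.fit()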

create_searcher: Instantiate a search algorithm based on the given string.
