
User Guides

If you’re new to Ray Datasets, we recommend starting with the Ray Datasets Quick Start. This user guide will help you navigate the Ray Datasets project and show you how to achieve several common tasks. In particular, you will learn:

  • how to load data and preprocess it for machine learning applications,

  • how to use Tensors with Ray Datasets,

  • how to run Dataset Pipelines in common scenarios,

  • and how to tune your Ray Datasets applications for performance (a short end-to-end sketch follows this list).
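To make these tasks concrete, here is a minimal sketch of a Ray Datasets workflow. The dataset contents, batch size, and window size below are illustrative assumptions, not values taken from this page:

    import ray

    ray.init()  # start a local Ray instance; omit if already connected to a cluster

    # Create a small synthetic dataset. In practice you would typically load data,
    # e.g. with ray.data.read_parquet(...) or ray.data.read_csv(...) on your own files.
    ds = ray.data.range(1000)

    # Transform records in parallel across the cluster (here: square each value).
    squared = ds.map(lambda x: x * x)

    # Consume the results in batches, e.g. to feed a training loop.
    for batch in squared.iter_batches(batch_size=128):
        pass  # process the batch here

    # Convert to a DatasetPipeline to overlap execution across windows
    # (covered in "Pipelining Compute" and "Advanced Pipeline Usage").
    pipe = squared.window(blocks_per_window=10)

Each of the guides below expands on one of these steps, from creating and saving datasets through pipelining and performance tuning.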

  • Creating Datasets
  • Saving Datasets
  • Transforming Datasets
  • Exchanging Datasets
  • Pipelining Compute
    • Creating a DatasetPipeline
    • Per-Window Transformations
  • ML Preprocessing
    • Last-mile Preprocessing
  • Working with Tensors
    • Tables with tensor columns
    • Single-column tensor datasets
    • Reading existing serialized tensor columns
    • Working with tensor column datasets
    • Writing and reading tensor columns
    • End-to-end workflow with our Pandas extension type
    • Limitations
  • Advanced Pipeline Usage
    • Handling Epochs
    • Example: Pipelined Batch Inference
    • Enabling Pipelining
    • Tuning Parallelism
    • Per-Epoch Shuffle Pipeline
    • Pre-repeat vs post-repeat transforms
    • Splitting pipelines for distributed ingest
    • Distributed Ingest with Ray Train
    • Changing Pipeline Structure
  • Random Data Access (Experimental)
    • Architecture
    • Performance
    • Fault Tolerance
  • Using Custom Datasources
  • Performance Tips and Tuning
    • Debugging Statistics
    • Batching Transforms
    • Parquet Column Pruning
    • Tuning Read Parallelism
    • Tuning Max Block Size

