ray.data.Dataset.filter
- Dataset.filter(fn: Callable[[Dict[str, Any]], bool] | Callable[[Dict[str, Any]], Iterator[bool]] | _CallableClassProtocol | None = None, expr: str | None = None, *, compute: str | ComputeStrategy = None, concurrency: int | Tuple[int, int] | None = None, ray_remote_args_fn: Callable[[], Dict[str, Any]] | None = None, **ray_remote_args) → Dataset
Filter out rows that don’t satisfy the given predicate.
You can use a function, a callable class, or an expression string to perform the filtering. For functions, Ray Data uses stateless Ray tasks. For classes, Ray Data uses stateful Ray actors. For more information, see Stateful Transforms.
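The difference between a function predicate and a callable-class predicate can be sketched as follows. This is a minimal illustration, not taken from the Ray documentation: `is_even`, `InAllowList`, and `run_function_and_class_filters` are hypothetical names, and the allow-list stands in for any setup that is expensive enough to do once per actor. The class has a no-argument constructor because the signature above shows no parameter for passing constructor arguments, and `concurrency` is set for the class on the assumption that actor-based filters need a pool size.

```python
from typing import Any, Dict


def is_even(row: Dict[str, Any]) -> bool:
    """Stateless predicate: Ray Data runs it in Ray tasks."""
    return row["id"] % 2 == 0


class InAllowList:
    """Stateful predicate: Ray Data instantiates it on Ray actors."""

    def __init__(self) -> None:
        # Stand-in for expensive one-time setup (hypothetical allow-list).
        self.allowed = {1, 2, 3, 5, 8, 13}

    def __call__(self, row: Dict[str, Any]) -> bool:
        return row["id"] in self.allowed


def run_function_and_class_filters():
    """Apply both predicates with Ray Data (requires Ray to be installed)."""
    import ray

    ds = ray.data.range(20)
    evens = ds.filter(is_even).take_all()
    # Pass the class itself, not an instance; Ray Data creates the instances.
    allowed = ds.filter(InAllowList, concurrency=2).take_all()
    return evens, allowed
```

Calling `run_function_and_class_filters()` in an environment with Ray installed returns the two filtered row lists. Both predicates are ordinary callables, so they can also be unit-tested on plain dictionaries without starting Ray.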
Tip
If you use the expr parameter with a Python expression string, Ray Data optimizes your filter with native Arrow interfaces.

Examples
>>> import ray
>>> ds = ray.data.range(100)
>>> ds.filter(expr="id <= 4").take_all()
[{'id': 0}, {'id': 1}, {'id': 2}, {'id': 3}, {'id': 4}]
Time complexity: O(dataset size / parallelism)
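For comparison, the same predicate can be written either as an expression string or as a row function. The sketch below is illustrative only: `id_at_most_four` and `run_both_forms` are hypothetical names. The plain-Python check at module level shows which rows the function form keeps on rows shaped like those of `ray.data.range(100)`; the Ray calls are wrapped in a function so that the predicate can be inspected without a running cluster.

```python
from typing import Any, Dict, List


def id_at_most_four(row: Dict[str, Any]) -> bool:
    """Row-function equivalent of the expression string "id <= 4"."""
    return row["id"] <= 4


# Plain-Python check on rows shaped like those of ray.data.range(100).
rows: List[Dict[str, Any]] = [{"id": i} for i in range(100)]
kept = [row for row in rows if id_at_most_four(row)]


def run_both_forms():
    """Filter with both forms using Ray Data (requires Ray to be installed)."""
    import ray

    ds = ray.data.range(100)
    by_expr = ds.filter(expr="id <= 4").take_all()
    by_fn = ds.filter(id_at_most_four).take_all()
    return by_expr, by_fn
```

Both forms select the same rows; per the tip above, the expression-string form is the one that Ray Data optimizes with native Arrow interfaces.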
- Parameters:
fn – The predicate to apply to each row, or a class type that can be instantiated to create such a callable.
expr – An expression string. It must be a valid Python expression, which Ray Data converts to a pyarrow.dataset.Expression.
compute – This argument is deprecated. Use the concurrency argument instead.
concurrency – The number of Ray workers to use concurrently. For a fixed-size worker pool of size n, specify concurrency=n. For an autoscaling worker pool from m to n workers, specify concurrency=(m, n).
ray_remote_args_fn – A function that returns a dictionary of remote args passed to each map worker. Use this argument to generate dynamic arguments for each actor or task; Ray Data calls the function each time before it initializes a worker. Args in the returned dictionary always override the args in ray_remote_args. Note: this is an advanced, experimental feature.
ray_remote_args – Additional resource requirements to request from Ray (e.g., num_gpus=1 to request GPUs for the map tasks). See ray.remote() for details.
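The worker-related parameters above might be combined as in the following sketch. It is a rough illustration under stated assumptions rather than a recommended configuration: `KeepMultiplesOfTen`, `dynamic_remote_args`, and `run_with_worker_options` are hypothetical names, and the resource values are arbitrary. The autoscaling tuple form is used only with a callable class here, on the assumption that worker pools apply to actor-based filters.

```python
from typing import Any, Dict


class KeepMultiplesOfTen:
    """Callable class with a no-argument constructor, run on Ray actors."""

    def __call__(self, row: Dict[str, Any]) -> bool:
        return row["id"] % 10 == 0


def dynamic_remote_args() -> Dict[str, Any]:
    """Hypothetical ray_remote_args_fn: called before each worker starts.

    Values returned here override those passed through **ray_remote_args.
    """
    return {"num_cpus": 1}


def run_with_worker_options():
    """Apply the filter with different worker settings (requires Ray)."""
    import ray

    ds = ray.data.range(100)
    # Fixed-size pool of 2 workers.
    fixed = ds.filter(KeepMultiplesOfTen, concurrency=2)
    # Autoscaling pool from 1 to 4 workers.
    autoscaling = ds.filter(KeepMultiplesOfTen, concurrency=(1, 4))
    # num_cpus=0.5 goes through **ray_remote_args; dynamic_remote_args
    # then overrides it with num_cpus=1 for each worker.
    tuned = ds.filter(
        KeepMultiplesOfTen,
        concurrency=2,
        num_cpus=0.5,
        ray_remote_args_fn=dynamic_remote_args,
    )
    return fixed.take_all(), autoscaling.take_all(), tuned.take_all()
```

In the last call, the dictionary returned by `dynamic_remote_args` takes precedence over `num_cpus=0.5`, which matches the override rule described for `ray_remote_args_fn`.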