ray.data.Dataset.flat_map#

Dataset.flat_map(fn: Union[Callable[[ray.data.block.T], ray.data.block.U], _CallableClassProtocol[T, U]], *, compute: Union[str, ray.data._internal.compute.ComputeStrategy] = None, **ray_remote_args) Dataset[U][source]#

Apply the given function to each record and then flatten results.

This is a blocking operation. Consider using .map_batches() for better performance (the batch size can be altered in map_batches).

Examples

>>> import ray
>>> ds = ray.data.range(1000)
>>> ds.flat_map(lambda x: [x, x ** 2, x ** 3])
FlatMap
+- Dataset(num_blocks=..., num_rows=1000, schema=<class 'int'>)

Time complexity: O(dataset size / parallelism)

Parameters
  • fn – The function to apply to each record, or a class type that can be instantiated to create such a callable. Callable classes are only supported for the actor compute strategy.

  • compute – The compute strategy, either “tasks” (default) to use Ray tasks, or “actors” to use an autoscaling actor pool. If wanting to configure the min or max size of the autoscaling actor pool, you can provide an ActorPoolStrategy(min, max) instance. If using callable classes for fn, the actor compute strategy must be used.

  • ray_remote_args – Additional resource requirements to request from ray (e.g., num_gpus=1 to request GPUs for the map tasks).

See also

map_batches()

Call this method to transform batches of data. It’s faster and more flexible than map() and flat_map().

map()

Call this method to transform one record at time.

This method isn’t recommended because it’s slow; call map_batches() instead.