ray.data.Dataset.flat_map
ray.data.Dataset.flat_map#
- Dataset.flat_map(fn: Union[Callable[[ray.data.block.T], ray.data.block.U], _CallableClassProtocol[T, U]], *, compute: Union[str, ray.data._internal.compute.ComputeStrategy] = None, **ray_remote_args) Dataset[U] [source]#
Apply the given function to each record and then flatten results.
This is a blocking operation. Consider using
.map_batches()
for better performance (the batch size can be altered in map_batches).Examples
>>> import ray >>> ds = ray.data.range(1000) >>> ds.flat_map(lambda x: [x, x ** 2, x ** 3]) FlatMap +- Dataset(num_blocks=..., num_rows=1000, schema=<class 'int'>)
Time complexity: O(dataset size / parallelism)
- Parameters
fn – The function to apply to each record, or a class type that can be instantiated to create such a callable. Callable classes are only supported for the actor compute strategy.
compute – The compute strategy, either “tasks” (default) to use Ray tasks, or “actors” to use an autoscaling actor pool. If wanting to configure the min or max size of the autoscaling actor pool, you can provide an
ActorPoolStrategy(min, max)
instance. If using callable classes for fn, the actor compute strategy must be used.ray_remote_args – Additional resource requirements to request from ray (e.g., num_gpus=1 to request GPUs for the map tasks).
See also
map_batches()
Call this method to transform batches of data. It’s faster and more flexible than
map()
andflat_map()
.map()
Call this method to transform one record at time.
This method isn’t recommended because it’s slow; call
map_batches()
instead.