ray.data.Dataset.map
Dataset.map(fn: Callable[[Dict[str, Any]], Dict[str, Any]] | Callable[[Dict[str, Any]], Iterator[Dict[str, Any]]] | _CallableClassProtocol, *, compute: ComputeStrategy | None = None, fn_args: Iterable[Any] | None = None, fn_kwargs: Dict[str, Any] | None = None, fn_constructor_args: Iterable[Any] | None = None, fn_constructor_kwargs: Dict[str, Any] | None = None, num_cpus: float | None = None, num_gpus: float | None = None, **ray_remote_args) -> Dataset
Apply the given function to each row of this dataset.
Use this method to transform your data. To learn more, see Transforming rows.
Tip
If your transformation is vectorized like most NumPy or pandas operations, map_batches() might be faster.
Examples
import os
from typing import Any, Dict
import ray

def parse_filename(row: Dict[str, Any]) -> Dict[str, Any]:
    row["filename"] = os.path.basename(row["path"])
    return row

ds = (
    ray.data.read_images("s3://anonymous@ray-example-data/image-datasets/simple", include_paths=True)
    .map(parse_filename)
)
print(ds.schema())
Column    Type
------    ----
image     numpy.ndarray(shape=(32, 32, 3), dtype=uint8)
path      string
filename  string
Time complexity: O(dataset size / parallelism)
- Parameters:
fn – The function to apply to each row, or a class type that can be instantiated to create such a callable. Callable classes are only supported for the actor compute strategy.
compute – The compute strategy, either None (default) to use Ray tasks, ray.data.ActorPoolStrategy(size=n) to use a fixed-size actor pool, or ray.data.ActorPoolStrategy(min_size=m, max_size=n) for an autoscaling actor pool.
fn_args – Positional arguments to pass to fn after the first argument. These arguments are top-level arguments to the underlying Ray task.
fn_kwargs – Keyword arguments to pass to fn. These arguments are top-level arguments to the underlying Ray task.
fn_constructor_args – Positional arguments to pass to fn’s constructor. You can only provide this if fn is a callable class. These arguments are top-level arguments in the underlying Ray actor construction task.
fn_constructor_kwargs – Keyword arguments to pass to fn’s constructor. This can only be provided if fn is a callable class. These arguments are top-level arguments in the underlying Ray actor construction task.
num_cpus – The number of CPUs to reserve for each parallel map worker.
num_gpus – The number of GPUs to reserve for each parallel map worker. For example, specify num_gpus=1 to request 1 GPU for each parallel map worker.
ray_remote_args – Additional resource requirements to request from Ray for each map worker.
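The parameters above can be illustrated with a short sketch. The names add_offset, Scaler, and run_with_ray, along with the toy rows, are hypothetical and not part of Ray. The Ray calls are kept inside run_with_ray() so the row functions can be tried on their own; the sketch assumes a local Ray installation that accepts compute=ray.data.ActorPoolStrategy(...) as documented here.

```python
from typing import Any, Dict


def add_offset(row: Dict[str, Any], offset: int, *, key: str = "x") -> Dict[str, Any]:
    """Row function: `offset` arrives via fn_args, `key` via fn_kwargs."""
    return {**row, key: row[key] + offset}


class Scaler:
    """Callable class: constructed on each actor, then called once per row."""

    def __init__(self, factor: int):
        # Receives the values given in fn_constructor_args / fn_constructor_kwargs.
        self.factor = factor

    def __call__(self, row: Dict[str, Any]) -> Dict[str, Any]:
        return {**row, "x": row["x"] * self.factor}


def run_with_ray():
    """Hypothetical driver (assumption: Ray is installed and can start locally)."""
    import ray

    ds = ray.data.from_items([{"x": 1}, {"x": 2}, {"x": 3}])

    # Plain function on Ray tasks (compute=None is the default).
    shifted = ds.map(add_offset, fn_args=(10,), fn_kwargs={"key": "x"})

    # Callable class on a fixed-size pool of two actors.
    scaled = shifted.map(
        Scaler,
        fn_constructor_kwargs={"factor": 2},
        compute=ray.data.ActorPoolStrategy(size=2),
    )
    return scaled.take_all()
```

Under these assumptions, run_with_ray() should return three rows whose x values have been shifted by 10 and then doubled. The order of the returned rows is not something this sketch relies on.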
See also
flat_map()
Call this method to create new rows from existing ones. Unlike map(), a function passed to flat_map() can return multiple rows.
map_batches()
Call this method to transform batches of data.
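To make the difference between the three methods concrete, here is a minimal sketch. The functions double, repeat, double_batch, and compare_with_ray are hypothetical names chosen for illustration; the Ray calls are wrapped in compare_with_ray() and assume a local Ray installation, with batch_format="numpy" passed explicitly so that each batch column is a NumPy array.

```python
from typing import Any, Dict, List


def double(row: Dict[str, Any]) -> Dict[str, Any]:
    """For map(): one output row per input row."""
    return {"x": row["x"] * 2}


def repeat(row: Dict[str, Any]) -> List[Dict[str, Any]]:
    """For flat_map(): zero or more output rows per input row."""
    return [{"x": row["x"]} for _ in range(row["x"])]


def double_batch(batch: Dict[str, Any]) -> Dict[str, Any]:
    """For map_batches(): batch["x"] is expected to be a NumPy array,
    so the multiplication is applied to the whole column at once."""
    return {"x": batch["x"] * 2}


def compare_with_ray() -> Dict[str, int]:
    """Hypothetical driver (assumption: Ray is installed and can start locally)."""
    import ray

    ds = ray.data.from_items([{"x": 1}, {"x": 2}, {"x": 3}])
    return {
        "map": ds.map(double).count(),
        "flat_map": ds.flat_map(repeat).count(),
        "map_batches": ds.map_batches(double_batch, batch_format="numpy").count(),
    }
```

With these three input rows, map() and map_batches() keep the row count at 3, while flat_map() with repeat would produce 1 + 2 + 3 = 6 rows.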