ray.data.DatasetPipeline.map_batches#

DatasetPipeline.map_batches(fn: Union[Callable[[Union[List[ray.data.block.T], pyarrow.Table, pandas.DataFrame, bytes, numpy.ndarray, Dict[str, numpy.ndarray]]], Union[List[ray.data.block.T], pyarrow.Table, pandas.DataFrame, bytes, numpy.ndarray, Dict[str, numpy.ndarray]]], _CallableClassProtocol], *, batch_size: Optional[Union[int, typing_extensions.Literal[default]]] = 'default', compute: Optional[Union[str, ray.data._internal.compute.ComputeStrategy]] = None, batch_format: typing_extensions.Literal[default, pandas, pyarrow, numpy] = 'default', prefetch_batches: int = 0, fn_args: Optional[Iterable[Any]] = None, fn_kwargs: Optional[Dict[str, Any]] = None, fn_constructor_args: Optional[Iterable[Any]] = None, fn_constructor_kwargs: Optional[Dict[str, Any]] = None, **ray_remote_args) DatasetPipeline[U][source]#

Apply Dataset.map_batches to each dataset/window in this pipeline.