ray.data.Dataset.write_datasink
- Dataset.write_datasink(datasink: Datasink, *, ray_remote_args: Dict[str, Any] = None, concurrency: int | None = None) → None
Writes the dataset to a custom Datasink.
Note
This operation will trigger execution of the lazy transformations performed on this dataset.
Time complexity: O(dataset size / parallelism)
- Parameters:
datasink – The Datasink to write to.
ray_remote_args – Kwargs passed to ray.remote in the write tasks.
concurrency – The maximum number of Ray tasks to run concurrently. This caps how many write tasks run at the same time; it doesn’t change the total number of tasks run. By default, concurrency is dynamically decided based on the available resources.