ray.data.Datasink.on_write_start

Datasink.on_write_start(schema: pa.Schema | None = None) → None

Callback for when a write job starts.

Use this method to perform setup for the write tasks, for example, creating a staging bucket in S3.

This is called on the driver when the first input bundle is ready, just before write tasks are submitted. The schema is extracted from the first input bundle, enabling schema-dependent initialization.

Parameters:

schema – The PyArrow schema of the data being written, automatically extracted from the first input bundle. May be None if the input data has no schema.