ray.data.Dataset.write_lance

Dataset.write_lance(path: str, *, schema: pyarrow.Schema | None = None, mode: Literal['create', 'append', 'overwrite'] = 'create', min_rows_per_file: int = 1048576, max_rows_per_file: int = 67108864, data_storage_version: str | None = None, storage_options: Dict[str, Any] | None = None, ray_remote_args: Dict[str, Any] = None, concurrency: int | None = None) → None

Write the dataset to a Lance dataset.

Note

This operation will trigger execution of the lazy transformations performed on this dataset.

Examples

import pandas as pd
import ray

# Build a small four-row dataset and write it to a new Lance dataset.
docs = [{"title": "Lance data sink test"} for _ in range(4)]
ds = ray.data.from_pandas(pd.DataFrame(docs))
ds.write_lance("/tmp/data/")

Parameters:

path – The path to the destination Lance dataset.
schema – The schema of the dataset. If not provided, it is inferred from the data.
mode – The write mode: “create” (the default) writes a new dataset, “append” adds rows to an existing dataset, and “overwrite” replaces the existing contents.
min_rows_per_file – The minimum number of rows per file. Defaults to 1,048,576 (2^20).
max_rows_per_file – The maximum number of rows per file. Defaults to 67,108,864 (2^26).
data_storage_version – The version of the data storage format to use. Newer versions are more efficient but require newer versions of Lance to read. When left as None (the default), the “legacy” v1 format is used. See the Lance user guide for more details.
storage_options – Options for the underlying storage backend (for example, object-store credentials) that are passed to the Lance writer. Default is None.