ray.data.Dataset.write_lance
- Dataset.write_lance(path: str, *, schema: pyarrow.Schema | None = None, mode: Literal['create', 'append', 'overwrite'] = 'create', max_rows_per_file: int = 1048576, data_storage_version: str | None = None, storage_options: Dict[str, Any] | None = None, ray_remote_args: Dict[str, Any] = None, concurrency: int | None = None) -> None
Write the dataset to a Lance dataset.
Note
This operation will trigger execution of the lazy transformations performed on this dataset.
Examples
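import pandas as pd
import ray

# Build a small four-row dataset and write it out as a Lance dataset.
docs = [{"title": "Lance data sink test"} for key in range(4)]
ds = ray.data.from_pandas(pd.DataFrame(docs))
ds.write_lance("/tmp/data/")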
- Parameters:
path – The path to the destination Lance dataset.
schema – The schema of the dataset. If not provided, it is inferred from the data.
mode – The write mode. Can be “create”, “append”, or “overwrite”.
max_rows_per_file – The maximum number of rows per file.
data_storage_version – The version of the data storage format to use. Newer versions are more efficient but require newer versions of Lance to read. The default is “legacy”, which uses the legacy v1 format. See the user guide for more details.
storage_options – The storage options for the writer. Default is None.
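As a rough illustration of how mode and max_rows_per_file fit together, here is a minimal sketch. The helper min_lance_files and the function demo are hypothetical names introduced for this example and are not part of Ray or Lance. The helper assumes each file holds at most max_rows_per_file rows, so it gives only a lower bound; the actual file count may be higher depending on how the dataset is split into blocks and write tasks. demo uses only the parameters shown in the signature above and is not run automatically, since it needs ray, pandas, and the Lance Python package installed.

```python
import math


def min_lance_files(num_rows: int, max_rows_per_file: int = 1_048_576) -> int:
    """Lower bound on the number of data files for num_rows rows.

    Hypothetical helper: assumes every file is filled up to
    max_rows_per_file, which a real write may not achieve.
    """
    if num_rows < 0:
        raise ValueError("num_rows must be non-negative")
    if max_rows_per_file <= 0:
        raise ValueError("max_rows_per_file must be positive")
    return math.ceil(num_rows / max_rows_per_file)


def demo(path: str = "/tmp/data/") -> None:
    """Create a Lance dataset, then append the same rows to it.

    Hypothetical function; requires ray, pandas, and Lance to be installed.
    """
    import pandas as pd
    import ray

    docs = [{"title": "Lance data sink test"} for _ in range(4)]
    ds = ray.data.from_pandas(pd.DataFrame(docs))

    # First write uses the default mode, spelled out here for clarity.
    ds.write_lance(path, mode="create", max_rows_per_file=2)
    # Second write adds the same rows to the existing dataset.
    ds.write_lance(path, mode="append", max_rows_per_file=2)
```

Calling demo() writes to /tmp/data/ twice, so it is best run against a path that does not yet hold a Lance dataset. For the four-row example with max_rows_per_file=2, min_lance_files(4, 2) evaluates to 2.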