ray.data.Dataset.serialize_lineage

Dataset.serialize_lineage() → bytes

Serialize this dataset’s lineage, not the actual data or the existing data futures, to bytes that can be stored and later deserialized, possibly on a different cluster.

Note that this will drop all computed data, and that everything is recomputed from scratch after deserialization.

Use Dataset.deserialize_lineage() to deserialize the serialized bytes returned from this method into a Dataset.

Note

Unioned and zipped datasets, produced by Dataset.union() and Dataset.zip(), are not lineage-serializable.

Examples

import ray

ds = ray.data.read_csv("s3://anonymous@ray-example-data/iris.csv")
serialized_ds = ds.serialize_lineage()
ds = ray.data.Dataset.deserialize_lineage(serialized_ds)
print(ds)
Dataset(
   num_blocks=...,
   num_rows=150,
   schema={
      sepal length (cm): double,
      sepal width (cm): double,
      petal length (cm): double,
      petal width (cm): double,
      target: int64
   }
)
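Because the serialized lineage is plain bytes, it can be written to a file and restored later, possibly on a different cluster. The following is a minimal sketch of that round trip, not part of the Ray API: the helper names save_lineage and load_lineage and the file path are hypothetical, and the guard assumes Dataset.has_serializable_lineage() is available in your Ray version.

```python
from pathlib import Path


def save_lineage(ds, path):
    """Write ds's serialized lineage to `path`; return the number of bytes written.

    `save_lineage` is a hypothetical helper, not part of Ray.
    """
    # Unioned and zipped datasets are not lineage-serializable, so check first.
    # Assumption: has_serializable_lineage() exists in the installed Ray version.
    if not ds.has_serializable_lineage():
        raise ValueError("dataset lineage is not serializable")
    # Only the lineage is stored; no computed data is written.
    return Path(path).write_bytes(ds.serialize_lineage())


def load_lineage(path):
    """Rebuild a Dataset from lineage bytes stored at `path` (hypothetical helper)."""
    import ray  # imported lazily; only needed when deserializing

    # The returned Dataset recomputes everything from its original source.
    return ray.data.Dataset.deserialize_lineage(Path(path).read_bytes())
```

With the dataset from the example above, save_lineage(ds, "iris.lineage") stores the lineage, and load_lineage("iris.lineage") returns a Dataset that re-reads the CSV when it is executed.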
Returns:

Serialized bytes containing the lineage of this dataset.

DeveloperAPI: This API may change across minor Ray releases.