ray.data.Dataset.to_spark

Dataset.to_spark(spark: pyspark.sql.SparkSession) → pyspark.sql.DataFrame

Convert this Dataset into a Spark DataFrame.

Note

This operation will trigger execution of the lazy transformations performed on this dataset.

Time complexity: O(dataset size / parallelism)

Parameters

spark – A SparkSession, which must be created by RayDP (Spark-on-Ray).

Returns

A Spark DataFrame created from this dataset.
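
Example

The following is a minimal sketch of a round trip, assuming Ray, RayDP, and PySpark are installed and a Java runtime is available. The helper name `ray_dataset_to_spark_count`, the application name, and the executor resource values are illustrative assumptions, not part of the Ray API. The session is obtained from `raydp.init_spark` because `to_spark` requires a RayDP-created SparkSession.

```python
def ray_dataset_to_spark_count(num_rows: int = 100) -> int:
    """Build a small Ray Dataset, convert it to Spark, and return the row count.

    Hypothetical helper for illustration only.
    """
    # Imported locally so this module can be loaded without Ray or RayDP present.
    import ray
    import raydp

    ray.init(ignore_reinit_error=True)

    # The SparkSession must be created by RayDP (Spark-on-Ray).
    # The resource values below are assumptions; size them for your cluster.
    spark = raydp.init_spark(
        app_name="to_spark_example",
        num_executors=1,
        executor_cores=1,
        executor_memory="1GB",
    )
    try:
        ds = ray.data.range(num_rows)
        # Calling to_spark triggers execution of any pending lazy transformations.
        df = ds.to_spark(spark)
        return df.count()
    finally:
        # Release the Spark executors and the Ray runtime even if conversion fails.
        raydp.stop_spark()
        ray.shutdown()
```

Calling `ray_dataset_to_spark_count(100)` should return the number of rows in the resulting DataFrame. In a longer pipeline you would typically keep the session open and continue working with the returned DataFrame rather than stopping Spark immediately.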