- ray.data.from_spark(df: pyspark.sql.DataFrame, *, parallelism: int | None = None) → MaterializedDataset
df – A Spark DataFrame, which must be created by RayDP (Spark-on-Ray).
parallelism – The amount of parallelism to use for the dataset. If not provided, the parallelism is equal to the number of partitions of the original Spark DataFrame.
A MaterializedDataset holding rows read from the DataFrame.
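A minimal sketch of how the call might be used, assuming Ray, RayDP, PySpark, and a Java runtime are installed. The helper name `count_rows_via_from_spark`, the application name, and the executor sizing are illustrative choices, not part of the Ray API. The Spark session is created through `raydp.init_spark` because the DataFrame must come from RayDP.

```python
from typing import Optional


def count_rows_via_from_spark(num_rows: int = 1000,
                              parallelism: Optional[int] = None) -> int:
    """Build a small Spark DataFrame with RayDP and read it into Ray Data.

    Hypothetical helper for illustration; returns the dataset's row count.
    """
    # Imported lazily so this module still loads where Ray, RayDP,
    # or PySpark is not installed.
    import ray
    import raydp

    ray.init(ignore_reinit_error=True)

    # Spark-on-Ray session. The sizing values are assumptions chosen
    # for a small local run; tune them for a real workload.
    spark = raydp.init_spark(
        app_name="from_spark_example",
        num_executors=2,
        executor_cores=1,
        executor_memory="500M",
    )
    try:
        # spark.range produces a DataFrame with a single "id" column.
        df = spark.range(num_rows)

        # With parallelism=None, the dataset follows the number of
        # partitions of the Spark DataFrame, as described above.
        ds = ray.data.from_spark(df, parallelism=parallelism)

        # Consume the dataset before the Spark session is stopped.
        return ds.count()
    finally:
        raydp.stop_spark()
```

Calling `count_rows_via_from_spark(num_rows=1000, parallelism=4)` asks for a parallelism of 4 instead of the DataFrame's own partition count. The count is taken inside the `try` block so that the dataset is read while the RayDP session is still alive.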