ray.data.from_pandas

ray.data.from_pandas(dfs: pandas.DataFrame | List[pandas.DataFrame]) → MaterializedDataset

Create a Dataset from a pandas DataFrame or a list of pandas DataFrames.

Examples

>>> import pandas as pd
>>> import ray
>>> df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
>>> ray.data.from_pandas(df)
MaterializedDataset(num_blocks=1, num_rows=3, schema={a: int64, b: int64})

Create a Ray Dataset from a list of pandas DataFrames.

>>> ray.data.from_pandas([df, df])
MaterializedDataset(num_blocks=2, num_rows=6, schema={a: int64, b: int64})

Parameters:

dfs – A pandas DataFrame or a list of pandas DataFrames.

Returns:

A MaterializedDataset holding the rows of the input DataFrames.