ray.data.from_pandas

ray.data.from_pandas(dfs: Union[pandas.DataFrame, List[pandas.DataFrame]]) → ray.data.dataset.MaterializedDataset

Create a Dataset from a single pandas DataFrame or a list of pandas DataFrames.

Examples

>>> import pandas as pd
>>> import ray
>>> df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
>>> ray.data.from_pandas(df)
MaterializedDataset(num_blocks=1, num_rows=3, schema={a: int64, b: int64})

Create a Ray Dataset from a list of pandas DataFrames.

>>> ray.data.from_pandas([df, df])
MaterializedDataset(num_blocks=2, num_rows=6, schema={a: int64, b: int64})

Parameters

dfs – A pandas dataframe or a list of pandas dataframes.

Returns

Dataset holding data read from the dataframes.