ray.data.from_huggingface

ray.data.from_huggingface(dataset: Union[datasets.Dataset, datasets.DatasetDict]) → Union[ray.data.dataset.Dataset[ray.data._internal.arrow_block.ArrowRow], Dict[str, ray.data.dataset.Dataset[ray.data._internal.arrow_block.ArrowRow]]]

Create a dataset from a Hugging Face Datasets Dataset.

This function is not parallelized and is intended for Hugging Face Datasets that are loaded into memory (as opposed to memory-mapped).

Parameters

dataset – A Hugging Face Dataset or DatasetDict. IterableDataset is not supported.

Returns

A Ray Dataset holding Arrow records from the Hugging Face Dataset, or, if dataset is a DatasetDict, a dict mapping each key of the DatasetDict to a Ray Dataset.

PublicAPI: This API is stable across Ray releases.