ray.data.from_huggingface
- ray.data.from_huggingface(dataset: Union[datasets.Dataset, datasets.DatasetDict]) → Union[ray.data.dataset.Dataset[ray.data._internal.arrow_block.ArrowRow], Dict[str, ray.data.dataset.Dataset[ray.data._internal.arrow_block.ArrowRow]]]
Create a dataset from a Hugging Face Datasets Dataset.
This function is not parallelized, and is intended to be used with Hugging Face Datasets that are loaded into memory (as opposed to memory-mapped).
- Parameters
dataset – A Hugging Face Dataset or DatasetDict. IterableDataset is not supported.
- Returns
Dataset holding Arrow records from the Hugging Face Dataset, or a dict of datasets in case dataset is a DatasetDict.
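Because the return type depends on the input (a single dataset for a Dataset, a dict of datasets for a DatasetDict), calling code may want to handle both shapes uniformly. The following is a minimal sketch, not part of the Ray API: `as_split_dict`, its `default_split` label, and `demo` are hypothetical names introduced here for illustration, and the sample rows are made up. It assumes `ray` and `datasets` are installed and that the Hugging Face dataset is small enough to hold in memory.

```python
from typing import Any, Dict


def as_split_dict(result: Any, default_split: str = "train") -> Dict[str, Any]:
    """Normalize the return value of ray.data.from_huggingface to a dict.

    Hypothetical helper: `default_split` is an assumed label used only when
    a single dataset (not a dict of datasets) was returned.
    """
    if isinstance(result, dict):
        # A DatasetDict input yields one Ray dataset per split name.
        return dict(result)
    # A plain Dataset input yields a single Ray dataset.
    return {default_split: result}


def demo() -> None:
    # Imported locally so the helper above is usable without Ray installed.
    import datasets
    import ray

    # Small in-memory Hugging Face dataset (illustrative data only).
    hf_ds = datasets.Dataset.from_dict(
        {"text": ["a", "b", "c"], "label": [0, 1, 0]}
    )
    hf_splits = datasets.DatasetDict({"train": hf_ds, "test": hf_ds})

    single = ray.data.from_huggingface(hf_ds)         # one Ray dataset
    per_split = ray.data.from_huggingface(hf_splits)  # dict of Ray datasets

    for result in (single, per_split):
        for name, ds in as_split_dict(result).items():
            print(name, ds.count())
```

Calling `demo()` runs the conversion for both input types. Since the conversion is not parallelized, a sketch like this is best suited to small datasets.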
PublicAPI: This API is stable across Ray releases.