ray.data.from_huggingface
- ray.data.from_huggingface(dataset: datasets.Dataset | datasets.IterableDataset) -> MaterializedDataset | Dataset
Create a MaterializedDataset from a Hugging Face Datasets Dataset, or a Dataset from a Hugging Face Datasets IterableDataset. For an IterableDataset, we use a streaming implementation to read data.

Example
    import ray
    import datasets

    # In-memory Hugging Face Dataset -> MaterializedDataset
    hf_dataset = datasets.load_dataset("tweet_eval", "emotion")
    ray_ds = ray.data.from_huggingface(hf_dataset["train"])
    print(ray_ds)

    # Streaming Hugging Face IterableDataset -> Dataset
    hf_dataset_stream = datasets.load_dataset("tweet_eval", "emotion", streaming=True)
    ray_ds_stream = ray.data.from_huggingface(hf_dataset_stream["train"])
    print(ray_ds_stream)
    MaterializedDataset(
       num_blocks=...,
       num_rows=3257,
       schema={text: string, label: int64}
    )
    Dataset(
       num_blocks=...,
       num_rows=3257,
       schema={text: string, label: int64}
    )
- Parameters:
dataset – A Hugging Face Datasets Dataset or Hugging Face Datasets IterableDataset. DatasetDict and IterableDatasetDict are not supported.
- Returns:
A MaterializedDataset holding rows from the Hugging Face Datasets Dataset, or a Dataset that reads rows from the Hugging Face Datasets IterableDataset.
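
Because DatasetDict and IterableDatasetDict are not supported, the result of datasets.load_dataset(...) called without a split usually needs to be narrowed to one split before conversion. The following is a minimal sketch of one way to do that, not part of the Ray API: pick_split and to_ray are hypothetical helper names, and the sketch assumes that the dict-of-splits objects behave like ordinary mappings keyed by split name.

```python
from collections.abc import Mapping


def pick_split(obj, split="train"):
    """Return one split from a dict-of-splits, or obj unchanged otherwise.

    Hypothetical helper: assumes DatasetDict / IterableDatasetDict behave
    like mappings keyed by split name.
    """
    if isinstance(obj, Mapping):
        if split not in obj:
            raise KeyError(
                f"split {split!r} not found; available: {sorted(obj)}"
            )
        return obj[split]
    # Already a single Dataset / IterableDataset: pass it through.
    return obj


def to_ray(obj, split="train"):
    """Hypothetical wrapper: select a split, then convert it with Ray."""
    # Imported lazily so pick_split can be used without Ray installed.
    import ray

    return ray.data.from_huggingface(pick_split(obj, split))
```

With this sketch, to_ray(datasets.load_dataset("tweet_eval", "emotion"), split="train") would convert the same split that the example above selects with hf_dataset["train"].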