GPT-J-6B Batch Prediction with Ray Data

This example showcases how to use Ray Data for GPT-J batch inference. GPT-J is a GPT-2-like causal language model trained on the Pile dataset. It has 6 billion parameters. For more information on GPT-J, see the EleutherAI/gpt-j-6B model card on Hugging Face.

We use Ray Data and a pretrained model from the Hugging Face Hub. Note that you can easily adapt this example to use other similar models.

It is highly recommended to read Ray Train Key Concepts and Ray Data Key Concepts before starting this example.

If you are interested in serving (online inference), see GPT-J-6B Serving with Ray Serve.

Note

In order to run this example, make sure your Ray cluster has access to at least one GPU with 16 GB or more of memory. The exact amount of memory needed depends on the model.

model_id = "EleutherAI/gpt-j-6B"
revision = "float16"  # use float16 weights to fit in 16GB GPUs
prompt = (
    "In a shocking finding, scientists discovered a herd of unicorns living in a remote, "
    "previously unexplored valley, in the Andes Mountains. Even more surprising to the "
    "researchers was the fact that the unicorns spoke perfect English."
)
import ray

We define a runtime environment to ensure that the Ray workers have access to all the necessary packages. You can omit the runtime_env argument if you have all of the packages already installed on each node in your cluster.

ray.init(
    runtime_env={
        "pip": [
            "accelerate>=0.16.0",
            "transformers>=4.26.0",
            "numpy<1.24",  # remove when mlflow updates beyond 2.2
            "torch",
        ]
    }
)
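
Before moving on, you can optionally confirm that the connected cluster actually exposes GPU resources, as required by the note above. This quick check is not part of the original example; it only uses the standard ray.cluster_resources() API.

# Optional sanity check (not in the original example): make sure the cluster
# reports at least one GPU before loading the 6B-parameter model.
cluster_resources = ray.cluster_resources()
assert cluster_resources.get("GPU", 0) >= 1, (
    f"Expected at least one GPU in the cluster, got: {cluster_resources}"
)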

For the purposes of this example, we will use a very small toy dataset composed of multiple copies of our prompt. Ray Data can handle much bigger datasets with ease.

import ray.data
import pandas as pd

ds = ray.data.from_pandas(pd.DataFrame([prompt] * 10, columns=["prompt"]))
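
If you want to sanity-check the dataset before running inference, you can count and preview its rows. This step is optional and not part of the original example.

# Optional: inspect the toy dataset. It contains 10 identical rows,
# each holding the prompt defined above in the "prompt" column.
print(ds.count())  # -> 10
ds.show(1)         # prints the first row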

Since we will be using a pretrained model from the Hugging Face Hub, the simplest approach is to use map_batches with a callable class UDF. This allows us to save time by initializing the model just once and then feeding it multiple batches of data.

class PredictCallable:
    def __init__(self, model_id: str, revision: str = None):
        from transformers import AutoModelForCausalLM, AutoTokenizer
        import torch

        self.model = AutoModelForCausalLM.from_pretrained(
            model_id,
            revision=revision,
            torch_dtype=torch.float16,
            low_cpu_mem_usage=True,
            device_map="auto",  # automatically makes use of all GPUs available to the Actor
        )
        self.tokenizer = AutoTokenizer.from_pretrained(model_id)

    def __call__(self, batch: pd.DataFrame) -> pd.DataFrame:
        # Tokenize the whole batch of prompts at once and move the tensors
        # to the same device(s) the model was placed on.
        tokenized = self.tokenizer(
            list(batch["prompt"]), return_tensors="pt"
        )
        input_ids = tokenized.input_ids.to(self.model.device)
        attention_mask = tokenized.attention_mask.to(self.model.device)

        # Sample completions up to a total sequence length of 100 tokens
        # (prompt included). pad_token_id is set explicitly because GPT-J's
        # tokenizer does not define a padding token.
        gen_tokens = self.model.generate(
            input_ids=input_ids,
            attention_mask=attention_mask,
            do_sample=True,
            temperature=0.9,
            max_length=100,
            pad_token_id=self.tokenizer.eos_token_id,
        )
        # Decode the generated token ids back into strings, one row per prompt.
        return pd.DataFrame(
            self.tokenizer.batch_decode(gen_tokens), columns=["responses"]
        )
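
If you want to try the UDF on its own before wiring it into Ray Data, you can instantiate it and call it directly on a small pandas batch. This is only a sketch, not part of the Ray Data pipeline below, and it assumes a local GPU with enough memory is available.

# Hypothetical local smoke test (assumes a local GPU with >= 16 GB of memory).
# It exercises the same code path that map_batches will run inside each actor.
local_predictor = PredictCallable(model_id=model_id, revision=revision)
sample_batch = pd.DataFrame({"prompt": [prompt]})
print(local_predictor(sample_batch)["responses"][0])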

All that is left is to run the map_batches method on the dataset. We specify that we want to use one GPU for each Ray Actor that will be running our callable class.

Also notice that we repartition the dataset into 100 partitions before mapping batches. This ensures there are enough parallel tasks to take advantage of all the GPUs. 100 is an arbitrary number; you can pick any other number as long as it is greater than the number of available GPUs in the cluster.

Tip

If you have access to large GPUs, you may want to increase the batch size to better saturate them.

If you want to use intra-node model parallelism, you can also increase num_gpus. Because we created the model with device_map="auto", it will automatically be placed across the available devices. Note that this requires nodes with multiple GPUs; a sketch of such a call follows the map_batches example below.

preds = (
    ds
    .repartition(100)
    .map_batches(
        PredictCallable,
        batch_size=4,
        fn_constructor_kwargs=dict(model_id=model_id, revision=revision),
        batch_format="pandas",
        compute=ray.data.ActorPoolStrategy(),
        num_gpus=1,
    )
)
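
For reference, the intra-node model parallelism variant mentioned in the tip above could look like the following sketch. It is not executed in this example and assumes nodes with at least two GPUs; apart from num_gpus, the arguments are the same as in the call above.

# Hypothetical variant (not run in this example): reserve two GPUs per actor
# so that device_map="auto" can shard the 6B-parameter model across both.
preds_sharded = (
    ds
    .repartition(100)
    .map_batches(
        PredictCallable,
        batch_size=4,
        fn_constructor_kwargs=dict(model_id=model_id, revision=revision),
        batch_format="pandas",
        compute=ray.data.ActorPoolStrategy(),
        num_gpus=2,
    )
)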

After map_batches is done, we can view our generated text.

preds.take_all()
2023-02-28 10:40:50,530	INFO bulk_executor.py:41 -- Executing DAG InputDataBuffer[Input] -> ActorPoolMapOperator[MapBatches(PredictCallable)]
MapBatches(PredictCallable), 0 actors [0 locality hits, 1 misses]: 100%|██████████| 1/1 [12:10<00:00, 730.80s/it]
[{'responses': 'In a shocking finding, scientists discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English.\n\nThe finding comes from the team of researchers, which includes Dr. Michael Goldberg, a professor and chair of the Zoology Department at the University of Maryland. Dr. Goldberg spent a year collecting and conducting research in the Ecuadorian Andes, including the Pinchahu'},
 {'responses': 'In a shocking finding, scientists discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English.\n\nThe team of British, Argentine and Chilean scientists found that the elusive unicorns had been living in the valley for at least 50 years, and had even interacted with humans.\n\nThe team’s findings published in the journal Scientific Reports has been hailed as a'},
 {'responses': 'In a shocking finding, scientists discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English.\n\nAs far as the rest of human kind knew, unicorns had never existed on Earth, but the presence of this herd has left some very confused. Are the scientists simply overreacting? Or has the valley become the new Unicorn Valley?\n\nThere are only'},
 {'responses': 'In a shocking finding, scientists discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English. The discovery was announced by Oxford University and was published in the journal Science. According to the researchers, this is proof of an alien life. This time around the aliens are definitely not from outer space – they are quite cozy.\n\n“I saw the herd for the'},
 {'responses': 'In a shocking finding, scientists discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English.\n\nIn the article, The Daily Beast and NewScientist report on these "extraordinary find[s], reported this week to the Royal Society." According to the article:\n\nThe authors, who were part of a team from the University of Lincoln’s'},
 {'responses': 'In a shocking finding, scientists discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English. This was no ordinary herd of animals.\n\nThe discovery was made by the team while they were riding horses in the wilds of the Peruvian Andes. As they rode through the area, they came upon a herd of white alpacas, which were quite exotic'},
 {'responses': 'In a shocking finding, scientists discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English.\n\nThe mountain valley that the unicorns lived in sat under the shadow of an active volcano emitting smoke as big as Mount St. Helens. The scientists named the newly discovered unicorn herd the Andes Biodiversity Center—or ABC for short.\n\nThe discovery'},
 {'responses': 'In a shocking finding, scientists discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English.\n\nUnicorns have been depicted in the fairy tales and legends of many cultures throughout history, but scientists have been unable to explain the species.\n\nIn a paper published in the journal Biology Letters, the researchers studied five male and five female unicorns and their offspring'},
 {'responses': 'In a shocking finding, scientists discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English. How did this amazing discovery occur?\n\nBefore I tell you exactly how unicorns managed to come into existence, allow me to explain how I think unicorns occur. I think they exist in the same way some people believe Jesus rose from the dead.\n\nI think'},
 {'responses': 'In a shocking finding, scientists discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English.\n\nIt is well known that horses, zebras, and other hoofed beasts have long since left their ancestral lands in the grasslands of South America, and are now found throughout Eurasia and North Africa. However, there are also a number of other,'}]
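
Instead of pulling all of the generated text onto the driver with take_all(), you could also write the predictions out to storage. The following is a minimal sketch using Ray Data's Parquet writer; the output path is just a placeholder.

# Optional: persist the predictions instead of collecting them on the driver.
# The path below is a placeholder; point it at any filesystem or object store
# location your cluster can reach.
preds.write_parquet("/tmp/gpt-j-predictions")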

You may notice that we are not using a Predictor here. This is because Predictors are mainly intended to be used with Ray Train Checkpoints, which we don't have for this example. See ray.train.predictor.Predictor for more information and usage examples.