ray.data.block.BlockAccessor

class ray.data.block.BlockAccessor(*args, **kwds)

Bases: Generic[ray.data.block.T]

Provides accessor methods for a specific block.

Ideally, we wouldn’t need separate accessor classes for blocks. However, they are needed to support storing a pyarrow.Table directly as a top-level Ray object, without a wrapping class (issue #17186).

There are three types of block accessors: SimpleBlockAccessor for blocks that are plain Python lists, ArrowBlockAccessor for pyarrow.Table blocks, and PandasBlockAccessor for pandas.DataFrame blocks.

DeveloperAPI: This API may change across minor Ray releases.

num_rows() → int

Return the number of rows contained in this block.

iter_rows() → Iterator[ray.data.block.T]

Iterate over the rows of this block.

slice(start: int, end: int, copy: bool) → Union[List[ray.data.block.T], pyarrow.Table, pandas.DataFrame, bytes]

Return a slice of this block.

Parameters
  • start – The starting index of the slice.

  • end – The ending index of the slice.

  • copy – Whether to perform a data copy for the slice.

Returns

The sliced block result.
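As an illustration, slicing a plain-list block (the case SimpleBlockAccessor handles) might look like the hypothetical helper below. It is a sketch, not Ray's implementation, and it assumes `end` is exclusive, following Python's slice convention. For a list, slicing already returns a new list, so `copy` changes nothing here; the flag matters more for columnar formats such as Arrow, where a slice can be a zero-copy view of the parent's buffers.

```python
from typing import List, TypeVar

T = TypeVar("T")


def slice_list_block(block: List[T], start: int, end: int, copy: bool) -> List[T]:
    """Hypothetical sketch of slice() for a plain-list block (end is exclusive)."""
    # Python slicing already returns a new list that shares the row objects,
    # so `copy` is accepted for interface parity but has no effect here.
    return block[start:end]


rows = [10, 11, 12, 13, 14]
print(slice_list_block(rows, 1, 4, copy=False))  # [11, 12, 13]
```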

take(indices: List[int]) → Union[List[ray.data.block.T], pyarrow.Table, pandas.DataFrame, bytes]

Return a new block containing the rows at the provided indices.

Parameters

indices – The row indices to return.

Returns

A new block containing the rows at the provided indices.
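A hypothetical sketch of `take` for a plain-list block, gathering rows by position. The helper name is illustrative and not part of Ray.

```python
from typing import List, TypeVar

T = TypeVar("T")


def take_list_block(block: List[T], indices: List[int]) -> List[T]:
    """Hypothetical sketch of take(): gather rows by position into a new block."""
    # In this sketch, output order follows `indices` and an index may repeat.
    return [block[i] for i in indices]


rows = ["a", "b", "c", "d"]
print(take_list_block(rows, [3, 0, 0]))  # ['d', 'a', 'a']
```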

select(columns: List[Union[None, str, Callable[[ray.data.block.T], Any]]]) → Union[List[ray.data.block.T], pyarrow.Table, pandas.DataFrame, bytes]

Return a new block containing the provided columns.
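A hypothetical sketch of `select` for a block whose rows are dicts. It handles only string column names; the `None` and callable forms allowed by the signature are not modeled, and the helper is not Ray's implementation.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def select_list_block(block: List[Row], columns: List[str]) -> List[Row]:
    """Hypothetical sketch of select(): keep only the named columns of each row."""
    return [{col: row[col] for col in columns} for row in block]


rows = [{"id": 1, "name": "a", "score": 0.5}, {"id": 2, "name": "b", "score": 0.9}]
print(select_list_block(rows, ["id", "score"]))
# [{'id': 1, 'score': 0.5}, {'id': 2, 'score': 0.9}]
```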

random_shuffle(random_seed: Optional[int]) → Union[List[ray.data.block.T], pyarrow.Table, pandas.DataFrame, bytes]

Randomly shuffle this block.
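A hypothetical sketch of `random_shuffle` for a plain-list block. It assumes, for illustration, that a given seed should reproduce the same order and that `None` means an unseeded shuffle.

```python
import random
from typing import List, Optional, TypeVar

T = TypeVar("T")


def shuffle_list_block(block: List[T], random_seed: Optional[int]) -> List[T]:
    """Hypothetical sketch of random_shuffle(): return a shuffled copy."""
    shuffled = list(block)  # leave the input block untouched
    # A fixed seed gives a reproducible order; None seeds from system entropy.
    random.Random(random_seed).shuffle(shuffled)
    return shuffled


print(shuffle_list_block([1, 2, 3, 4, 5], random_seed=42))
```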

to_pandas() → pandas.DataFrame

Convert this block into a pandas DataFrame.

to_numpy(columns: Optional[Union[str, List[str]]] = None) → Union[numpy.ndarray, Dict[str, numpy.ndarray]]

Convert this block (or selected columns of it) into a NumPy ndarray, or into a dict mapping column names to ndarrays, as the return type allows.

Parameters

columns – Name of the column, or list of column names, to convert; None converts all columns.

to_arrow() → pyarrow.Table

Convert this block into an Arrow table.

to_block() → Union[List[ray.data.block.T], pyarrow.Table, pandas.DataFrame, bytes]

Return the base block that this accessor wraps.

to_default() → Union[List[ray.data.block.T], pyarrow.Table, pandas.DataFrame, bytes]

Return the default data format for this accessor.

to_batch_format(batch_format: str) → Union[List[ray.data.block.T], pyarrow.Table, pandas.DataFrame, bytes, numpy.ndarray, Dict[str, numpy.ndarray]]

Convert this block into the provided batch format.

Parameters

batch_format – The batch format to convert this block to.

Returns

This block formatted as the provided batch format.

size_bytes() → int

Return the approximate size in bytes of this block.

schema() → Union[type, pyarrow.lib.Schema]

Return the Python type or pyarrow schema of this block.

get_metadata(input_files: List[str], exec_stats: Optional[ray.data.block.BlockExecStats]) → ray.data.block.BlockMetadata

Create a metadata object from this block.
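A hypothetical sketch of what `get_metadata` assembles for a plain-list block: row count, approximate size, schema, and input files. `SketchBlockMetadata` is an illustrative stand-in rather than Ray's `BlockMetadata`, `exec_stats` is omitted, and both the size estimate (a shallow `sys.getsizeof` sum) and the first-row schema are assumptions.

```python
import sys
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class SketchBlockMetadata:
    """Hypothetical stand-in for BlockMetadata; field names are assumptions."""
    num_rows: int
    size_bytes: int
    schema: Optional[type]
    input_files: List[str]


def list_block_metadata(block: List[Any], input_files: List[str]) -> SketchBlockMetadata:
    """Hypothetical sketch of get_metadata(): summarize a list block."""
    return SketchBlockMetadata(
        num_rows=len(block),
        # Approximate, shallow size: the list plus each row; nested objects are not followed.
        size_bytes=sys.getsizeof(block) + sum(sys.getsizeof(r) for r in block),
        # Assumption: rows are homogeneous, so the first row's type is the schema.
        schema=type(block[0]) if block else None,
        input_files=list(input_files),
    )


print(list_block_metadata([1, 2, 3], ["part-0.csv"]))
```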

zip(other: Union[List[ray.data.block.T], pyarrow.Table, pandas.DataFrame, bytes]) → Union[List[ray.data.block.T], pyarrow.Table, pandas.DataFrame, bytes]

Zip this block with another block of the same type and size.
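A hypothetical sketch of `zip` for two plain-list blocks. It pairs rows positionally into tuples, which is an assumption made for illustration; a tabular block would more naturally place the two blocks' columns side by side.

```python
from typing import List, Tuple, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def zip_list_blocks(left: List[T], right: List[U]) -> List[Tuple[T, U]]:
    """Hypothetical sketch of zip(): pair rows of two equal-size blocks."""
    if len(left) != len(right):
        # Blocks must have the same number of rows to be zipped.
        raise ValueError(f"cannot zip blocks of size {len(left)} and {len(right)}")
    return list(zip(left, right))


print(zip_list_blocks([1, 2], ["a", "b"]))  # [(1, 'a'), (2, 'b')]
```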

static builder() → BlockBuilder[T]

Create a builder for this block type.
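A hypothetical builder for plain-list blocks, showing the accumulate-then-build pattern. The class and its method names are illustrative and are not Ray's `BlockBuilder`.

```python
from typing import Generic, List, TypeVar

T = TypeVar("T")


class ListBlockBuilder(Generic[T]):
    """Hypothetical builder that accumulates rows and emits a list block."""

    def __init__(self) -> None:
        self._rows: List[T] = []

    def add(self, row: T) -> None:
        # Append a single row.
        self._rows.append(row)

    def add_block(self, block: List[T]) -> None:
        # Append every row of an existing list block.
        self._rows.extend(block)

    def num_rows(self) -> int:
        return len(self._rows)

    def build(self) -> List[T]:
        # Return a copy so later add() calls don't mutate a built block.
        return list(self._rows)


builder = ListBlockBuilder[int]()
builder.add(1)
builder.add_block([2, 3])
print(builder.num_rows())  # 3
print(builder.build())  # [1, 2, 3]
```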

static batch_to_block(batch: Union[List[ray.data.block.T], pyarrow.Table, pandas.DataFrame, bytes, numpy.ndarray, Dict[str, numpy.ndarray]]) → Union[List[ray.data.block.T], pyarrow.Table, pandas.DataFrame, bytes]

Create a block from user-facing data formats.

static for_block(block: Union[List[ray.data.block.T], pyarrow.Table, pandas.DataFrame, bytes]) → BlockAccessor[T]

Create a block accessor for the given block.
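A hypothetical sketch of the dispatch that `for_block` performs: inspect the block's type and wrap it in a matching accessor. Only the plain-list case is implemented so the example stays dependency-free; the names are illustrative, not Ray's.

```python
from typing import Any, List


class SketchListAccessor:
    """Hypothetical accessor over a plain-list block."""

    def __init__(self, items: List[Any]) -> None:
        self._items = items

    def num_rows(self) -> int:
        return len(self._items)

    def to_block(self) -> List[Any]:
        # Return the wrapped block itself, not a copy.
        return self._items


def sketch_for_block(block: Any) -> SketchListAccessor:
    """Hypothetical sketch of for_block(): choose an accessor by block type."""
    if isinstance(block, list):
        return SketchListAccessor(block)
    # A fuller version would also dispatch pyarrow.Table and pandas.DataFrame
    # blocks to their own accessors; they are left out to avoid dependencies.
    raise TypeError(f"Not a block type: {type(block).__name__}")


acc = sketch_for_block([1, 2, 3])
print(acc.num_rows())  # 3
```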

sample(n_samples: int, key: Any) → Union[List[ray.data.block.T], pyarrow.Table, pandas.DataFrame, bytes]

Return a random sample of items from this block.
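A hypothetical sketch of `sample` for a plain-list block. It treats `key` as a callable and returns the key values of randomly chosen rows, the form that is useful for picking sort boundaries; how Ray actually interprets `key` here is an assumption.

```python
import random
from typing import Any, Callable, List, TypeVar

T = TypeVar("T")


def sample_list_block(block: List[T], n_samples: int, key: Callable[[T], Any]) -> List[Any]:
    """Hypothetical sketch of sample(): draw key values from random rows."""
    # Sample without replacement, never asking for more rows than exist.
    k = min(n_samples, len(block))
    return [key(row) for row in random.sample(block, k)]


print(sample_list_block(list(range(100)), 5, key=lambda x: x))
```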

sort_and_partition(boundaries: List[ray.data.block.T], key: Any, descending: bool) → List[Union[List[ray.data.block.T], pyarrow.Table, pandas.DataFrame, bytes]]

Return a list of sorted partitions of this block.
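A hypothetical sketch of `sort_and_partition` for a plain-list block, covering only the ascending case (`descending` is omitted). It assumes `boundaries` are ascending key values and that k boundaries yield k + 1 partitions, with rows whose key equals a boundary placed in the later partition.

```python
import bisect
from typing import Any, Callable, List, TypeVar

T = TypeVar("T")


def sort_and_partition_list_block(
    block: List[T], boundaries: List[Any], key: Callable[[T], Any]
) -> List[List[T]]:
    """Hypothetical sketch: sort rows ascending, then cut them at boundary keys."""
    rows = sorted(block, key=key)
    keys = [key(r) for r in rows]
    partitions: List[List[T]] = []
    start = 0
    for b in boundaries:
        # Rows with key < b close out the current partition.
        end = bisect.bisect_left(keys, b, lo=start)
        partitions.append(rows[start:end])
        start = end
    partitions.append(rows[start:])  # rows at or above the last boundary
    return partitions


print(sort_and_partition_list_block([5, 1, 9, 3, 7], [4, 8], key=lambda x: x))
# [[1, 3], [5, 7], [9]]
```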

combine(key: Union[None, str, Callable[[ray.data.block.T], Any]], agg: AggregateFn) → Union[List[ray.data.block.U], pyarrow.Table, pandas.DataFrame, bytes]

Combine rows with the same key into an accumulator.
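A hypothetical sketch of `combine` for a plain-list block. `SketchAgg` is an illustrative stand-in with only `init` and `accumulate` steps, not Ray's `AggregateFn`; the sketch keeps one partial accumulator per key and returns `(key, accumulator)` pairs sorted by key.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class SketchAgg:
    """Hypothetical aggregation spec loosely modeled on AggregateFn."""
    init: Callable[[Any], Any]             # key -> empty accumulator
    accumulate: Callable[[Any, Any], Any]  # (accumulator, row) -> accumulator


def combine_list_block(
    block: List[Any], key: Callable[[Any], Any], agg: SketchAgg
) -> List[Tuple[Any, Any]]:
    """Hypothetical sketch of combine(): one partial accumulator per key."""
    accs: Dict[Any, Any] = {}
    for row in block:
        k = key(row)
        if k not in accs:
            accs[k] = agg.init(k)
        accs[k] = agg.accumulate(accs[k], row)
    # Sorting by key lets partial results from many blocks be merged later.
    return sorted(accs.items(), key=lambda kv: kv[0])


rows = [("a", 1), ("b", 2), ("a", 3)]
total = SketchAgg(init=lambda k: 0, accumulate=lambda acc, r: acc + r[1])
print(combine_list_block(rows, key=lambda r: r[0], agg=total))  # [('a', 4), ('b', 2)]
```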

static merge_sorted_blocks(blocks: List[Block[T]], key: Any, descending: bool) → Tuple[Union[List[ray.data.block.T], pyarrow.Table, pandas.DataFrame, bytes], ray.data.block.BlockMetadata]

Return a sorted block by merging a list of sorted blocks.
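A hypothetical sketch of `merge_sorted_blocks` for plain-list blocks, using a k-way merge from the standard library's `heapq.merge`. Unlike the real method, it returns only the merged rows, not a `(block, metadata)` tuple.

```python
import heapq
from typing import Any, Callable, List, TypeVar

T = TypeVar("T")


def merge_sorted_list_blocks(
    blocks: List[List[T]], key: Callable[[T], Any], descending: bool
) -> List[T]:
    """Hypothetical sketch: k-way merge of blocks that are each already sorted."""
    # heapq.merge streams the inputs; each must already be sorted in the
    # direction that `reverse` describes.
    return list(heapq.merge(*blocks, key=key, reverse=descending))


print(merge_sorted_list_blocks([[1, 4, 7], [2, 5], [3, 6, 8]], key=lambda x: x, descending=False))
# [1, 2, 3, 4, 5, 6, 7, 8]
```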

static aggregate_combined_blocks(blocks: List[Union[List[ray.data.block.T], pyarrow.Table, pandas.DataFrame, bytes]], key: Union[None, str, Callable[[ray.data.block.T], Any]], agg: AggregateFn) → Tuple[Union[List[ray.data.block.U], pyarrow.Table, pandas.DataFrame, bytes], ray.data.block.BlockMetadata]

Aggregate partially combined and sorted blocks.
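A hypothetical sketch of `aggregate_combined_blocks` for plain-list blocks of `(key, accumulator)` pairs, each sorted by key. It assumes a `merge` function that combines two partial accumulators and a `finalize` function that produces the result; both parameter names are illustrative, not Ray's signature.

```python
import heapq
from functools import reduce
from itertools import groupby
from operator import itemgetter
from typing import Any, Callable, List, Tuple

Pair = Tuple[Any, Any]  # (key, partial accumulator)


def aggregate_combined_list_blocks(
    blocks: List[List[Pair]],
    merge: Callable[[Any, Any], Any],
    finalize: Callable[[Any], Any],
) -> List[Pair]:
    """Hypothetical sketch: merge partial accumulators that share a key."""
    # Each input block is sorted by key, so a k-way merge keeps equal keys adjacent.
    merged = heapq.merge(*blocks, key=itemgetter(0))
    out: List[Pair] = []
    for k, group in groupby(merged, key=itemgetter(0)):
        total = reduce(merge, (acc for _, acc in group))
        out.append((k, finalize(total)))
    return out


blocks = [[("a", 4), ("b", 2)], [("a", 1), ("c", 5)]]
print(aggregate_combined_list_blocks(blocks, merge=lambda x, y: x + y, finalize=lambda x: x))
# [('a', 5), ('b', 2), ('c', 5)]
```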