ray.data.block.BlockAccessor

class ray.data.block.BlockAccessor(*args, **kwds)

Provides accessor methods for a specific block.

Ideally, we wouldn’t need separate accessor classes for blocks. However, they are needed to support storing pyarrow.Table directly as a top-level Ray object, without a wrapping class (issue #17186).

There are three types of block accessors: SimpleBlockAccessor, which operates over a plain Python list; ArrowBlockAccessor, for pyarrow.Table blocks; and PandasBlockAccessor, for pandas.DataFrame blocks.
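The idea of choosing an accessor from the block's runtime type can be sketched without Ray. The example below is a minimal illustration only: `ListBlockAccessor` and the module-level `for_block` function are hypothetical stand-ins, not Ray's implementation, and only plain Python list blocks are handled.

```python
from typing import Any, Iterator, List


class ListBlockAccessor:
    """Hypothetical stand-in for an accessor over a plain Python list block."""

    def __init__(self, block: List[Any]) -> None:
        self._block = block

    def num_rows(self) -> int:
        # Each list element counts as one row.
        return len(self._block)

    def iter_rows(self) -> Iterator[Any]:
        return iter(self._block)

    def slice(self, start: int, end: int, copy: bool = False) -> List[Any]:
        # List slicing always builds a new list, so `copy` is accepted only
        # to mirror the signature in the method listing.
        return self._block[start:end]

    def to_block(self) -> List[Any]:
        # Return the underlying block unchanged.
        return self._block


def for_block(block: Any) -> ListBlockAccessor:
    """Pick an accessor based on the block's runtime type (list blocks only)."""
    if isinstance(block, list):
        return ListBlockAccessor(block)
    raise TypeError(f"Not a supported block type: {type(block)!r}")


accessor = for_block([10, 20, 30, 40])
row_count = accessor.num_rows()
middle_rows = accessor.slice(1, 3)
```

Because the block itself stays an unwrapped list, the accessor can be created on demand and discarded, which is the property the design note above relies on.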

DeveloperAPI: This API may change across minor Ray releases.

__init__()

Methods

__init__()

aggregate_combined_blocks(blocks, key, agg)

Aggregate partially combined and sorted blocks.

batch_to_block(batch)

Create a block from user-facing data formats.

builder()

Create a builder for this block type.

combine(key, agg)

Combine rows with the same key into an accumulator.

for_block(block)

Create a block accessor for the given block.

get_metadata(input_files, exec_stats)

Create a metadata object from this block.

iter_rows()

Iterate over the rows of this block.

merge_sorted_blocks(blocks, key, descending)

Return a sorted block by merging a list of sorted blocks.

num_rows()

Return the number of rows contained in this block.

random_shuffle(random_seed)

Randomly shuffle this block.

sample(n_samples, key)

Return a random sample of items from this block.

schema()

Return the Python type or pyarrow schema of this block.

select(columns)

Return a new block containing the provided columns.

size_bytes()

Return the approximate size in bytes of this block.

slice(start, end, copy)

Return a slice of this block.

sort_and_partition(boundaries, key, descending)

Return a list of sorted partitions of this block.

take(indices)

Return a new block containing the provided row indices.

to_arrow()

Convert this block into an Arrow table.

to_batch_format(batch_format)

Convert this block into the provided batch format.

to_block()

Return the base block that this accessor wraps.

to_default()

Return the default data format for this accessor.

to_numpy([columns])

Convert this block (or selected columns of it) into a NumPy ndarray.

to_pandas()

Convert this block into a pandas DataFrame.

zip(other)

Zip this block with another block of the same type and size.
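The descriptions of sort_and_partition and merge_sorted_blocks suggest a partition-then-merge sort. The sketch below shows that pattern on plain Python list blocks using only the standard library. It is an illustration under stated assumptions, not Ray's code: the free functions, their `key` callables, the ascending-only order, and the rule that a row equal to a boundary goes to the upper partition are all choices made for this example.

```python
import bisect
import heapq
from typing import Any, Callable, List


def sort_and_partition(
    block: List[Any],
    boundaries: List[Any],
    key: Callable[[Any], Any] = lambda row: row,
) -> List[List[Any]]:
    """Sort a list block, then split it into len(boundaries) + 1 partitions.

    Assumes `boundaries` is sorted ascending. Rows whose key equals a
    boundary are placed in the upper partition (an assumption of this sketch).
    """
    rows = sorted(block, key=key)
    keys = [key(row) for row in rows]
    partitions = []
    prev = 0
    for boundary in boundaries:
        # First index whose key is >= boundary, searching from the last cut.
        idx = bisect.bisect_left(keys, boundary, lo=prev)
        partitions.append(rows[prev:idx])
        prev = idx
    partitions.append(rows[prev:])
    return partitions


def merge_sorted_blocks(
    blocks: List[List[Any]],
    key: Callable[[Any], Any] = lambda row: row,
) -> List[Any]:
    """Merge already-sorted list blocks into one sorted block."""
    return list(heapq.merge(*blocks, key=key))


# Two unsorted blocks, partitioned with the same boundary.
parts_a = sort_and_partition([5, 1, 4, 2, 3], boundaries=[3])
parts_b = sort_and_partition([9, 0, 6], boundaries=[3])

# Merge the i-th partition of every block to get globally ordered ranges.
low_range = merge_sorted_blocks([parts_a[0], parts_b[0]])
high_range = merge_sorted_blocks([parts_a[1], parts_b[1]])
```

Concatenating `low_range` and `high_range` yields a fully sorted sequence, because every row in the first range is smaller than the shared boundary and every row in the second is not.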