ray.data.block.BlockAccessor
- class ray.data.block.BlockAccessor(*args, **kwds)
Bases: Generic[ray.data.block.T]
Provides accessor methods for a specific block.
Ideally, we wouldn’t need separate accessor classes for blocks. However, they are needed to support storing pyarrow.Table directly as a top-level Ray object, without a wrapping class (issue #17186).
There are three types of block accessors: SimpleBlockAccessor, which operates over a plain Python list; ArrowBlockAccessor, for pyarrow.Table type blocks; and PandasBlockAccessor, for pandas.DataFrame type blocks.
DeveloperAPI: This API may change across minor Ray releases.
- slice(start: int, end: int, copy: bool) -> Union[List[ray.data.block.T], pyarrow.Table, pandas.DataFrame, bytes]
Return a slice of this block.
- Parameters
start – The starting index of the slice.
end – The ending index of the slice.
copy – Whether to perform a data copy for the slice.
- Returns
The sliced block result.
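To illustrate the `copy` flag, here is a minimal sketch over a plain Python list block. `slice_list_block` is a hypothetical helper, not part of Ray. It assumes `end` is exclusive, as in Python slicing, and models `copy=True` as a deep copy of the rows so that the difference is observable. The source states neither detail.

```python
import copy as copy_module
from typing import Any, List


def slice_list_block(block: List[Any], start: int, end: int, copy: bool) -> List[Any]:
    """Hypothetical helper: return rows [start, end) of a list block."""
    view = block[start:end]
    # Assumption: with copy=True the result shares no row objects with the source.
    return copy_module.deepcopy(view) if copy else view


block = [{"id": i} for i in range(5)]
shared = slice_list_block(block, 1, 3, copy=False)
copied = slice_list_block(block, 1, 3, copy=True)
```

In this sketch `shared` holds the same row objects as `block`, while `copied` holds equal but independent rows.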
- take(indices: List[int]) -> Union[List[ray.data.block.T], pyarrow.Table, pandas.DataFrame, bytes]
Return a new block containing the provided row indices.
- Parameters
indices – The row indices to return.
- Returns
A new block containing the provided row indices.
- select(columns: List[Union[None, str, Callable[[ray.data.block.T], Any]]]) -> Union[List[ray.data.block.T], pyarrow.Table, pandas.DataFrame, bytes]
Return a new block containing the provided columns.
- random_shuffle(random_seed: Optional[int]) -> Union[List[ray.data.block.T], pyarrow.Table, pandas.DataFrame, bytes]
Randomly shuffle this block.
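A seeded shuffle of a list block could look roughly like the sketch below. `shuffle_list_block` is a hypothetical helper, not Ray's implementation. It assumes that the same `random_seed` should yield the same order and that the input block is left unmodified.

```python
import random
from typing import Any, List, Optional


def shuffle_list_block(block: List[Any], random_seed: Optional[int]) -> List[Any]:
    """Hypothetical helper: return a shuffled copy of a list block."""
    rng = random.Random(random_seed)  # a seed of None draws from OS entropy
    shuffled = list(block)            # copy so the source block is untouched
    rng.shuffle(shuffled)
    return shuffled


block = list(range(10))
first = shuffle_list_block(block, random_seed=42)
second = shuffle_list_block(block, random_seed=42)
```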
- to_numpy(columns: Optional[Union[str, List[str]]] = None) -> Union[numpy.ndarray, Dict[str, numpy.ndarray]]
Convert this block (or columns of block) into a NumPy ndarray.
- Parameters
columns – Name or names of the columns to convert, or None to convert all columns.
- to_block() -> Union[List[ray.data.block.T], pyarrow.Table, pandas.DataFrame, bytes]
Return the base block that this accessor wraps.
- to_default() -> Union[List[ray.data.block.T], pyarrow.Table, pandas.DataFrame, bytes]
Return the default data format for this accessor.
- to_batch_format(batch_format: str) -> Union[List[ray.data.block.T], pyarrow.Table, pandas.DataFrame, bytes, numpy.ndarray, Dict[str, numpy.ndarray]]
Convert this block into the provided batch format.
- Parameters
batch_format – The batch format to convert this block to.
- Returns
This block formatted as the provided batch format.
- schema() -> Union[type, pyarrow.lib.Schema]
Return the Python type or pyarrow schema of this block.
- get_metadata(input_files: List[str], exec_stats: Optional[ray.data.block.BlockExecStats]) -> ray.data.block.BlockMetadata
Create a metadata object from this block.
- zip(other: Union[List[ray.data.block.T], pyarrow.Table, pandas.DataFrame, bytes]) -> Union[List[ray.data.block.T], pyarrow.Table, pandas.DataFrame, bytes]
Zip this block with another block of the same type and size.
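The "same type and size" requirement can be sketched for list blocks as follows. `zip_list_blocks` is a hypothetical helper. Raising `ValueError` on a mismatch and pairing rows into tuples are assumptions of this sketch, not behavior documented above.

```python
from typing import Any, List, Tuple


def zip_list_blocks(left: List[Any], right: Any) -> List[Tuple[Any, Any]]:
    """Hypothetical helper: pair up rows of two list blocks."""
    if not isinstance(right, list):
        # Assumption: blocks of different types cannot be zipped.
        raise ValueError(f"Cannot zip a list block with {type(right).__name__}")
    if len(left) != len(right):
        # Assumption: blocks of different sizes cannot be zipped.
        raise ValueError(f"Cannot zip blocks of sizes {len(left)} and {len(right)}")
    return list(zip(left, right))


zipped = zip_list_blocks([1, 2, 3], ["a", "b", "c"])
```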
- static batch_to_block(batch: Union[List[ray.data.block.T], pyarrow.Table, pandas.DataFrame, bytes, numpy.ndarray, Dict[str, numpy.ndarray]]) -> Union[List[ray.data.block.T], pyarrow.Table, pandas.DataFrame, bytes]
Create a block from user-facing data formats.
- static for_block(block: Union[List[ray.data.block.T], pyarrow.Table, pandas.DataFrame, bytes]) -> BlockAccessor[T]
Create a block accessor for the given block.
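The factory pattern behind a method like this can be sketched as a dispatch on the block's type. Everything below is hypothetical and uses only the standard library: `ListAccessor`, `_ACCESSORS`, and `accessor_for` are illustrative names, and only the plain-list case is modeled. The Arrow and pandas branches are omitted.

```python
from typing import Any, Dict, List, Type


class ListAccessor:
    """Hypothetical accessor over a plain Python list block."""

    def __init__(self, block: List[Any]) -> None:
        self._block = block

    def to_block(self) -> List[Any]:
        # Return the base block that this accessor wraps.
        return self._block


# Hypothetical registry mapping block types to accessor classes.
_ACCESSORS: Dict[type, Type[ListAccessor]] = {list: ListAccessor}


def accessor_for(block: Any) -> ListAccessor:
    """Pick an accessor class based on the runtime type of the block."""
    for block_type, accessor_cls in _ACCESSORS.items():
        if isinstance(block, block_type):
            return accessor_cls(block)
    raise TypeError(f"Not a block type: {type(block).__name__}")


accessor = accessor_for([1, 2, 3])
```

Keeping the dispatch in one place lets callers work with raw blocks and obtain an accessor only when they need to operate on one.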
- sample(n_samples: int, key: Any) -> Union[List[ray.data.block.T], pyarrow.Table, pandas.DataFrame, bytes]
Return a random sample of items from this block.
- sort_and_partition(boundaries: List[ray.data.block.T], key: Any, descending: bool) -> List[Union[List[ray.data.block.T], pyarrow.Table, pandas.DataFrame, bytes]]
Return a list of sorted partitions of this block.
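One way to read this operation: sort the block, then cut it at each boundary, so that N boundaries yield N + 1 partitions. The sketch below covers the ascending case only, for a list block. `sort_and_partition_list` is a hypothetical helper, and it assumes that `boundaries` is sorted and that rows equal to a boundary fall into the partition to its right.

```python
from bisect import bisect_left
from typing import Any, Callable, List


def sort_and_partition_list(
    block: List[Any], boundaries: List[Any], key: Callable[[Any], Any]
) -> List[List[Any]]:
    """Hypothetical helper: sort a list block and split it at the boundaries."""
    items = sorted(block, key=key)
    keys = [key(item) for item in items]
    partitions = []
    prev = 0
    for boundary in boundaries:
        # First position whose key is >= boundary, searching from the last cut.
        idx = bisect_left(keys, boundary, lo=prev)
        partitions.append(items[prev:idx])
        prev = idx
    partitions.append(items[prev:])  # everything at or above the last boundary
    return partitions


parts = sort_and_partition_list([5, 1, 4, 2, 3], boundaries=[2, 4], key=lambda x: x)
```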
- combine(key: Union[None, str, Callable[[ray.data.block.T], Any]], agg: AggregateFn) -> Union[List[ray.data.block.U], pyarrow.Table, pandas.DataFrame, bytes]
Combine rows with the same key into an accumulator.
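As a rough sketch of per-key combining on a list block, the helper below folds each row into an accumulator for its key. `combine_list_block` and its `init` and `accumulate` callables are hypothetical stand-ins for whatever `AggregateFn` provides. Returning `(key, accumulator)` pairs sorted by key is also an assumption.

```python
from typing import Any, Callable, Dict, List, Tuple


def combine_list_block(
    block: List[Any],
    key: Callable[[Any], Any],
    init: Callable[[Any], Any],
    accumulate: Callable[[Any, Any], Any],
) -> List[Tuple[Any, Any]]:
    """Hypothetical helper: fold rows sharing a key into one accumulator."""
    accumulators: Dict[Any, Any] = {}
    for row in block:
        k = key(row)
        if k not in accumulators:
            accumulators[k] = init(k)  # start a fresh accumulator for a new key
        accumulators[k] = accumulate(accumulators[k], row)
    # Assumption: output is sorted by key so partial results can be merged later.
    return sorted(accumulators.items(), key=lambda pair: pair[0])


rows = [("a", 1), ("b", 2), ("a", 3)]
combined = combine_list_block(
    rows,
    key=lambda row: row[0],
    init=lambda k: 0,
    accumulate=lambda acc, row: acc + row[1],
)
```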
- static merge_sorted_blocks(blocks: List[Block[T]], key: Any, descending: bool) -> Tuple[Union[List[ray.data.block.T], pyarrow.Table, pandas.DataFrame, bytes], ray.data.block.BlockMetadata]
Return a sorted block by merging a list of sorted blocks.
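For list blocks, merging already-sorted inputs is a k-way merge, which the standard library exposes as `heapq.merge`. The sketch below is a hypothetical helper, not Ray's implementation. It returns only the merged block and omits the `BlockMetadata` part of the result.

```python
import heapq
from typing import Any, Callable, List


def merge_sorted_list_blocks(
    blocks: List[List[Any]], key: Callable[[Any], Any], descending: bool
) -> List[Any]:
    """Hypothetical helper: k-way merge of list blocks sorted in the same order."""
    # heapq.merge expects every input to be sorted consistently with `reverse`.
    return list(heapq.merge(*blocks, key=key, reverse=descending))


ascending = merge_sorted_list_blocks([[1, 4], [2, 3, 5]], key=lambda x: x, descending=False)
descending = merge_sorted_list_blocks([[4, 1], [5, 3, 2]], key=lambda x: x, descending=True)
```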
- static aggregate_combined_blocks(blocks: List[Union[List[ray.data.block.T], pyarrow.Table, pandas.DataFrame, bytes]], key: Union[None, str, Callable[[ray.data.block.T], Any]], agg: AggregateFn) -> Tuple[Union[List[ray.data.block.U], pyarrow.Table, pandas.DataFrame, bytes], ray.data.block.BlockMetadata]
Aggregate partially combined and sorted blocks.
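A minimal sketch of this final aggregation step, assuming each input block is a list of `(key, accumulator)` pairs sorted by key: merge the blocks in key order, then fold together the accumulators that share a key. `aggregate_combined_lists` and its `merge` and `finalize` callables are hypothetical names, and the `BlockMetadata` part of the result is omitted.

```python
import heapq
from itertools import groupby
from operator import add, itemgetter
from typing import Any, Callable, List, Tuple


def aggregate_combined_lists(
    blocks: List[List[Tuple[Any, Any]]],
    merge: Callable[[Any, Any], Any],
    finalize: Callable[[Any], Any],
) -> List[Tuple[Any, Any]]:
    """Hypothetical helper: merge per-block partial accumulators by key."""
    # Stream all (key, accumulator) pairs in key order across the blocks.
    merged = heapq.merge(*blocks, key=itemgetter(0))
    result = []
    for k, group in groupby(merged, key=itemgetter(0)):
        accumulators = [acc for _, acc in group]
        total = accumulators[0]
        for acc in accumulators[1:]:
            total = merge(total, acc)  # combine partial results for this key
        result.append((k, finalize(total)))
    return result


partials = [[("a", 4), ("b", 2)], [("a", 1), ("c", 7)]]
totals = aggregate_combined_lists(partials, merge=add, finalize=lambda acc: acc)
```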