ray.data.datasource.ParquetMetadataProvider

class ray.data.datasource.ParquetMetadataProvider

Abstract callable that provides block metadata for Arrow Parquet file fragments.

All file fragments passed in a single call should belong to a single dataset block.

Supports optional pre-fetching of ordered metadata for all file fragments in a single batch to help optimize metadata resolution.

Current subclasses:

DefaultParquetMetadataProvider

DeveloperAPI: This API may change across minor Ray releases.

__init__()

Methods

__init__()

prefetch_file_metadata(pieces, **ray_remote_args)

Pre-fetches file metadata for all Parquet file fragments in a single batch.
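The batch pre-fetch pattern described above can be sketched without Ray installed. The following is a minimal illustration under stated assumptions, not Ray's implementation: `FragmentStub`, `BlockMetadataStub`, `MetadataProviderSketch`, and `BatchPrefetchProvider` are hypothetical stand-ins, and the `__call__` signature and the `prefetched_metadata` parameter are assumed for illustration. Only the method name `prefetch_file_metadata(pieces, **ray_remote_args)` is taken from this page.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class FragmentStub:
    """Hypothetical stand-in for an Arrow Parquet file fragment."""
    path: str
    num_rows: int
    size_bytes: int


@dataclass
class BlockMetadataStub:
    """Hypothetical stand-in for the block metadata a provider returns."""
    num_rows: int
    size_bytes: int
    input_files: List[str]


class MetadataProviderSketch(ABC):
    """Mirrors the shape described on this page; NOT Ray's actual class."""

    def prefetch_file_metadata(self, pieces, **ray_remote_args) -> Optional[List[Any]]:
        # Pre-fetching is optional, so the base sketch returns nothing.
        return None

    @abstractmethod
    def __call__(self, pieces, prefetched_metadata=None) -> BlockMetadataStub:
        """Resolve metadata for fragments that all belong to one block."""


class BatchPrefetchProvider(MetadataProviderSketch):
    def prefetch_file_metadata(self, pieces, **ray_remote_args):
        # Collect metadata for every fragment in one batch.
        # The output order matches the input order of `pieces`.
        return [(p.num_rows, p.size_bytes) for p in pieces]

    def __call__(self, pieces, prefetched_metadata=None):
        # Fall back to fetching here if nothing was pre-fetched.
        if prefetched_metadata is None:
            prefetched_metadata = self.prefetch_file_metadata(pieces)
        # Aggregate per-fragment metadata into metadata for a single block.
        return BlockMetadataStub(
            num_rows=sum(rows for rows, _ in prefetched_metadata),
            size_bytes=sum(size for _, size in prefetched_metadata),
            input_files=[p.path for p in pieces],
        )


pieces = [
    FragmentStub("a.parquet", 100, 2048),
    FragmentStub("b.parquet", 50, 1024),
]
provider = BatchPrefetchProvider()
prefetched = provider.prefetch_file_metadata(pieces)
block_meta = provider(pieces, prefetched_metadata=prefetched)
print(block_meta)
```

Because the pre-fetched list keeps the same order as the fragments, a caller can slice it per block and hand each slice back to the provider without re-reading any file footers. A real subclass would read this information from the Parquet files rather than from stub fields.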