ray.data.datasource.DefaultParquetMetadataProvider

class ray.data.datasource.DefaultParquetMetadataProvider

The default file metadata provider for ParquetDatasource.

Aggregates total block bytes and number of rows using the Parquet file metadata associated with a list of Arrow Parquet dataset file fragments.
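The aggregation described above can be illustrated with a minimal sketch. This is not Ray's implementation. The `RowGroupMeta` and `FileMeta` classes and the `aggregate_metadata` function are hypothetical stand-ins that mirror the attribute names exposed by PyArrow's Parquet file metadata (`num_rows`, `num_row_groups`, `row_group(i).total_byte_size`), so the example runs without PyArrow or Ray installed.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class RowGroupMeta:
    """Hypothetical stand-in for one row group's Parquet metadata."""
    num_rows: int
    total_byte_size: int


@dataclass
class FileMeta:
    """Hypothetical stand-in for a fragment's Parquet file metadata."""
    row_groups: List[RowGroupMeta]

    @property
    def num_rows(self) -> int:
        # Row count of the file is the sum over its row groups.
        return sum(rg.num_rows for rg in self.row_groups)

    @property
    def num_row_groups(self) -> int:
        return len(self.row_groups)

    def row_group(self, i: int) -> RowGroupMeta:
        return self.row_groups[i]


def aggregate_metadata(file_metas: Iterable[FileMeta]) -> Tuple[int, int]:
    """Sum rows and row-group byte sizes across all file fragments."""
    total_rows = 0
    total_bytes = 0
    for meta in file_metas:
        total_rows += meta.num_rows
        for i in range(meta.num_row_groups):
            total_bytes += meta.row_group(i).total_byte_size
    return total_rows, total_bytes


# Two illustrative fragments with made-up sizes.
metas = [
    FileMeta([RowGroupMeta(100, 4096), RowGroupMeta(50, 2048)]),
    FileMeta([RowGroupMeta(25, 1024)]),
]
rows, size_bytes = aggregate_metadata(metas)
print(rows, size_bytes)  # 175 7168
```

Because the totals come from file footers rather than from reading column data, an estimate like this can be computed before any blocks are loaded. The actual provider may apply additional logic that this sketch does not model.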

DeveloperAPI: This API may change across minor Ray releases.

__init__()

Methods

__init__()

prefetch_file_metadata(pieces, **ray_remote_args)

Pre-fetches file metadata for all Parquet file fragments in a single batch.