ray.data.datasource.ParquetMetadataProvider
- class ray.data.datasource.ParquetMetadataProvider
Bases: ray.data.datasource.file_meta_provider.FileMetadataProvider
Abstract callable that provides block metadata for Arrow Parquet file fragments.
All file fragments should belong to a single dataset block.
Supports optional pre-fetching of ordered metadata for all file fragments in a single batch to help optimize metadata resolution.
- Current subclasses: DefaultParquetMetadataProvider
DeveloperAPI: This API may change across minor Ray releases.
Methods
__init__()
prefetch_file_metadata(pieces, **ray_remote_args): Pre-fetches file metadata for all Parquet file fragments in a single batch.
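The batch pre-fetch contract described above can be illustrated with a minimal, Ray-free sketch. Everything in it is an assumption made for illustration: `FakeFragment` and `SketchMetadataProvider` are hypothetical stand-ins, the class does not subclass `ParquetMetadataProvider`, and the shape of the returned metadata (a list of dicts) is invented. Only the method name and its `(pieces, **ray_remote_args)` arguments come from the listing; check the Ray source for the real return type before writing an actual subclass.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class FakeFragment:
    """Hypothetical stand-in for a Parquet file fragment (not a pyarrow type)."""

    path: str
    num_rows: int


class SketchMetadataProvider:
    """Stand-in that mirrors the method name and arguments listed above.

    It deliberately does not subclass the Ray class, so it runs without Ray.
    """

    def prefetch_file_metadata(
        self, pieces: List[FakeFragment], **ray_remote_args: Any
    ) -> Optional[List[Dict[str, Any]]]:
        # Resolve metadata for every fragment in one batch.
        # The result is ordered: entry i describes pieces[i].
        # ray_remote_args is accepted but ignored in this local sketch.
        return [{"path": p.path, "num_rows": p.num_rows} for p in pieces]


pieces = [FakeFragment("a.parquet", 10), FakeFragment("b.parquet", 25)]
metadata = SketchMetadataProvider().prefetch_file_metadata(pieces, num_cpus=1)

# Because the batch is ordered, it can be zipped back onto the fragments.
total_rows = sum(m["num_rows"] for m in metadata)
print(total_rows)
```

Keeping the result in the same order as `pieces` is what lets a caller pair each metadata entry with its fragment without a second lookup.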