ray.data.datasource.BaseFileMetadataProvider
- class ray.data.datasource.BaseFileMetadataProvider
Bases: ray.data.datasource.file_meta_provider.FileMetadataProvider
- Abstract callable that provides metadata for FileBasedDatasource implementations that reuse the base prepare_read method.
Also supports file and file size discovery in input directory paths.
- Current subclasses: DefaultFileMetadataProvider
DeveloperAPI: This API may change across minor Ray releases.
- expand_paths(paths: List[str], filesystem: Optional[pyarrow.fs.FileSystem]) -> Tuple[List[str], List[Optional[int]]]
Expands all paths into concrete file paths by walking directories.
Also returns a sidecar of file sizes.
The input paths must be normalized for compatibility with the input filesystem prior to invocation.
- Args:
- paths: A list of file and/or directory paths compatible with the given filesystem.
- filesystem: The filesystem implementation that should be used for expanding all paths and reading their files.
- Returns:
A tuple whose first item contains the list of file paths discovered, and whose second item contains the size of each file. None may be returned if a file size is either unknown or will be fetched later by _get_block_metadata(), but the length of both lists must be equal.
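
The return contract above can be illustrated with a minimal sketch. This is not Ray's implementation: `expand_local_paths` and `_size_or_none` are hypothetical helpers that use only the Python standard library and assume a local filesystem, whereas the real method takes a `pyarrow.fs.FileSystem`. The sketch only shows the shape of the result: directories are walked into concrete file paths, and a parallel list holds each file's size or None.

```python
import os
from typing import List, Optional, Tuple


def _size_or_none(path: str) -> Optional[int]:
    """Return the file size in bytes, or None if it cannot be determined."""
    try:
        return os.path.getsize(path)
    except OSError:
        return None


def expand_local_paths(paths: List[str]) -> Tuple[List[str], List[Optional[int]]]:
    """Hypothetical local-filesystem stand-in for expand_paths()."""
    file_paths: List[str] = []
    file_sizes: List[Optional[int]] = []
    for path in paths:
        if os.path.isdir(path):
            # Walk the directory tree and collect every file beneath it.
            for root, dirs, files in os.walk(path):
                dirs.sort()  # make the traversal order deterministic
                for name in sorted(files):
                    full_path = os.path.join(root, name)
                    file_paths.append(full_path)
                    file_sizes.append(_size_or_none(full_path))
        else:
            # Treat anything else as a concrete file path.
            file_paths.append(path)
            file_sizes.append(_size_or_none(path))
    # Both lists are appended to in lockstep, so their lengths are always equal.
    return file_paths, file_sizes
```

Appending to both lists in the same step keeps the two lengths equal by construction, including when a size is unknown and recorded as None.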