ray.data.datasource.BaseFileMetadataProvider.expand_paths
- BaseFileMetadataProvider.expand_paths(paths: List[str], filesystem: pyarrow.fs.FileSystem | None, partitioning: Partitioning | None = None, ignore_missing_paths: bool = False) → Iterator[Tuple[str, int]]
Expands all paths into concrete file paths by walking directories.
Also returns a sidecar of file sizes.
The input paths must be normalized for compatibility with the input filesystem prior to invocation.
- Parameters:
paths – A list of file and/or directory paths compatible with the given filesystem.
filesystem – The filesystem implementation that should be used for expanding all paths and reading their files.
ignore_missing_paths – If True, ignores any file paths in paths that are not found. Defaults to False.
- Returns:
An iterator of (file_path, file_size) pairs. None may be returned for the file size if it is either unknown or will be fetched later by _get_block_metadata(), but the length of both lists must be equal.
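The contract described above (walk directories, yield one (file_path, file_size) pair per concrete file, optionally skip missing paths) can be illustrated with a minimal sketch. It is not Ray's implementation: expand_local_paths and demo are hypothetical names, the sketch assumes a local filesystem, and it uses the standard library's os module in place of a pyarrow.fs.FileSystem. It also ignores the partitioning argument entirely.

```python
import os
import tempfile
from typing import Iterator, List, Optional, Tuple


def expand_local_paths(
    paths: List[str], ignore_missing_paths: bool = False
) -> Iterator[Tuple[str, Optional[int]]]:
    """Hypothetical stand-in for expand_paths on a local filesystem.

    Yields (file_path, file_size) pairs. Directories are walked
    recursively; plain files are yielded as-is.
    """
    for path in paths:
        if os.path.isdir(path):
            # Walk the directory tree top-down in a deterministic order.
            for root, dirs, files in os.walk(path):
                dirs.sort()
                for name in sorted(files):
                    file_path = os.path.join(root, name)
                    yield file_path, os.path.getsize(file_path)
        elif os.path.isfile(path):
            yield path, os.path.getsize(path)
        elif not ignore_missing_paths:
            # Assumption: a missing path is an error unless explicitly ignored.
            raise FileNotFoundError(path)


def demo() -> List[Tuple[str, Optional[int]]]:
    """Builds a small temporary tree and expands it."""
    with tempfile.TemporaryDirectory() as tmp:
        os.makedirs(os.path.join(tmp, "data", "nested"))
        with open(os.path.join(tmp, "data", "a.csv"), "wb") as f:
            f.write(b"12345")
        with open(os.path.join(tmp, "data", "nested", "b.csv"), "wb") as f:
            f.write(b"abc")
        missing = os.path.join(tmp, "missing.csv")
        pairs = list(
            expand_local_paths(
                [os.path.join(tmp, "data"), missing],
                ignore_missing_paths=True,
            )
        )
        # Report paths relative to the temporary root for readability.
        return [(os.path.relpath(p, tmp), size) for p, size in pairs]


if __name__ == "__main__":
    for file_path, file_size in demo():
        print(file_path, file_size)
```

Because the function is a generator, a missing path only raises once iteration reaches it; callers that want to fail fast would need to consume the iterator eagerly, for example with list(...).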