ray.data.datasource.Partitioning
- class ray.data.datasource.Partitioning(style: ray.data.datasource.partitioning.PartitionStyle, base_dir: Optional[str] = None, field_names: Optional[List[str]] = None, filesystem: Optional[pyarrow.fs.FileSystem] = None)
Bases: object
Partition scheme used to describe path-based partitions.
Path-based partition formats embed all partition keys and values directly in their dataset file paths.
DeveloperAPI: This API may change across minor Ray releases.
- style: ray.data.datasource.partitioning.PartitionStyle
The partition style: either HIVE or DIRECTORY.
- base_dir: Optional[str] = None
“/”-delimited base directory that all partitioned paths should exist under (exclusive). File paths either outside of, or at the first level of, this directory will be considered unpartitioned. Specify None or an empty string to search for partitions in all file path directories.
- field_names: Optional[List[str]] = None
The partition key field names (i.e. column names for tabular datasets). When non-empty, the order and length of partition key field names must match the order and length of partition values. Required when parsing DIRECTORY partitioned paths or generating HIVE partitioned paths.
- filesystem: Optional[pyarrow.fs.FileSystem] = None
Filesystem that will be used for partition path file I/O.
- property normalized_base_dir: str
Returns the base directory normalized for compatibility with a filesystem.
- property resolved_filesystem: pyarrow.fs.FileSystem
Returns the filesystem resolved for compatibility with a base directory.
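To make the base_dir and field_names rules above concrete, the following is a minimal pure-Python sketch of how a path-based scheme could map a file path to partition key/value pairs. It does not use Ray and is not Ray's implementation: the helper name parse_partitions and the plain strings "HIVE" and "DIRECTORY" are hypothetical stand-ins chosen for illustration, and the example paths are made up.

```python
from typing import Dict, List, Optional


def parse_partitions(
    path: str,
    style: str,
    base_dir: Optional[str] = None,
    field_names: Optional[List[str]] = None,
) -> Dict[str, str]:
    """Hypothetical helper: map a "/"-delimited file path to partition pairs."""
    parts = path.strip("/").split("/")
    base = (base_dir or "").strip("/")
    if base:
        base_parts = base.split("/")
        # Paths outside the base directory are treated as unpartitioned.
        if parts[: len(base_parts)] != base_parts:
            return {}
        # The base directory itself is excluded from partition parsing.
        parts = parts[len(base_parts):]
    # Drop the file name; only directories carry partition information.
    # A file at the first level of base_dir leaves no directories here.
    dirs = parts[:-1]
    if not dirs:
        return {}

    if style == "HIVE":
        # Hive style: each partition directory is named "key=value".
        pairs = [tuple(d.split("=", 1)) for d in dirs if "=" in d]
        keys = [key for key, _ in pairs]
        if field_names and keys and keys != field_names:
            raise ValueError(f"expected keys {field_names}, found {keys}")
        return {key: value for key, value in pairs}

    if style == "DIRECTORY":
        # Directory style: directories hold values only, so the keys
        # must be supplied in matching order and length.
        if not field_names or len(field_names) != len(dirs):
            raise ValueError("field_names must match the partition directories")
        return dict(zip(field_names, dirs))

    raise ValueError(f"unknown partition style: {style}")


hive = parse_partitions(
    "bucket/data/year=2024/month=01/f.csv", "HIVE", base_dir="bucket/data"
)
directory = parse_partitions(
    "bucket/data/2024/01/f.csv",
    "DIRECTORY",
    base_dir="bucket/data",
    field_names=["year", "month"],
)
print(hive)       # {'year': '2024', 'month': '01'}
print(directory)  # {'year': '2024', 'month': '01'}
```

In this sketch both layouts yield the same pairs; the difference is that the HIVE path carries its own keys, while the DIRECTORY path relies on field_names to name the values.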