ray.data.datasource.Partitioning

class ray.data.datasource.Partitioning(style: ray.data.datasource.partitioning.PartitionStyle, base_dir: Optional[str] = None, field_names: Optional[List[str]] = None, filesystem: Optional[pyarrow.fs.FileSystem] = None)

Partition scheme used to describe path-based partitions.

Path-based partition formats embed all partition keys and values directly in their dataset file paths.
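As a rough illustration of what "embedded in the file path" means, the sketch below parses one Hive-style path (`key=value` directories) and one directory-style path (values only, with names supplied by the caller). The helpers `parse_hive_dirs` and `parse_directory_dirs` and the example paths are hypothetical and written only for this illustration; they are not part of Ray and do not reproduce its parsing logic.

```python
import posixpath
from typing import Dict, List


def parse_hive_dirs(path: str) -> Dict[str, str]:
    """Collect key=value directory segments from a "/"-delimited file path."""
    dirs = posixpath.dirname(path).split("/")
    # Hive-style directories carry both the key and the value.
    return dict(seg.split("=", 1) for seg in dirs if "=" in seg)


def parse_directory_dirs(
    path: str, base_dir: str, field_names: List[str]
) -> Dict[str, str]:
    """Pair the directory segments below base_dir with caller-supplied names."""
    rel = posixpath.relpath(posixpath.dirname(path), base_dir)
    values = [] if rel == "." else rel.split("/")
    # Directory-style paths carry values only, so names must line up by position.
    if len(values) != len(field_names):
        raise ValueError("field_names must match the partition values in length")
    return dict(zip(field_names, values))


hive = parse_hive_dirs("data/year=2022/month=01/part-0.csv")
directory = parse_directory_dirs(
    "data/2022/01/part-0.csv", "data", ["year", "month"]
)
print(hive)
print(directory)
```

Both calls recover the same key/value pairs; the difference is only whether the keys live in the path or have to be provided separately.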

style

The partition style: either HIVE or DIRECTORY.

Type

ray.data.datasource.partitioning.PartitionStyle

base_dir

“/”-delimited base directory that all partitioned paths must sit under (exclusive). File paths outside this directory, or directly at its first level, are treated as unpartitioned. Specify None or an empty string to search every directory in a file path for partitions.

Type

Optional[str]
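The "exclusive" rule for base_dir can be sketched in a few lines of plain Python. The function `partition_dirs` below is a hypothetical helper written for this illustration, not Ray's implementation, and it simplifies path handling by stripping leading and trailing slashes. It returns the directory segments that would be candidates for partition parsing.

```python
import posixpath
from typing import List


def partition_dirs(path: str, base_dir: str = "") -> List[str]:
    """Return the directory segments of `path` that lie strictly below `base_dir`."""
    parent = posixpath.dirname(path.strip("/"))
    base = base_dir.strip("/")
    if not base:
        # No base directory: every directory in the path is a candidate.
        return [seg for seg in parent.split("/") if seg]
    if parent == base:
        # File sits at the first level of base_dir: unpartitioned.
        return []
    if not parent.startswith(base + "/"):
        # File sits outside base_dir: unpartitioned.
        return []
    return parent[len(base) + 1:].split("/")


nested = partition_dirs("/bucket/data/year=2022/f.csv", "/bucket/data")
first_level = partition_dirs("/bucket/data/f.csv", "/bucket/data")
outside = partition_dirs("/bucket/other/year=2022/f.csv", "/bucket/data")
no_base = partition_dirs("/bucket/data/year=2022/f.csv", "")
print(nested, first_level, outside, no_base)
```

Only the first path yields partition directories relative to the base; with an empty base directory, every directory in the path is considered.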

field_names

The partition key field names (i.e. column names for tabular datasets). When non-empty, the order and length of partition key field names must match the order and length of partition values. Required when parsing DIRECTORY partitioned paths or generating HIVE partitioned paths.

Type

Optional[List[str]]

filesystem

Filesystem that will be used for partition path file I/O.

Type

Optional[pyarrow.fs.FileSystem]

DeveloperAPI: This API may change across minor Ray releases.

__init__(style: ray.data.datasource.partitioning.PartitionStyle, base_dir: Optional[str] = None, field_names: Optional[List[str]] = None, filesystem: Optional[pyarrow.fs.FileSystem] = None) -> None

Methods

__init__(style[, base_dir, field_names, ...])

Attributes

base_dir

field_names

filesystem

normalized_base_dir: Returns the base directory normalized for compatibility with a filesystem.

resolved_filesystem: Returns the filesystem resolved for compatibility with a base directory.

style