ray.data.datasource.PathPartitionEncoder

class ray.data.datasource.PathPartitionEncoder(partitioning: ray.data.datasource.partitioning.Partitioning)

Bases: object

Callable that generates directory path strings for path-based partition formats.

Path-based partition formats embed all partition keys and values directly in their dataset file paths.

Two path partition formats are currently supported: HIVE and DIRECTORY.

For HIVE Partitioning, all partition directories will be generated using a “{key1}={value1}/{key2}={value2}” naming convention under the base directory. An ordered list of partition key field names must also be provided, and the order and length of the partition values must match the order and length of the field names.

For DIRECTORY Partitioning, all directories will be generated from partition values using a “{value1}/{value2}” naming convention under the base directory.
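The two naming conventions above can be sketched in plain Python. This is a minimal illustration only, not Ray's implementation: the helper name encode_partition_path, its hive flag, and the sample base directory and values are all hypothetical, and the handling of base_dir is an assumption.

```python
import posixpath
from typing import List, Optional


def encode_partition_path(
    values: List[str],
    base_dir: str = "",
    field_names: Optional[List[str]] = None,
    hive: bool = True,
) -> str:
    """Hypothetical stand-in that mimics the documented naming conventions."""
    if hive and not field_names:
        # HIVE paths need a key for every value.
        raise ValueError("HIVE partition paths require field names")
    if field_names and len(field_names) != len(values):
        # Order and length of values must line up with the field names.
        raise ValueError(
            f"expected {len(field_names)} partition values, got {len(values)}"
        )
    if hive:
        # "{key1}={value1}/{key2}={value2}"
        dirs = [f"{key}={value}" for key, value in zip(field_names, values)]
    else:
        # "{value1}/{value2}"
        dirs = list(values)
    return posixpath.join(base_dir, *dirs)


# Sample inputs are made up for illustration.
hive_path = encode_partition_path(["2024", "01"], "data/sales", ["year", "month"])
dir_path = encode_partition_path(["2024", "01"], "data/sales", hive=False)
print(hive_path)  # data/sales/year=2024/month=01
print(dir_path)   # data/sales/2024/01
```

The length check mirrors the documented requirement that partition values match the field names in order and length. The real encoder may validate or normalize its inputs differently.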

DeveloperAPI: This API may change across minor Ray releases.

static of(style: ray.data.datasource.partitioning.PartitionStyle = PartitionStyle.HIVE, base_dir: Optional[str] = None, field_names: Optional[List[str]] = None, filesystem: Optional[pyarrow.fs.FileSystem] = None) PathPartitionEncoder

Creates a new partition path encoder.

Parameters
  • style – The partition style, either HIVE or DIRECTORY.

  • base_dir – “/”-delimited base directory that all partition paths will be generated under (exclusive).

  • field_names – The partition key field names (i.e., column names for tabular datasets). Required for HIVE partition paths, optional for DIRECTORY partition paths. When non-empty, the order and length of partition key field names must match the order and length of partition values.

  • filesystem – Filesystem that will be used for partition path file I/O.

Returns

The new partition path encoder.

property scheme: ray.data.datasource.partitioning.Partitioning

Returns the partitioning for this encoder.