ray.data.datasource.PathPartitionParser

class ray.data.datasource.PathPartitionParser(partitioning: Partitioning)

Partition parser for path-based partition formats.

Path-based partition formats embed all partition keys and values directly in their dataset file paths.

Two path partition formats are currently supported: HIVE and DIRECTORY.

For HIVE Partitioning, all partition directories under the base directory are discovered based on the {key1}={value1}/{key2}={value2} naming convention. Key/value pairs do not need to appear in the same order across all paths. Directory names nested under the base directory that do not follow this naming convention are considered unpartitioned. If a partition filter is defined, it is called with an empty input dictionary for each unpartitioned file.
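The HIVE convention described above can be illustrated with a small standalone sketch. This is not Ray's implementation: the helper name `parse_hive_path` and the example paths are hypothetical, and the sketch assumes POSIX-style "/" separators and that directory names not of the form key=value are simply skipped.

```python
import posixpath
from typing import Dict


def parse_hive_path(path: str, base_dir: str = "") -> Dict[str, str]:
    """Hypothetical helper: collect key=value directory names under base_dir."""
    base = base_dir.strip("/")
    rel = path.strip("/")
    if base:
        # Paths that are not nested under the base directory carry no partitions.
        if not rel.startswith(base + "/"):
            return {}
        rel = rel[len(base) + 1:]
    partitions: Dict[str, str] = {}
    # Only the directory portion is inspected; the file name is ignored.
    for name in posixpath.dirname(rel).split("/"):
        # Assumption: a directory is a partition only if it has exactly one "=".
        if name.count("=") == 1:
            key, value = name.split("=")
            partitions[key] = value
    return partitions


# Key/value order may differ between paths; both yield the same mapping.
a = parse_hive_path("data/year=2024/month=01/part-0.csv", base_dir="data")
b = parse_hive_path("data/month=01/year=2024/part-0.csv", base_dir="data")
# A file stored directly in the base directory is unpartitioned.
c = parse_hive_path("data/part-0.csv", base_dir="data")
print(a, b, c)
```

An empty dictionary here corresponds to the "unpartitioned" case, which is the input a partition filter would receive for such files.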

For DIRECTORY Partitioning, all directories under the base directory will be interpreted as partition values of the form {value1}/{value2}. An accompanying ordered list of partition field names must also be provided, where the order and length of all partition values must match the order and length of field names. Files stored directly in the base directory will be considered unpartitioned. If a partition filter is defined, then it will be called with an empty input dictionary for each unpartitioned file.

For example, if the base directory is "foo", then "foo.csv" and "foo/bar.csv" would be considered unpartitioned files but "foo/bar/baz.csv" would be associated with partition "bar". If the base directory is undefined, then "foo.csv" would be unpartitioned, "foo/bar.csv" would be associated with partition "foo", and "foo/bar/baz.csv" would be associated with partition ("foo", "bar").
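The DIRECTORY examples above can be reproduced with a short standalone sketch. Again, this is an illustration of the convention rather than Ray's code: the helper `parse_directory_path` and the field names `level1` and `level2` are hypothetical, and raising `ValueError` on a length mismatch is an assumption about how a mismatch might be reported.

```python
import posixpath
from typing import Dict, List


def parse_directory_path(
    path: str, field_names: List[str], base_dir: str = ""
) -> Dict[str, str]:
    """Hypothetical helper: map directory names under base_dir to field names."""
    base = base_dir.strip("/")
    rel = path.strip("/")
    if base:
        # Files outside the base directory are treated as unpartitioned.
        if not rel.startswith(base + "/"):
            return {}
        rel = rel[len(base) + 1:]
    # Every directory between the base directory and the file is a value.
    values = [d for d in posixpath.dirname(rel).split("/") if d]
    if not values:
        return {}  # stored directly in the base directory: unpartitioned
    if len(values) != len(field_names):
        # Assumption: a mismatch between values and field names is an error.
        raise ValueError(
            f"expected {len(field_names)} partition values, got {len(values)}"
        )
    return dict(zip(field_names, values))


# Base directory "foo": only "foo/bar/baz.csv" is partitioned.
print(parse_directory_path("foo.csv", ["level1"], base_dir="foo"))
print(parse_directory_path("foo/bar.csv", ["level1"], base_dir="foo"))
print(parse_directory_path("foo/bar/baz.csv", ["level1"], base_dir="foo"))
# No base directory: both "foo" and "bar" become partition values.
print(parse_directory_path("foo/bar/baz.csv", ["level1", "level2"]))
```

Because DIRECTORY paths carry values only, the ordered field-name list is what gives each directory level its meaning; HIVE paths need no such list because each directory name carries its own key.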

DeveloperAPI: This API may change across minor Ray releases.

Methods

__init__

Creates a path-based partition parser.

of

Creates a path-based partition parser using a flattened argument list.

Attributes

scheme

Returns the partitioning for this parser.