ray.data.datasource.PathPartitionFilter

class ray.data.datasource.PathPartitionFilter(path_partition_parser: ray.data.datasource.partitioning.PathPartitionParser, filter_fn: Callable[[Dict[str, str]], bool])

Partition filter for path-based partition formats.

Used to explicitly keep or reject files based on a custom filter function that takes partition keys and values parsed from the file’s path as input.

PublicAPI (beta): This API is in beta and may change before becoming stable.

__init__(path_partition_parser: ray.data.datasource.partitioning.PathPartitionParser, filter_fn: Callable[[Dict[str, str]], bool])

Creates a new path-based partition filter based on a parser.

Parameters
  • path_partition_parser – The path-based partition parser.

  • filter_fn – Callback used to filter partitions. Takes a dictionary mapping partition keys to values as input; unpartitioned files are denoted by an empty dictionary. Returns True to read a file for that partition or False to skip it. Partition keys and values are always strings read from the filesystem path.

    For example, this skips all unpartitioned files:

        lambda d: True if d else False

    This raises an assertion error for any unpartitioned file found. It is written as a named function because assert is a statement and cannot appear inside a lambda:

        def require_partitioned(d):
            assert d, "Expected all files to be partitioned!"
            return True

    And this reads only files from January 2022 partitions:

        lambda d: d["month"] == "January" and d["year"] == "2022"
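The behavior of filter_fn described above can be sketched without Ray installed. The following is a minimal, standalone illustration, not Ray's implementation: parse_hive_path and apply_filter are hypothetical helpers that stand in for the path partition parser and for the filtering step, and the example paths assume a Hive-style key=value directory layout.

```python
from typing import Callable, Dict, List

PartitionDict = Dict[str, str]


def parse_hive_path(path: str) -> PartitionDict:
    """Hypothetical stand-in for a path partition parser (not Ray's code).

    Collects every key=value directory segment of a Hive-style path and
    returns an empty dict for an unpartitioned file.
    """
    directories = path.split("/")[:-1]  # drop the file name
    partitions: PartitionDict = {}
    for segment in directories:
        key, sep, value = segment.partition("=")
        if sep and key:
            partitions[key] = value  # values stay strings, as in the docs
    return partitions


def keep_partitioned(d: PartitionDict) -> bool:
    """Skip unpartitioned files (empty dictionary)."""
    return bool(d)


def require_partitioned(d: PartitionDict) -> bool:
    """Fail loudly on any unpartitioned file."""
    assert d, "Expected all files to be partitioned!"
    return True


def keep_january_2022(d: PartitionDict) -> bool:
    """Keep only January 2022; .get() tolerates unpartitioned files."""
    return d.get("month") == "January" and d.get("year") == "2022"


def apply_filter(
    paths: List[str], filter_fn: Callable[[PartitionDict], bool]
) -> List[str]:
    """Hypothetical helper: keep the paths whose partitions pass filter_fn."""
    return [p for p in paths if filter_fn(parse_hive_path(p))]


paths = [
    "data/year=2022/month=January/part-0.csv",
    "data/year=2022/month=February/part-0.csv",
    "data/year=2021/month=January/part-0.csv",
    "data/unpartitioned.csv",
]

january_2022 = apply_filter(paths, keep_january_2022)
partitioned_only = apply_filter(paths, keep_partitioned)
print(january_2022)
print(partitioned_only)
```

Note that keep_january_2022 uses d.get(...) rather than d["month"]. Indexing directly would raise a KeyError for an unpartitioned file, whose dictionary is empty. With Ray itself, callbacks like these would be passed as the filter_fn argument of PathPartitionFilter.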

Methods

__init__(path_partition_parser, filter_fn)

Creates a new path-based partition filter based on a parser.

of(filter_fn[, style, base_dir, ...])

Creates a path-based partition filter using a flattened argument list.

Attributes

parser

Returns the path partition parser for this filter.