ray.data.datasource.PathPartitionFilter.__init__

PathPartitionFilter.__init__(path_partition_parser: PathPartitionParser, filter_fn: Callable[[Dict[str, str]], bool])

Creates a new path-based partition filter based on a parser.

Parameters:
  • path_partition_parser – The path-based partition parser.

  • filter_fn – Callback used to filter partitions. It takes a dictionary mapping partition keys to values and returns True to read a file for that partition or False to skip it. An unpartitioned file is passed as an empty dictionary. Partition keys and values are always strings read from the filesystem path.

    For example, this removes all unpartitioned files:

        lambda d: True if d else False

    This raises an assertion error for any unpartitioned file found. It is written as a named function because assert is a statement and cannot appear inside a lambda:

        def require_partitioned(d):
            assert d, "Expected all files to be partitioned!"
            return True

    And this only reads files from January, 2022 partitions:

        lambda d: d["month"] == "January" and d["year"] == "2022"
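
The filter_fn contract described above can be tried out without Ray by calling candidate callbacks on plain dictionaries. The following is a minimal sketch, not part of the Ray API: the function names and the sample dictionaries are hypothetical, and the January 2022 filter uses dict.get (an assumption made here) so that an empty dictionary for an unpartitioned file yields False rather than a KeyError.

```python
from typing import Dict, List


def keep_partitioned(partition: Dict[str, str]) -> bool:
    """Skip unpartitioned files, which arrive as an empty dictionary."""
    return bool(partition)


def require_partitioned(partition: Dict[str, str]) -> bool:
    """Fail on any unpartitioned file; otherwise read the file."""
    assert partition, "Expected all files to be partitioned!"
    return True


def january_2022(partition: Dict[str, str]) -> bool:
    """Read only files whose partition is month=January, year=2022.

    Values are compared as strings because partition values are always
    strings read from the path. dict.get avoids a KeyError on files
    that lack these keys (for example, unpartitioned files).
    """
    return partition.get("month") == "January" and partition.get("year") == "2022"


# Hypothetical partition dictionaries, as a parser might produce them.
samples: List[Dict[str, str]] = [
    {},  # unpartitioned file
    {"year": "2022", "month": "January"},
    {"year": "2021", "month": "January"},
]

# Apply the filter the same way a reader would: keep entries that return True.
kept = [d for d in samples if january_2022(d)]
print(kept)
```

Any of these functions could then be supplied as the filter_fn argument, together with a PathPartitionParser instance, when constructing the filter.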