ray.data.datasource.PathPartitionFilter.__init__
- PathPartitionFilter.__init__(path_partition_parser: PathPartitionParser, filter_fn: Callable[[Dict[str, str]], bool])
Creates a new path-based partition filter based on a parser.
- Parameters:
path_partition_parser – The path-based partition parser.
filter_fn – Callback used to filter partitions. Takes a dictionary mapping partition keys to values as input. Unpartitioned files are denoted with an empty input dictionary. Returns True to read a file for that partition or False to skip it. Partition keys and values are always strings read from the filesystem path. For example, this removes all unpartitioned files: lambda d: True if d else False
This raises an assertion error for any unpartitioned file found (written as a named function, because assert is a statement and cannot appear inside a lambda): def filter_fn(d): assert d, "Expected all files to be partitioned!"; return True
And this only reads files from January, 2022 partitions: lambda d: d["month"] == "January" and d["year"] == "2022"
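The three callbacks described above can be tried out without a cluster. The following is a minimal sketch in plain Python: parse_hive_partitions and select_paths are hypothetical helpers written for this illustration (they are not part of Ray and only mimic key=value directory parsing), and the sample paths are made up. The January 2022 callback uses d.get(...) instead of d[...] so that it returns False for unpartitioned files rather than raising KeyError.

```python
from typing import Callable, Dict, List


def parse_hive_partitions(path: str) -> Dict[str, str]:
    """Hypothetical stand-in for a partition parser (not Ray's implementation).

    Extracts key=value directory segments from a path. Keys and values are
    always strings, and an unpartitioned path yields an empty dictionary.
    """
    partitions: Dict[str, str] = {}
    for segment in path.split("/")[:-1]:  # skip the file name itself
        if "=" in segment:
            key, value = segment.split("=", 1)
            partitions[key] = value
    return partitions


def keep_partitioned(d: Dict[str, str]) -> bool:
    # Removes all unpartitioned files (empty dictionary -> False).
    return bool(d)


def require_partitioned(d: Dict[str, str]) -> bool:
    # Raises an assertion error for any unpartitioned file found.
    assert d, "Expected all files to be partitioned!"
    return True


def january_2022(d: Dict[str, str]) -> bool:
    # Only reads files from January, 2022 partitions.
    # .get() is an assumption made here so unpartitioned files are skipped.
    return d.get("month") == "January" and d.get("year") == "2022"


def select_paths(paths: List[str], filter_fn: Callable[[Dict[str, str]], bool]) -> List[str]:
    """Hypothetical helper: keep the paths whose partitions pass filter_fn."""
    return [p for p in paths if filter_fn(parse_hive_partitions(p))]


# Made-up sample paths: two partitioned files and one unpartitioned file.
paths = [
    "data/year=2022/month=January/a.csv",
    "data/year=2022/month=February/b.csv",
    "data/c.csv",
]

partitioned_only = select_paths(paths, keep_partitioned)
january_only = select_paths(paths, january_2022)
```

Any of these callbacks has the Callable[[Dict[str, str]], bool] shape that filter_fn expects, so it could be passed to PathPartitionFilter together with a PathPartitionParser.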