
Enhance ChunkIteratorV1 to make the last slice, if needed, smaller #586

@dmitriyrepin

Description


Currently, all slices are created with the same size, even when the last slice should be smaller.
Here is a reproducer:

    chunks = (3, 4, 5)

    shape = (6, 12, 20)
    iter1 = ChunkIteratorV1(shape=shape, chunks=chunks)
    assert iter1.arr_shape == shape
    assert iter1.dims is None
    assert iter1.len_chunks == chunks
    assert iter1.dim_chunks == (2, 3, 4)
    assert iter1.num_chunks == 24

    shape = (5, 11, 19)
    iter2 = ChunkIteratorV1(shape=shape, chunks=chunks)
    assert iter2.dim_chunks == (2, 3, 4)
    assert iter2.num_chunks == 24

    # Confirm that all slices are created with the same size,
    # even when the last slice should have been smaller.
    for _ in range(13):  # advance to chunk index 12
        region = next(iter1)
    assert region == (slice(3, 6, None), slice(0, 4, None), slice(0, 5, None))

    for _ in range(13):  # advance to chunk index 12
        region = next(iter2)
    # The last slice along the first axis still stops at 6, past shape[0] == 5
    assert region == (slice(3, 6, None), slice(0, 4, None), slice(0, 5, None))

It might be cleaner to reduce the size of the last slice so that it fits the data exactly (see iter2).
However, this is not strictly necessary:
Xarray, Zarr, and NumPy automatically truncate a slice to the valid bounds of the array
(see the test above, where the last chunk always has the full chunk size)
and do not raise an error. By contrast, accessing a single element at an
out-of-bounds index raises an IndexError.
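The difference between the two behaviors can be demonstrated with plain NumPy (a minimal sketch; the array shape matches iter2 above):

```python
import numpy as np

a = np.zeros((5, 11, 19))

# An over-long slice is silently clamped to the array bounds:
# slice(3, 6) on an axis of length 5 yields only 2 rows, no error.
part = a[slice(3, 6), slice(0, 4), slice(0, 5)]
assert part.shape == (2, 4, 5)

# A scalar index past the end, however, raises IndexError.
try:
    a[5, 0, 0]
    raised = False
except IndexError:
    raised = True
assert raised
```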

This needs to be handled as a separate task, since the change to the algorithm can affect multiple parts of the code and needs to be thoroughly tested.
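For reference, the proposed behavior amounts to clamping each slice's stop to the dimension size. A minimal sketch of the idea, with hypothetical names (`clamped_regions` is not the actual ChunkIteratorV1 API, and the C-order iteration is an assumption):

```python
import itertools
import math

def clamped_regions(shape, chunks):
    """Yield one tuple of slices per chunk, clamped to the array bounds."""
    dim_chunks = tuple(math.ceil(s / c) for s, c in zip(shape, chunks))
    for idx in itertools.product(*(range(n) for n in dim_chunks)):
        yield tuple(
            slice(i * c, min((i + 1) * c, s))  # clamp stop to the dim size
            for i, c, s in zip(idx, chunks, shape)
        )

regions = list(clamped_regions((5, 11, 19), (3, 4, 5)))
assert len(regions) == 24
# Chunk index 12 now stops at 5 instead of 6 along the first axis.
assert regions[12] == (slice(3, 5), slice(0, 4), slice(0, 5))
```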
