[bug] to_arrow_batch_reader does not respect the given limit, returning more records than specified #1040

@kevinjqliu

Description

Apache Iceberg version

main (development)

Please describe the bug 🐞

to_arrow_batch_reader bug

The bug is in project_batches, specifically with the way yield interacts with the two for-loops.

Here’s a Jupyter notebook reproducing the issue. See the last cell, which compares the number of rows read by to_arrow with the number read by to_arrow_batch_reader.

This only occurs when there is more than one data file, a case that is not covered in tests.
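
To illustrate the kind of interaction described above, here is a minimal sketch. It is not the actual project_batches code: the function names buggy_batches and fixed_batches are hypothetical, plain Python lists stand in for record batches, and each inner list of batches stands in for one data file. It assumes the limit check uses break inside the inner loop, which only leaves the current file, so the outer loop keeps yielding rows from later files.

```python
from typing import Iterable, Iterator, List, Optional

Batch = List[int]  # stand-in for a record batch (assumption for this sketch)


def buggy_batches(files: Iterable[List[Batch]], limit: Optional[int] = None) -> Iterator[Batch]:
    """Hypothetical generator showing how a limit can leak across files."""
    total = 0
    for file_batches in files:  # outer loop: one entry per data file
        for batch in file_batches:  # inner loop: batches within that file
            if limit is not None and total + len(batch) >= limit:
                yield batch[: limit - total]
                # break leaves only the inner loop; total is not updated,
                # so the next file passes the same check and yields again.
                break
            yield batch
            total += len(batch)


def fixed_batches(files: Iterable[List[Batch]], limit: Optional[int] = None) -> Iterator[Batch]:
    """One possible fix: stop the whole generator once the limit is reached."""
    total = 0
    for file_batches in files:
        for batch in file_batches:
            if limit is not None:
                remaining = limit - total
                if remaining <= 0:
                    return  # ends both loops at once
                batch = batch[:remaining]
            yield batch
            total += len(batch)


def count_rows(batches: Iterable[Batch]) -> int:
    """Total number of rows across all yielded batches."""
    return sum(len(b) for b in batches)


one_file = [[[1, 2, 3], [4, 5, 6]]]
two_files = [[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]]

print(count_rows(buggy_batches(one_file, limit=4)))   # limit respected with one file
print(count_rows(buggy_batches(two_files, limit=4)))  # more rows than the limit
print(count_rows(fixed_batches(two_files, limit=4)))  # limit respected
```

With a single file the buggy version happens to return the right number of rows, which would be consistent with the tests not catching the problem. A regression test would need at least two data files and a limit that is reached inside the first one.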

For more details, see #1032 (comment)
