[bug] `to_arrow_batch_reader` does not respect the given limit, returning more records than specified #1040

### Apache Iceberg version

main (development)

### Please describe the bug 🐞

### `to_arrow_batch_reader` bug
The bug is in [`project_batches`](https://github.com/apache/iceberg-python/blob/d8b5c17cadbc99e53d08ade6109283ee73f0d83e/pyiceberg/io/pyarrow.py#L1457-L1475), specifically in how `yield` interacts with the two `for` loops.

Here’s a [Jupyter notebook](https://gist.github.com/kevinjqliu/1b5bc445ed12ca78e564db139c1e5683) reproducing the issue. See the last cell, which compares the number of rows read by `to_arrow` with the number read by `to_arrow_batch_reader`.

This only occurs when there is more than one data file, a case the tests do not cover.
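
To illustrate the kind of interaction described above, here is a minimal sketch in plain Python. It is not the actual pyiceberg code: the function names, the use of lists as stand-ins for record batches, and the exact shape of the limit check are all assumptions made for illustration. It shows how a `break` after a `yield` exits only the inner loop, so the outer loop moves on to the next file and keeps yielding rows.

```python
from typing import Iterator, List, Optional

Batch = List[int]            # hypothetical stand-in for a record batch
DataFile = List[Batch]       # hypothetical stand-in for one data file's batches


def project_batches_buggy(files: List[DataFile], limit: Optional[int]) -> Iterator[Batch]:
    """Simplified model of the problem: `break` leaves only the inner loop."""
    total = 0
    for batches in files:                      # outer loop: one iteration per data file
        for batch in batches:                  # inner loop: batches within that file
            if limit is not None and total + len(batch) >= limit:
                yield batch[: limit - total]   # emit the last partial batch
                break                          # exits the inner loop only
            yield batch
            total += len(batch)


def project_batches_fixed(files: List[DataFile], limit: Optional[int]) -> Iterator[Batch]:
    """One possible fix: track the rows emitted and return from the generator."""
    total = 0
    for batches in files:
        for batch in batches:
            if limit is not None:
                remaining = limit - total
                if remaining <= 0:
                    return                     # stops both loops
                batch = batch[:remaining]
            yield batch
            total += len(batch)


def count_rows(batches: Iterator[Batch]) -> int:
    return sum(len(b) for b in batches)


# Two "data files", each with two batches of three rows.
files = [
    [[1, 2, 3], [4, 5, 6]],
    [[7, 8, 9], [10, 11, 12]],
]

print(count_rows(project_batches_buggy(files[:1], limit=4)))  # single file: limit respected
print(count_rows(project_batches_buggy(files, limit=4)))      # two files: more rows than the limit
print(count_rows(project_batches_fixed(files, limit=4)))      # two files: limit respected
```

With a single file the buggy version happens to stop at the limit, which would explain why tests that use only one data file do not catch this.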

For more details, see https://github.com/apache/iceberg-python/issues/1032#issuecomment-2282819711
