refactor!: do not default the struct array length to 0 in Struct::try_new#7247
Conversation
|
I think as this is a breaking change regardless, we should probably just add the length as an explicit argument and avoid potential errors/panics. I don't really feel strongly though, I personally think empty StructArrays are something the arrow spec probably shouldn't permit, but that ship has sailed... Perhaps @alamb has thoughts on the matter |
|
Ah, as you were typing that I saw the suggestions from #6732 which suggested adding |
I'm personally less a fan of this, as it isn't materially different from the current state of play - where there are multiple alternative methods. I guess I was thinking something along the lines of That being said, this would be potentially quite a disruptive change, and I am not sure how common empty StructArray actually are, I can't help feeling most of the time they'd arise from a deficiency in a projection system, rather than someone actually creating them... For an additional data point, RecordBatch::try_new is consistent with StructArray, and so not making this change might be more consistent. I don't really feel strongly on this, which I guess would be an argument towards not making a breaking change here... I'll let others weigh in. |
|
I would push for the breaking change honestly.. Here's another example: #7447 |
alamb
left a comment
There was a problem hiding this comment.
This PR makes sense to me and i think is consistent with RecordBatch -- thank you @westonpace and @tustvold and @gatesn
For an additional data point, RecordBatch::try_new is consistent with StructArray, and so not making this change might be more consistent. I don't really feel strongly on this, which I guess would be an argument towards not making a breaking change here... I'll let others weigh in.
I think RecordBatch::try_new errors if there are no columns:
arrow-rs/arrow-array/src/record_batch.rs
Lines 314 to 321 in 7f3907e
I would push for the breaking change honestly..
One thing we could do is deprecate StructArray::try_new which would give a bunch of warnings and encourage people to upgrade, but that would be inconsistent with RecordBatch::try_new() / RecordBatch::try_new_with_options)
5a5f41d to
907c293
Compare
|
I've rebased. It seems the consensus is leaning towards making this change now. If there are no other suggestions by Friday I will merge. |
|
🚀 -- thanks @westonpace and everyone else |
|
I am adding an API change label to this PR to make sure it is highlighted in the release notes - I think it is fine to release this in a minor release (and not have to wait for the next major release) |
|
IMO I'd have considered this to be a breaking change, as the In pyo3-arrow I have a test that checks I can import an empty Ideally I'd like to declare my dependency at I figure I'll just bump my requirement to |
|
We can also create a 55.0.1 release and backport the |
|
Given how much work it looks like every time you make a release, I really don't want to make you make a release if you don't need to 😅 In kylebarron/arro3#328 I bumped to require |
|
This also seems to have caused a panic when filtering struct arrays: |
…_new (apache#7247) * Do not default the struct array length to 0 in Struct::try_new if there are no child arrays. * Extend testing for new_empty_fields * Add try_new_with_length
This had been broken by apache/arrow-rs#7247 in Arrow 55.1.0
* Fix ScalarStructBuilder::build() when the struct is empty This had been broken by apache/arrow-rs#7247 in Arrow 55.1.0 * fix test for error
Which issue does this PR close?
Rationale for this change
See PR
What changes are included in this PR?
StructArray::try_newwill now return an error if there are no arrays providedStructArray::newwill panic if there are no arrays providedStructArray::from(vec![])will panicAre there any user-facing changes?
BREAKING CHANGE: StructArray::try_new will now return an error if no child arrays are provided.