[C++] Avoid slicing array inputs in ExecBatchIterator that would result in one slice #31921

@asfimport

Description

For scalar functions, ExecBatchIterator is used to iterate over batches in smaller units. It is implemented by calling Array::slice(). For small batches this is unnecessary, since only one slice is created. The slice operation still causes some overhead by copying the shared_ptrs of the ArrayData object, including the type pointer, which can lead to contention (ARROW-16161).

This patch first checks whether the batch size is smaller than the slice size, and uses std::move in that case.
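
The idea can be sketched in isolation. This is a minimal illustration, not the actual patch: `ToyArrayData` and `SplitIntoChunks` are hypothetical stand-ins rather than Arrow APIs, and treating a batch exactly equal to the chunk size as the single-slice case is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for an array's data: shared ownership of a type
// object and a value buffer, plus an offset/length window into the buffer.
struct ToyArrayData {
  std::shared_ptr<int> type;                 // stand-in for the type pointer
  std::shared_ptr<std::vector<int>> values;  // stand-in for the data buffer
  int64_t offset = 0;
  int64_t length = 0;

  // Slicing copies both shared_ptrs, which means atomic reference-count
  // increments on control blocks that may be shared across threads.
  ToyArrayData Slice(int64_t off, int64_t len) const {
    return ToyArrayData{type, values, offset + off, len};
  }
};

// Hypothetical helper: split `input` into chunks of at most `max_chunk` rows.
std::vector<ToyArrayData> SplitIntoChunks(ToyArrayData input, int64_t max_chunk) {
  assert(max_chunk > 0);
  std::vector<ToyArrayData> out;
  if (input.length <= max_chunk) {
    // Only one chunk would be produced: move the input through instead of
    // slicing it, so no shared_ptr reference counts are touched.
    out.push_back(std::move(input));
    return out;
  }
  for (int64_t pos = 0; pos < input.length; pos += max_chunk) {
    const int64_t len = std::min(max_chunk, input.length - pos);
    out.push_back(input.Slice(pos, len));
  }
  return out;
}
```

When the caller also hands over its input with std::move, the small-batch path performs no reference-count updates at all, which is the overhead the description attributes to slicing.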

I have attached a comparison of the ExecuteScalarExpressionOverhead benchmark here: avoid-slicing-performance.txt

(Created with --benchmark_min_time=20. The default, shorter runtime tends to be noisy for this benchmark, but it also shows a positive tendency.)

Reporter: Tobias Zagorni / @zagto
Assignee: Tobias Zagorni / @zagto

Note: This issue was originally created as ARROW-16562. Please see the migration documentation for further details.
