Combine evaluate_stateful and evaluate_inside_range#6665

mustafasrepo · 2023-06-14T12:21:46Z

Which issue does this PR close?

Related to #5781

Rationale for this change

What changes are included in this PR?

This PR implements the proposal in alamb/datafusion#12.

evaluate_stateful and evaluate_inside_range are combined into a single evaluate method.
The uses_window_frame and supports_bounded_execution methods are moved from BuiltInWindowFunctionExpr to PartitionEvaluator.

With the changes in this PR, new evaluators can be implemented according to the table below.

| uses_window_frame | supports_bounded_execution | function_to_implement |
| --- | --- | --- |
| false | false | `evaluate_all`. If we were to implement `PERCENT_RANK`, it would end up in this quadrant, because we cannot produce any result without seeing the whole data. |
| false | true | `evaluate`. Optionally, `evaluate_all` can also be implemented for a more optimized version; otherwise a suboptimal default implementation is used. If we were to implement `ROW_NUMBER`, it would end up in this quadrant. The `OddRowNumber` example showcases this use case. |
| true | false | `evaluate`. I think that as long as `uses_window_frame` is `true`, there is no way for `supports_bounded_execution` to be `false`. I couldn't come up with any example for this quadrant. |
| true | true | `evaluate`. If we were to implement `FIRST_VALUE`, it would end up in this quadrant. |
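To make the quadrants concrete, here is a minimal sketch of two of them. The trait below is a simplified, hypothetical stand-in for `PartitionEvaluator`: it works on plain `i64` slices and `String` errors rather than Arrow arrays and `DataFusionError`, and its signatures are assumptions, not the ones in this PR. The behavior of `OddRowNumber` (numbering rows 1, 3, 5, ...) is likewise assumed for illustration.

```rust
use std::ops::Range;

/// Simplified, hypothetical stand-in for DataFusion's `PartitionEvaluator`.
trait PartitionEvaluator {
    /// Does the result depend on the window frame? Defaults to `false`.
    fn uses_window_frame(&self) -> bool {
        false
    }
    /// Can results be produced incrementally? Defaults to `false`.
    fn supports_bounded_execution(&self) -> bool {
        false
    }
    /// Produce the result for one row, given the rows visible in `range`.
    fn evaluate(&mut self, _values: &[i64], _range: &Range<usize>) -> Result<i64, String> {
        Err("evaluate is not implemented".to_string())
    }
    /// Produce the results for the whole partition at once.
    fn evaluate_all(&mut self, _values: &[i64]) -> Result<Vec<i64>, String> {
        Err("evaluate_all is not implemented".to_string())
    }
}

/// (false, true) quadrant: a ROW_NUMBER-like function (assumed behavior).
struct OddRowNumber {
    row_idx: i64,
}

impl PartitionEvaluator for OddRowNumber {
    fn supports_bounded_execution(&self) -> bool {
        true
    }
    fn evaluate(&mut self, _values: &[i64], _range: &Range<usize>) -> Result<i64, String> {
        // Only internal state is needed; the frame is ignored.
        let result = self.row_idx * 2 + 1;
        self.row_idx += 1;
        Ok(result)
    }
}

/// (true, true) quadrant: a FIRST_VALUE-like function.
struct FirstValue;

impl PartitionEvaluator for FirstValue {
    fn uses_window_frame(&self) -> bool {
        true
    }
    fn supports_bounded_execution(&self) -> bool {
        true
    }
    fn evaluate(&mut self, values: &[i64], range: &Range<usize>) -> Result<i64, String> {
        if range.is_empty() {
            return Err("empty window frame".to_string());
        }
        // The answer is the first row of the current frame.
        values
            .get(range.start)
            .copied()
            .ok_or_else(|| "frame out of bounds".to_string())
    }
}
```

In this sketch both evaluators implement only `evaluate`; a `PERCENT_RANK`-style evaluator would instead leave both flags at `false` and override `evaluate_all`.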

Are these changes tested?

Are there any user-facing changes?

alamb

This makes sense to me @mustafasrepo and is a very nice cleanup. Thank you!

alamb · 2023-06-14T13:05:47Z

datafusion/physical-expr/src/window/built_in_window_function_expr.rs

    ///
    /// If `include_rank` is true, then [`Self::create_evaluator`] must
-    /// implement [`PartitionEvaluator::evaluate_with_rank`]
+    /// implement [`PartitionEvaluator::evaluate_with_rank_all`]


Do you plan to give the same treatment (move to PartitionEvaluator) to include_rank?

We can do so. What do you think? Should we do that?

For consistency, I have also moved the include_rank flag to PartitionEvaluator.

alamb · 2023-06-14T13:08:36Z

datafusion/physical-expr/src/window/partition_evaluator.rs

+            ScalarValue::iter_to_array(res.into_iter())
+        } else {
+            Err(DataFusionError::NotImplemented(
+                "evaluate_all is not implemented by default".into(),


Suggested change

-                "evaluate_all is not implemented by default".into(),
+                format!("evaluate_all is not implemented for {} when using window frames", self.name()),

I think the error could be more helpful, but I can also add that as a follow-on PR
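A rough sketch of how the default `evaluate_all` and the more descriptive error could fit together is shown below. It uses a simplified, hypothetical trait over `i64` slices with `String` errors; `name()` is assumed to exist as the suggestion implies, and the per-row fallback for evaluators that do not use window frames is an assumption based on the diff fragment above, not a copy of the PR's code.

```rust
use std::ops::Range;

/// Simplified, hypothetical stand-in for `PartitionEvaluator`.
trait PartitionEvaluator {
    /// Assumed accessor used only to build the error message.
    fn name(&self) -> &str;

    fn uses_window_frame(&self) -> bool {
        false
    }

    fn evaluate(&mut self, values: &[i64], range: &Range<usize>) -> Result<i64, String>;

    /// Default: fall back to calling `evaluate` once per row when no
    /// window frame is involved; otherwise report a descriptive error.
    fn evaluate_all(&mut self, values: &[i64]) -> Result<Vec<i64>, String> {
        if !self.uses_window_frame() {
            (0..values.len())
                .map(|idx| self.evaluate(values, &(idx..idx + 1)))
                .collect()
        } else {
            Err(format!(
                "evaluate_all is not implemented for {} when using window frames",
                self.name()
            ))
        }
    }
}

/// Frame-independent evaluator: doubles the current row.
struct Doubler;

impl PartitionEvaluator for Doubler {
    fn name(&self) -> &str {
        "doubler"
    }
    fn evaluate(&mut self, values: &[i64], range: &Range<usize>) -> Result<i64, String> {
        Ok(values[range.start] * 2)
    }
}

/// Frame-dependent evaluator: sums the rows in the frame.
struct FramedSum;

impl PartitionEvaluator for FramedSum {
    fn name(&self) -> &str {
        "framed_sum"
    }
    fn uses_window_frame(&self) -> bool {
        true
    }
    fn evaluate(&mut self, values: &[i64], range: &Range<usize>) -> Result<i64, String> {
        Ok(values[range.clone()].iter().sum())
    }
}
```

With this shape, calling `evaluate_all` on `FramedSum` yields an error that names the offending evaluator, which is the point of the suggested change.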

alamb

Looks great to me -- thanks again @mustafasrepo

alamb · 2023-06-14T20:05:12Z

datafusion/physical-expr/src/window/rank.rs

 }

 impl PartitionEvaluator for RankEvaluator {
-    fn get_range(&self, idx: usize, _n_rows: usize) -> Result<Range<usize>> {
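Removing this override fits the commit "Default implement get_range when window frame is not used". A minimal sketch of what such a default might look like follows. The trait is a simplified, hypothetical stand-in, the error type is a plain `String`, and the choice of a single-row range `idx..idx + 1` as the default is an assumption rather than something shown in this diff.

```rust
use std::ops::Range;

/// Simplified, hypothetical stand-in for `PartitionEvaluator`.
trait PartitionEvaluator {
    fn uses_window_frame(&self) -> bool {
        false
    }

    /// Default range for evaluators that ignore the window frame:
    /// just the current row (assumed). Frame-based evaluators get their
    /// range from the window frame instead, so asking here is an error.
    fn get_range(&self, idx: usize, _n_rows: usize) -> Result<Range<usize>, String> {
        if self.uses_window_frame() {
            Err("range should be calculated from the window frame".to_string())
        } else {
            Ok(idx..idx + 1)
        }
    }
}

/// A rank-like evaluator no longer needs its own `get_range`.
struct RankLike;
impl PartitionEvaluator for RankLike {}

/// A frame-based evaluator only flips the flag.
struct FrameBased;
impl PartitionEvaluator for FrameBased {
    fn uses_window_frame(&self) -> bool {
        true
    }
}
```

Under these assumptions, evaluators such as `RankEvaluator` inherit a sensible range without any boilerplate.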


mustafasrepo added 2 commits June 14, 2023 09:36

Combine evaluate_stateful and evaluate_inside_range

1547593

Move flags to partition_evaluator trait

3a77901

github-actions bot added the physical-expr Changes to the physical-expr crates label Jun 14, 2023

Update comments

e463b44

mustafasrepo mentioned this pull request Jun 14, 2023

Proposed changes for more flexible user defined Aggregate and window functions alamb/datafusion#12

Closed

Update PartitionEvaluator comment

b3be67e

alamb approved these changes Jun 14, 2023


mustafasrepo added 2 commits June 14, 2023 16:42

move include_rank to partition_evaluator

55b4a4e

Default implement get_range when window frame is not used.

105f7fa

alamb approved these changes Jun 14, 2023


alamb merged commit 4184a7f into apache:main Jun 14, 2023

mustafasrepo deleted the feature/refactor_window_eval branch July 25, 2023 05:55
