[fix](nereids) Fix different exprs sharing the same expr id when an agg table uses random distribution #52993
Merged
starocean999 merged 2 commits into apache:master on Jul 18, 2025
Contributor
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
Contributor
Author
run buildall
morrySnow
reviewed
Jul 9, 2025
```diff
     }
-    Alias alias = new Alias(exprId, ImmutableList.of(function), col.getName(),
-            olapScan.qualified(), true);
+    Alias alias = new Alias(StatementScopeIdGenerator.newExprId(), ImmutableList.of(function),
```
Contributor
Add a new constructor for Alias that accepts four arguments: children, name, qualifier and nameFromChild.
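The reviewer's suggestion can be sketched as a constructor overload that always allocates a fresh ExprId, so call sites like the one in the diff cannot accidentally reuse the child's id. This is a minimal sketch under heavy assumptions: `ExprIdGenerator` and this `Alias` are hypothetical, simplified stand-ins (children and qualifier are plain strings, an ExprId is an `int`, and a process-wide counter replaces the statement-scoped `StatementScopeIdGenerator`). They are not the real Nereids classes or signatures.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical id source; a process-wide counter stands in for a statement-scoped one.
final class ExprIdGenerator {
    private static final AtomicInteger NEXT = new AtomicInteger();

    static int newExprId() {
        return NEXT.getAndIncrement();
    }
}

// Hypothetical, heavily simplified Alias holding only the fields named in the review.
final class Alias {
    final int exprId;
    final List<String> children;
    final String name;
    final List<String> qualifier;
    final boolean nameFromChild;

    // Full constructor: the caller chooses the ExprId explicitly (and can misuse it).
    Alias(int exprId, List<String> children, String name,
          List<String> qualifier, boolean nameFromChild) {
        this.exprId = exprId;
        this.children = List.copyOf(children);
        this.name = name;
        this.qualifier = List.copyOf(qualifier);
        this.nameFromChild = nameFromChild;
    }

    // Suggested four-argument constructor: always delegates with a freshly
    // allocated ExprId, so the alias never shares an id with its child.
    Alias(List<String> children, String name, List<String> qualifier, boolean nameFromChild) {
        this(ExprIdGenerator.newExprId(), children, name, qualifier, nameFromChild);
    }
}
```

Funnelling id allocation through one overload keeps the "new alias means new id" rule in a single place instead of relying on every caller to pick the right id.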
TPC-H: Total hot run time: 33745 ms
TPC-DS: Total hot run time: 186196 ms
ClickBench: Total hot run time: 29.67 s
Contributor
FE UT Coverage Report
Increment line coverage
morrySnow
approved these changes
Jul 18, 2025
Contributor
PR approved by at least one committer and no changes requested.
Contributor
PR approved by anyone and no changes requested.
starocean999
approved these changes
Jul 18, 2025
github-actions bot pushed a commit that referenced this pull request
Jul 18, 2025
…able with random distribute (#52993)

If an agg table uses random distribution, an aggregate node is added on top of the scan. The alias of the aggregate function was given the same expr id as the alias's child expression.

For example, for the query `select * from db1.tagg`, the plan is shown below. In ``sum(b#1) AS `b`#1``, the alias's expr id is the same as its child's expr id (both are 1), so the raw column `b` and the aggregated `sum(b)` share one id. This can cause hidden problems.

```sql
LogicalResultSink[30] ( outputExprs=[a#0, b#1] )
+--LogicalAggregate[29] ( groupByExpr=[a#0], outputExpr=[a#0, sum(b#1) AS `b`#1], hasRepeat=false, stats=1 )
   +--LogicalOlapScan ( qualified=internal.db1.tagg, indexName=<index_not_selected>, selectedIndexId=1752042452160, preAgg=ON, operativeCol=[a#0, b#1], stats=1 )
```

This PR fixes that: the alias now gets a fresh expr id, and the expression becomes ``sum(b#1) AS `b`#4``:

```sql
LogicalResultSink[30] ( outputExprs=[a#0, b#4] )
+--LogicalAggregate[29] ( groupByExpr=[a#0], outputExpr=[a#0, sum(b#1) AS `b`#4], hasRepeat=false, stats=1 )
   +--LogicalOlapScan ( qualified=internal.db1.tagg, indexName=<index_not_selected>, selectedIndexId=1752042065062, preAgg=ON, operativeCol=[a#0, b#1], stats=1 )
```
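To make the broken invariant concrete, here is a minimal sketch of a check that flags an expr id bound to two different expressions. It assumes a hypothetical flattened model (`NamedExpr`, `ExprIdCheck`) in which each output expression is reduced to its text and an integer id; neither type is part of Doris, and the real planner compares full expression trees rather than strings.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical flattened view of an output expression: its text plus its expr id.
record NamedExpr(String expr, int exprId) {}

final class ExprIdCheck {
    // Returns the first expr id that is bound to two different expressions,
    // or -1 when every id maps to exactly one expression.
    static int findConflict(List<NamedExpr> exprs) {
        Map<Integer, String> seen = new HashMap<>();
        for (NamedExpr e : exprs) {
            String prev = seen.putIfAbsent(e.exprId(), e.expr());
            if (prev != null && !prev.equals(e.expr())) {
                return e.exprId();
            }
        }
        return -1;
    }
}
```

Fed the scan and aggregate outputs from the pre-fix plan (`a#0`, `b#1`, `a#0`, `sum(b)#1`), the check reports id 1; with the fixed alias id `#4` it reports no conflict. Re-seeing `a#0` is not flagged, because the same expression legitimately flows through several nodes under one id.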
seawinde added a commit to seawinde/doris that referenced this pull request
Jul 21, 2025
What problem does this PR solve?
If an agg table uses random distribution, an aggregate node is added on top of the scan. The alias of the aggregate function was given the same expr id as the alias's child expression.
For example, for the query `select * from db1.tagg`, the plan contains ``sum(b#1) AS `b`#1``: the alias's expr id is the same as its child's expr id (both are 1), so two different expressions share one id, which can cause hidden problems.
This PR fixes that; the expression becomes ``sum(b#1) AS `b`#4``.
Issue Number: close #xxx
Related PR: #xxx
Problem Summary:
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merges this PR)