-
Notifications
You must be signed in to change notification settings - Fork 5k
enh(last): enable reading cache when select last and tags #33236
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Summary of ChangesHello @Tony2h, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request enhances the query optimizer to more effectively utilize cached data when Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request enhances the last function optimization to enable reading from cache when tag columns are selected. The core logic change is in planOptimizer.c to include FUNCTION_TYPE_GROUP_CONST_VALUE in the last row scan optimization check. A new test case is added to verify this enhancement, which is comprehensive.
I've found a couple of issues:
- The logic change in
planOptimizer.cseems incomplete, asFUNCTION_TYPE_GROUP_CONST_VALUEis not handled in other related functions, which could lead to bugs. - The new test file
test_func_last_tag.pyhas some unused imports.
My detailed comments are below.
| } else if (FUNCTION_TYPE_GROUP_KEY == pAggFunc->funcType || | ||
| FUNCTION_TYPE_GROUP_CONST_VALUE == pAggFunc->funcType) { | ||
| if (!lastRowScanOptLastParaIsTag(pParam)) { | ||
| return false; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While this change correctly includes FUNCTION_TYPE_GROUP_CONST_VALUE for the last row scan optimization check, the change seems incomplete. To ensure consistency and prevent potential bugs, FUNCTION_TYPE_GROUP_CONST_VALUE should also be handled in other places where FUNCTION_TYPE_GROUP_KEY is checked for similar purposes.
For example, the function isNeedSplitCacheLastFunc in the same file should be updated. Without this, isNeedSplitCacheLastFunc will return true for FUNCTION_TYPE_GROUP_CONST_VALUE, which could lead to incorrect query plan splitting.
// in isNeedSplitCacheLastFunc() at line 4534
...
FUNCTION_TYPE_SELECT_VALUE != funcType && FUNCTION_TYPE_GROUP_KEY != funcType) {
...This should probably be:
// in isNeedSplitCacheLastFunc()
...
FUNCTION_TYPE_SELECT_VALUE != funcType &&
FUNCTION_TYPE_GROUP_KEY != funcType &&
FUNCTION_TYPE_GROUP_CONST_VALUE != funcType) {
...Additionally, please check if fmIsGroupKeyFunc should also be updated to include FUNCTION_TYPE_GROUP_CONST_VALUE as it is used in several other planner optimizations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
| @@ -1,4 +1,4 @@ | |||
| from new_test_framework.utils import tdLog, tdSql, sc, clusterComCheck | |||
| from new_test_framework.utils import tdLog, tdSql, sc, clusterComCheck, tdCom | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
|
Tony Zhang seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account. You have signed the CLA already but the status is still pending? Let us recheck it. |
|
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request enhances the query optimizer to enable reading from cache when the last function is used with tags. The changes in planOptimizer.c correctly extend the conditions for last row scan optimization to include FUNCTION_TYPE_GROUP_CONST_VALUE. A comprehensive test suite has been added to validate this new functionality, covering various scenarios involving last, last_row, and tag columns. The changes are well-implemented. I have one suggestion to improve code readability and fix a potential bug.
| int32_t funcType = pFunc->funcType; | ||
| if ((FUNCTION_TYPE_LAST_ROW != funcType || | ||
| (FUNCTION_TYPE_LAST_ROW == funcType && TSDB_CACHE_MODEL_LAST_VALUE == pScan->cacheLastMode)) && | ||
| (FUNCTION_TYPE_LAST_ROW == funcType && TSDB_CACHE_MODEL_LAST_VALUE == pScan->cacheLastMode)) && | ||
| (FUNCTION_TYPE_LAST != funcType || | ||
| (FUNCTION_TYPE_LAST == funcType && | ||
| (TSDB_CACHE_MODEL_LAST_ROW == pScan->cacheLastMode || | ||
| QUERY_NODE_OPERATOR == nodeType(nodesListGetNode(pFunc->pParameterList, 0)) || | ||
| QUERY_NODE_VALUE == nodeType(nodesListGetNode(pFunc->pParameterList, 0)) || | ||
| COLUMN_TYPE_COLUMN != ((SColumnNode*)nodesListGetNode(pFunc->pParameterList, 0))->colType))) && | ||
| FUNCTION_TYPE_SELECT_VALUE != funcType && FUNCTION_TYPE_GROUP_KEY != funcType) { | ||
| (FUNCTION_TYPE_LAST == funcType && | ||
| (TSDB_CACHE_MODEL_LAST_ROW == pScan->cacheLastMode || | ||
| QUERY_NODE_OPERATOR == nodeType(nodesListGetNode(pFunc->pParameterList, 0)) || | ||
| QUERY_NODE_VALUE == nodeType(nodesListGetNode(pFunc->pParameterList, 0)) || | ||
| COLUMN_TYPE_COLUMN != ((SColumnNode*)nodesListGetNode(pFunc->pParameterList, 0))->colType))) && | ||
| FUNCTION_TYPE_SELECT_VALUE != funcType && | ||
| FUNCTION_TYPE_GROUP_KEY != funcType && | ||
| FUNCTION_TYPE_GROUP_CONST_VALUE != funcType) { | ||
| return true; | ||
| } | ||
| return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current logic has a potential bug. The expression COLUMN_TYPE_COLUMN != ((SColumnNode*)nodesListGetNode(pFunc->pParameterList, 0))->colType performs a cast to SColumnNode* without ensuring that the node is actually of type QUERY_NODE_COLUMN. If nodesListGetNode returns a node of a different type (e.g., QUERY_NODE_FUNCTION), this could lead to a crash or undefined behavior due to an invalid cast.
I've also taken the opportunity to refactor the function for better readability and safety by:
- Extracting the result of
nodesListGetNodeinto a local variable to avoid repeated calls. - Adding a
NULLcheck for the parameter. - Ensuring the node type is
QUERY_NODE_COLUMNbefore casting. - Simplifying the
if (condition) return true; else return false;pattern to a directreturn condition;.
int32_t funcType = pFunc->funcType;
SNode* pParam = nodesListGetNode(pFunc->pParameterList, 0);
return (FUNCTION_TYPE_LAST_ROW != funcType ||
(FUNCTION_TYPE_LAST_ROW == funcType && TSDB_CACHE_MODEL_LAST_VALUE == pScan->cacheLastMode)) &&
(FUNCTION_TYPE_LAST != funcType ||
(FUNCTION_TYPE_LAST == funcType &&
(TSDB_CACHE_MODEL_LAST_ROW == pScan->cacheLastMode ||
(pParam != NULL && (nodeType(pParam) == QUERY_NODE_OPERATOR || nodeType(pParam) == QUERY_NODE_VALUE ||
(nodeType(pParam) == QUERY_NODE_COLUMN &&
((SColumnNode*)pParam)->colType != COLUMN_TYPE_COLUMN))))))) &&
FUNCTION_TYPE_SELECT_VALUE != funcType && FUNCTION_TYPE_GROUP_KEY != funcType &&
FUNCTION_TYPE_GROUP_CONST_VALUE != funcType;There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
|
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request enhances the last function to enable reading from cache when tags are also selected. The changes in planOptimizer.c refactor isNeedSplitCacheLastFunc for improved readability and correctness, and extend the logic to handle FUNCTION_TYPE_GROUP_CONST_VALUE. A comprehensive new test case, test_last_tag, is added to validate this new functionality, including checks on the query execution plan. The changes are well-implemented and improve the codebase. I have one minor suggestion to improve the documentation in the new test file.
| Catalog: | ||
| - xxx:xxx |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
|
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request enhances the query planner to enable cache reading for last() queries that also select tag columns, which is a good optimization. The core logic changes in planOptimizer.c are sound. I particularly appreciate the refactoring of the isNeedSplitCacheLastFunc function, which has significantly improved its readability and robustness by fixing a potential null pointer dereference. The new test case test_last_tag is comprehensive and correctly validates the new functionality. Overall, this is a high-quality contribution.
| static bool isNeedSplitCacheLastFunc(SFunctionNode* pFunc, SScanLogicNode* pScan) { | ||
| int32_t funcType = pFunc->funcType; | ||
| if ((FUNCTION_TYPE_LAST_ROW != funcType || | ||
| (FUNCTION_TYPE_LAST_ROW == funcType && TSDB_CACHE_MODEL_LAST_VALUE == pScan->cacheLastMode)) && | ||
| (FUNCTION_TYPE_LAST != funcType || | ||
| (FUNCTION_TYPE_LAST == funcType && | ||
| (TSDB_CACHE_MODEL_LAST_ROW == pScan->cacheLastMode || | ||
| QUERY_NODE_OPERATOR == nodeType(nodesListGetNode(pFunc->pParameterList, 0)) || | ||
| QUERY_NODE_VALUE == nodeType(nodesListGetNode(pFunc->pParameterList, 0)) || | ||
| COLUMN_TYPE_COLUMN != ((SColumnNode*)nodesListGetNode(pFunc->pParameterList, 0))->colType))) && | ||
| FUNCTION_TYPE_SELECT_VALUE != funcType && FUNCTION_TYPE_GROUP_KEY != funcType) { | ||
| return true; | ||
| SNode* pParam = nodesListGetNode(pFunc->pParameterList, 0); | ||
|
|
||
| if (funcType == FUNCTION_TYPE_LAST_ROW) { | ||
| return pScan->cacheLastMode == TSDB_CACHE_MODEL_LAST_VALUE; | ||
| } | ||
| return false; | ||
|
|
||
| if (funcType == FUNCTION_TYPE_LAST) { | ||
| if (pScan->cacheLastMode == TSDB_CACHE_MODEL_LAST_ROW) { | ||
| return true; | ||
| } | ||
| if (pParam != NULL) { | ||
| int type = nodeType(pParam); | ||
| if (type == QUERY_NODE_OPERATOR || type == QUERY_NODE_VALUE) { | ||
| return true; | ||
| } | ||
| if (type == QUERY_NODE_COLUMN && ((SColumnNode*)pParam)->colType != COLUMN_TYPE_COLUMN) { | ||
| return true; | ||
| } | ||
| } | ||
| return false; | ||
| } | ||
|
|
||
| return funcType != FUNCTION_TYPE_SELECT_VALUE && funcType != FUNCTION_TYPE_GROUP_KEY && | ||
| funcType != FUNCTION_TYPE_GROUP_CONST_VALUE; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This refactoring of isNeedSplitCacheLastFunc is excellent. The new implementation is much more readable and easier to understand than the previous complex boolean expression. Breaking down the logic into separate if blocks for each function type significantly improves maintainability.
Additionally, the introduction of the pParam variable and the pParam != NULL check is a great improvement for both performance (avoids repeated function calls) and robustness, fixing a potential NULL pointer dereference from the original code.
Great job on improving the code quality here!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nothing to do here
Description
Please briefly describe the code changes in this pull request.
Checklist
Please check the items in the checklist if applicable.