Skip to content

Conversation

@Tony2h
Copy link
Contributor

@Tony2h Tony2h commented Oct 13, 2025

Description

Please briefly describe the code changes in this pull request.

Checklist

Please check the items in the checklist if applicable.

  • Is the user manual updated?
  • Are the test cases passed and automated?
  • Is there no significant decrease in test coverage?

@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello @Tony2h, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the query optimizer to more effectively utilize cached data when LAST and LAST_ROW aggregate functions are used in conjunction with selecting tag columns. The primary goal is to improve query performance by ensuring that the system can leverage existing cache mechanisms for these specific types of queries, thereby optimizing data retrieval for time-series data where the latest values and their associated tags are frequently accessed.

Highlights

  • Query Optimizer Enhancement: The query optimizer's lastRowScanOptCheckFuncList function has been extended to correctly identify and handle FUNCTION_TYPE_GROUP_CONST_VALUE when performing last row scan optimizations, ensuring broader applicability of cache reading.
  • Enabled Cache Reading for Tag Selections: This change enables the system to read from cache when SELECT LAST or SELECT LAST_ROW queries are executed alongside the selection of tag columns, improving performance for such queries.
  • New Comprehensive Test Case: A new test case, test_last_tag, has been added to thoroughly verify the behavior of last and last_row functions when used with tag columns and a 'both' cache model, including various query combinations and EXPLAIN plan checks.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request enhances the last function optimization to enable reading from cache when tag columns are selected. The core logic change is in planOptimizer.c to include FUNCTION_TYPE_GROUP_CONST_VALUE in the last row scan optimization check. A new test case is added to verify this enhancement, which is comprehensive.

I've found a couple of issues:

  1. The logic change in planOptimizer.c seems incomplete, as FUNCTION_TYPE_GROUP_CONST_VALUE is not handled in other related functions, which could lead to bugs.
  2. The new test file test_func_last_tag.py has some unused imports.

My detailed comments are below.

Comment on lines +4587 to 4591
} else if (FUNCTION_TYPE_GROUP_KEY == pAggFunc->funcType ||
FUNCTION_TYPE_GROUP_CONST_VALUE == pAggFunc->funcType) {
if (!lastRowScanOptLastParaIsTag(pParam)) {
return false;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

While this change correctly includes FUNCTION_TYPE_GROUP_CONST_VALUE for the last row scan optimization check, the change seems incomplete. To ensure consistency and prevent potential bugs, FUNCTION_TYPE_GROUP_CONST_VALUE should also be handled in other places where FUNCTION_TYPE_GROUP_KEY is checked for similar purposes.

For example, the function isNeedSplitCacheLastFunc in the same file should be updated. Without this, isNeedSplitCacheLastFunc will return true for FUNCTION_TYPE_GROUP_CONST_VALUE, which could lead to incorrect query plan splitting.

// in isNeedSplitCacheLastFunc() at line 4534
...
     FUNCTION_TYPE_SELECT_VALUE != funcType && FUNCTION_TYPE_GROUP_KEY != funcType) {
...

This should probably be:

// in isNeedSplitCacheLastFunc()
...
     FUNCTION_TYPE_SELECT_VALUE != funcType && 
     FUNCTION_TYPE_GROUP_KEY != funcType &&
     FUNCTION_TYPE_GROUP_CONST_VALUE != funcType) {
...

Additionally, please check if fmIsGroupKeyFunc should also be updated to include FUNCTION_TYPE_GROUP_CONST_VALUE as it is used in several other planner optimizations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@@ -1,4 +1,4 @@
from new_test_framework.utils import tdLog, tdSql, sc, clusterComCheck
from new_test_framework.utils import tdLog, tdSql, sc, clusterComCheck, tdCom
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

There are unused imports in this line. sc and clusterComCheck are not used in this file. Please remove them to keep the code clean.

Suggested change
from new_test_framework.utils import tdLog, tdSql, sc, clusterComCheck, tdCom
from new_test_framework.utils import tdLog, tdSql, tdCom

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@CLAassistant
Copy link

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


Tony Zhang seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

@Tony2h
Copy link
Contributor Author

Tony2h commented Oct 14, 2025

/gemini review

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request enhances the query optimizer to enable reading from cache when the last function is used with tags. The changes in planOptimizer.c correctly extend the conditions for last row scan optimization to include FUNCTION_TYPE_GROUP_CONST_VALUE. A comprehensive test suite has been added to validate this new functionality, covering various scenarios involving last, last_row, and tag columns. The changes are well-implemented. I have one suggestion to improve code readability and fix a potential bug.

Comment on lines 4525 to 4539
int32_t funcType = pFunc->funcType;
if ((FUNCTION_TYPE_LAST_ROW != funcType ||
(FUNCTION_TYPE_LAST_ROW == funcType && TSDB_CACHE_MODEL_LAST_VALUE == pScan->cacheLastMode)) &&
(FUNCTION_TYPE_LAST_ROW == funcType && TSDB_CACHE_MODEL_LAST_VALUE == pScan->cacheLastMode)) &&
(FUNCTION_TYPE_LAST != funcType ||
(FUNCTION_TYPE_LAST == funcType &&
(TSDB_CACHE_MODEL_LAST_ROW == pScan->cacheLastMode ||
QUERY_NODE_OPERATOR == nodeType(nodesListGetNode(pFunc->pParameterList, 0)) ||
QUERY_NODE_VALUE == nodeType(nodesListGetNode(pFunc->pParameterList, 0)) ||
COLUMN_TYPE_COLUMN != ((SColumnNode*)nodesListGetNode(pFunc->pParameterList, 0))->colType))) &&
FUNCTION_TYPE_SELECT_VALUE != funcType && FUNCTION_TYPE_GROUP_KEY != funcType) {
(FUNCTION_TYPE_LAST == funcType &&
(TSDB_CACHE_MODEL_LAST_ROW == pScan->cacheLastMode ||
QUERY_NODE_OPERATOR == nodeType(nodesListGetNode(pFunc->pParameterList, 0)) ||
QUERY_NODE_VALUE == nodeType(nodesListGetNode(pFunc->pParameterList, 0)) ||
COLUMN_TYPE_COLUMN != ((SColumnNode*)nodesListGetNode(pFunc->pParameterList, 0))->colType))) &&
FUNCTION_TYPE_SELECT_VALUE != funcType &&
FUNCTION_TYPE_GROUP_KEY != funcType &&
FUNCTION_TYPE_GROUP_CONST_VALUE != funcType) {
return true;
}
return false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The current logic has a potential bug. The expression COLUMN_TYPE_COLUMN != ((SColumnNode*)nodesListGetNode(pFunc->pParameterList, 0))->colType performs a cast to SColumnNode* without ensuring that the node is actually of type QUERY_NODE_COLUMN. If nodesListGetNode returns a node of a different type (e.g., QUERY_NODE_FUNCTION), this could lead to a crash or undefined behavior due to an invalid cast.

I've also taken the opportunity to refactor the function for better readability and safety by:

  1. Extracting the result of nodesListGetNode into a local variable to avoid repeated calls.
  2. Adding a NULL check for the parameter.
  3. Ensuring the node type is QUERY_NODE_COLUMN before casting.
  4. Simplifying the if (condition) return true; else return false; pattern to a direct return condition;.
  int32_t funcType = pFunc->funcType;
  SNode*  pParam = nodesListGetNode(pFunc->pParameterList, 0);

  return (FUNCTION_TYPE_LAST_ROW != funcType ||
          (FUNCTION_TYPE_LAST_ROW == funcType && TSDB_CACHE_MODEL_LAST_VALUE == pScan->cacheLastMode)) &&
         (FUNCTION_TYPE_LAST != funcType ||
          (FUNCTION_TYPE_LAST == funcType &&
           (TSDB_CACHE_MODEL_LAST_ROW == pScan->cacheLastMode ||
            (pParam != NULL && (nodeType(pParam) == QUERY_NODE_OPERATOR || nodeType(pParam) == QUERY_NODE_VALUE ||
                              (nodeType(pParam) == QUERY_NODE_COLUMN &&
                               ((SColumnNode*)pParam)->colType != COLUMN_TYPE_COLUMN))))))) &&
         FUNCTION_TYPE_SELECT_VALUE != funcType && FUNCTION_TYPE_GROUP_KEY != funcType &&
         FUNCTION_TYPE_GROUP_CONST_VALUE != funcType;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@Tony2h
Copy link
Contributor Author

Tony2h commented Oct 14, 2025

/gemini review

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request enhances the last function to enable reading from cache when tags are also selected. The changes in planOptimizer.c refactor isNeedSplitCacheLastFunc for improved readability and correctness, and extend the logic to handle FUNCTION_TYPE_GROUP_CONST_VALUE. A comprehensive new test case, test_last_tag, is added to validate this new functionality, including checks on the query execution plan. The changes are well-implemented and improve the codebase. I have one minor suggestion to improve the documentation in the new test file.

Comment on lines 194 to 195
Catalog:
- xxx:xxx
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The Catalog entry in the docstring is using a placeholder xxx:xxx. Please update this with the correct catalog information for better test documentation and maintainability.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

@Tony2h
Copy link
Contributor Author

Tony2h commented Oct 14, 2025

/gemini review

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request enhances the query planner to enable cache reading for last() queries that also select tag columns, which is a good optimization. The core logic changes in planOptimizer.c are sound. I particularly appreciate the refactoring of the isNeedSplitCacheLastFunc function, which has significantly improved its readability and robustness by fixing a potential null pointer dereference. The new test case test_last_tag is comprehensive and correctly validates the new functionality. Overall, this is a high-quality contribution.

Comment on lines 4524 to 4550
static bool isNeedSplitCacheLastFunc(SFunctionNode* pFunc, SScanLogicNode* pScan) {
int32_t funcType = pFunc->funcType;
if ((FUNCTION_TYPE_LAST_ROW != funcType ||
(FUNCTION_TYPE_LAST_ROW == funcType && TSDB_CACHE_MODEL_LAST_VALUE == pScan->cacheLastMode)) &&
(FUNCTION_TYPE_LAST != funcType ||
(FUNCTION_TYPE_LAST == funcType &&
(TSDB_CACHE_MODEL_LAST_ROW == pScan->cacheLastMode ||
QUERY_NODE_OPERATOR == nodeType(nodesListGetNode(pFunc->pParameterList, 0)) ||
QUERY_NODE_VALUE == nodeType(nodesListGetNode(pFunc->pParameterList, 0)) ||
COLUMN_TYPE_COLUMN != ((SColumnNode*)nodesListGetNode(pFunc->pParameterList, 0))->colType))) &&
FUNCTION_TYPE_SELECT_VALUE != funcType && FUNCTION_TYPE_GROUP_KEY != funcType) {
return true;
SNode* pParam = nodesListGetNode(pFunc->pParameterList, 0);

if (funcType == FUNCTION_TYPE_LAST_ROW) {
return pScan->cacheLastMode == TSDB_CACHE_MODEL_LAST_VALUE;
}
return false;

if (funcType == FUNCTION_TYPE_LAST) {
if (pScan->cacheLastMode == TSDB_CACHE_MODEL_LAST_ROW) {
return true;
}
if (pParam != NULL) {
int type = nodeType(pParam);
if (type == QUERY_NODE_OPERATOR || type == QUERY_NODE_VALUE) {
return true;
}
if (type == QUERY_NODE_COLUMN && ((SColumnNode*)pParam)->colType != COLUMN_TYPE_COLUMN) {
return true;
}
}
return false;
}

return funcType != FUNCTION_TYPE_SELECT_VALUE && funcType != FUNCTION_TYPE_GROUP_KEY &&
funcType != FUNCTION_TYPE_GROUP_CONST_VALUE;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This refactoring of isNeedSplitCacheLastFunc is excellent. The new implementation is much more readable and easier to understand than the previous complex boolean expression. Breaking down the logic into separate if blocks for each function type significantly improves maintainability.

Additionally, the introduction of the pParam variable and the pParam != NULL check is a great improvement for both performance (avoids repeated function calls) and robustness, fixing a potential NULL pointer dereference from the original code.

Great job on improving the code quality here!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nothing to do here

@guanshengliang guanshengliang merged commit 648a6d4 into 3.3.6 Oct 14, 2025
13 of 14 checks passed
@Tony2h Tony2h deleted the enh/3.3.6/TS-6146 branch October 17, 2025 09:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants