enh(last): enable reading cache when select last and tags #33236

Tony2h · 2025-10-13T11:18:08Z

Description

Please briefly describe the code changes in this pull request.

Checklist

Please check the items in the checklist if applicable.

Is the user manual updated?
Are the test cases passed and automated?
Is there no significant decrease in test coverage?

gemini-code-assist · 2025-10-13T11:18:25Z

Summary of Changes

Hello @Tony2h, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the query optimizer to more effectively utilize cached data when LAST and LAST_ROW aggregate functions are used in conjunction with selecting tag columns. The primary goal is to improve query performance by ensuring that the system can leverage existing cache mechanisms for these specific types of queries, thereby optimizing data retrieval for time-series data where the latest values and their associated tags are frequently accessed.

Highlights

Query Optimizer Enhancement: The query optimizer's lastRowScanOptCheckFuncList function has been extended to correctly identify and handle FUNCTION_TYPE_GROUP_CONST_VALUE when performing last row scan optimizations, ensuring broader applicability of cache reading.
Enabled Cache Reading for Tag Selections: This change enables the system to read from cache when SELECT LAST or SELECT LAST_ROW queries are executed alongside the selection of tag columns, improving performance for such queries.
New Comprehensive Test Case: A new test case, test_last_tag, has been added to thoroughly verify the behavior of last and last_row functions when used with tag columns and a 'both' cache model, including various query combinations and EXPLAIN plan checks.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request enhances the last function optimization to enable reading from cache when tag columns are selected. The core logic change is in planOptimizer.c to include FUNCTION_TYPE_GROUP_CONST_VALUE in the last row scan optimization check. A new test case is added to verify this enhancement, which is comprehensive.

I've found a couple of issues:

The logic change in planOptimizer.c seems incomplete, as FUNCTION_TYPE_GROUP_CONST_VALUE is not handled in other related functions, which could lead to bugs.
The new test file test_func_last_tag.py has some unused imports.

My detailed comments are below.

gemini-code-assist · 2025-10-13T11:20:42Z

source/libs/planner/src/planOptimizer.c

+    } else if (FUNCTION_TYPE_GROUP_KEY == pAggFunc->funcType ||
+               FUNCTION_TYPE_GROUP_CONST_VALUE == pAggFunc->funcType) {
+      if (!lastRowScanOptLastParaIsTag(pParam)) {
        return false;
      }


While this change correctly includes FUNCTION_TYPE_GROUP_CONST_VALUE for the last row scan optimization check, the change seems incomplete. To ensure consistency and prevent potential bugs, FUNCTION_TYPE_GROUP_CONST_VALUE should also be handled in other places where FUNCTION_TYPE_GROUP_KEY is checked for similar purposes.

For example, the function isNeedSplitCacheLastFunc in the same file should be updated. Without this, isNeedSplitCacheLastFunc will return true for FUNCTION_TYPE_GROUP_CONST_VALUE, which could lead to incorrect query plan splitting.

// in isNeedSplitCacheLastFunc() at line 4534 ... FUNCTION_TYPE_SELECT_VALUE != funcType && FUNCTION_TYPE_GROUP_KEY != funcType) { ...

This should probably be:

// in isNeedSplitCacheLastFunc() ... FUNCTION_TYPE_SELECT_VALUE != funcType && FUNCTION_TYPE_GROUP_KEY != funcType && FUNCTION_TYPE_GROUP_CONST_VALUE != funcType) { ...

Additionally, please check if fmIsGroupKeyFunc should also be updated to include FUNCTION_TYPE_GROUP_CONST_VALUE as it is used in several other planner optimizations.

gemini-code-assist · 2025-10-13T11:20:42Z

test/cases/10-Functions/03-Selection/test_func_last_tag.py

@@ -1,4 +1,4 @@
-from new_test_framework.utils import tdLog, tdSql, sc, clusterComCheck
+from new_test_framework.utils import tdLog, tdSql, sc, clusterComCheck, tdCom


There are unused imports in this line. sc and clusterComCheck are not used in this file. Please remove them to keep the code clean.

Suggested change

from new_test_framework.utils import tdLog, tdSql, sc, clusterComCheck, tdCom

from new_test_framework.utils import tdLog, tdSql, tdCom

CLAassistant · 2025-10-14T03:14:32Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

Tony Zhang seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
_{You have signed the CLA already but the status is still pending? Let us recheck it.}

Tony2h · 2025-10-14T06:35:08Z

/gemini review

gemini-code-assist

Code Review

This pull request enhances the query optimizer to enable reading from cache when the last function is used with tags. The changes in planOptimizer.c correctly extend the conditions for last row scan optimization to include FUNCTION_TYPE_GROUP_CONST_VALUE. A comprehensive test suite has been added to validate this new functionality, covering various scenarios involving last, last_row, and tag columns. The changes are well-implemented. I have one suggestion to improve code readability and fix a potential bug.

gemini-code-assist · 2025-10-14T06:38:00Z

source/libs/planner/src/planOptimizer.c

  int32_t funcType = pFunc->funcType;
  if ((FUNCTION_TYPE_LAST_ROW != funcType ||
-       (FUNCTION_TYPE_LAST_ROW == funcType && TSDB_CACHE_MODEL_LAST_VALUE == pScan->cacheLastMode)) &&
+        (FUNCTION_TYPE_LAST_ROW == funcType && TSDB_CACHE_MODEL_LAST_VALUE == pScan->cacheLastMode)) &&
      (FUNCTION_TYPE_LAST != funcType ||
-       (FUNCTION_TYPE_LAST == funcType &&
-        (TSDB_CACHE_MODEL_LAST_ROW == pScan->cacheLastMode ||
-         QUERY_NODE_OPERATOR == nodeType(nodesListGetNode(pFunc->pParameterList, 0)) ||
-         QUERY_NODE_VALUE == nodeType(nodesListGetNode(pFunc->pParameterList, 0)) ||
-         COLUMN_TYPE_COLUMN != ((SColumnNode*)nodesListGetNode(pFunc->pParameterList, 0))->colType))) &&
-      FUNCTION_TYPE_SELECT_VALUE != funcType && FUNCTION_TYPE_GROUP_KEY != funcType) {
+        (FUNCTION_TYPE_LAST == funcType &&
+          (TSDB_CACHE_MODEL_LAST_ROW == pScan->cacheLastMode ||
+          QUERY_NODE_OPERATOR == nodeType(nodesListGetNode(pFunc->pParameterList, 0)) ||
+          QUERY_NODE_VALUE == nodeType(nodesListGetNode(pFunc->pParameterList, 0)) ||
+          COLUMN_TYPE_COLUMN != ((SColumnNode*)nodesListGetNode(pFunc->pParameterList, 0))->colType))) &&
+      FUNCTION_TYPE_SELECT_VALUE != funcType &&
+      FUNCTION_TYPE_GROUP_KEY != funcType &&
+      FUNCTION_TYPE_GROUP_CONST_VALUE != funcType) {
    return true;
  }
  return false;


The current logic has a potential bug. The expression COLUMN_TYPE_COLUMN != ((SColumnNode*)nodesListGetNode(pFunc->pParameterList, 0))->colType performs a cast to SColumnNode* without ensuring that the node is actually of type QUERY_NODE_COLUMN. If nodesListGetNode returns a node of a different type (e.g., QUERY_NODE_FUNCTION), this could lead to a crash or undefined behavior due to an invalid cast.

I've also taken the opportunity to refactor the function for better readability and safety by:

Extracting the result of nodesListGetNode into a local variable to avoid repeated calls.

Adding a NULL check for the parameter.

Ensuring the node type is QUERY_NODE_COLUMN before casting.

Simplifying the if (condition) return true; else return false; pattern to a direct return condition;.

int32_t funcType = pFunc->funcType; SNode* pParam = nodesListGetNode(pFunc->pParameterList, 0); return (FUNCTION_TYPE_LAST_ROW != funcType || (FUNCTION_TYPE_LAST_ROW == funcType && TSDB_CACHE_MODEL_LAST_VALUE == pScan->cacheLastMode)) && (FUNCTION_TYPE_LAST != funcType || (FUNCTION_TYPE_LAST == funcType && (TSDB_CACHE_MODEL_LAST_ROW == pScan->cacheLastMode || (pParam != NULL && (nodeType(pParam) == QUERY_NODE_OPERATOR || nodeType(pParam) == QUERY_NODE_VALUE || (nodeType(pParam) == QUERY_NODE_COLUMN && ((SColumnNode*)pParam)->colType != COLUMN_TYPE_COLUMN))))))) && FUNCTION_TYPE_SELECT_VALUE != funcType && FUNCTION_TYPE_GROUP_KEY != funcType && FUNCTION_TYPE_GROUP_CONST_VALUE != funcType;

Tony2h · 2025-10-14T06:47:59Z

/gemini review

gemini-code-assist

Code Review

This pull request enhances the last function to enable reading from cache when tags are also selected. The changes in planOptimizer.c refactor isNeedSplitCacheLastFunc for improved readability and correctness, and extend the logic to handle FUNCTION_TYPE_GROUP_CONST_VALUE. A comprehensive new test case, test_last_tag, is added to validate this new functionality, including checks on the query execution plan. The changes are well-implemented and improve the codebase. I have one minor suggestion to improve the documentation in the new test file.

gemini-code-assist · 2025-10-14T06:49:35Z

test/cases/10-Functions/03-Selection/test_func_last_tag.py

+        Catalog:
+            - xxx:xxx


The Catalog entry in the docstring is using a placeholder xxx:xxx. Please update this with the correct catalog information for better test documentation and maintainability.

Tony2h · 2025-10-14T06:57:15Z

/gemini review

gemini-code-assist

Code Review

This pull request enhances the query planner to enable cache reading for last() queries that also select tag columns, which is a good optimization. The core logic changes in planOptimizer.c are sound. I particularly appreciate the refactoring of the isNeedSplitCacheLastFunc function, which has significantly improved its readability and robustness by fixing a potential null pointer dereference. The new test case test_last_tag is comprehensive and correctly validates the new functionality. Overall, this is a high-quality contribution.

gemini-code-assist · 2025-10-14T06:58:58Z

source/libs/planner/src/planOptimizer.c

 static bool isNeedSplitCacheLastFunc(SFunctionNode* pFunc, SScanLogicNode* pScan) {
  int32_t funcType = pFunc->funcType;
-  if ((FUNCTION_TYPE_LAST_ROW != funcType ||
-       (FUNCTION_TYPE_LAST_ROW == funcType && TSDB_CACHE_MODEL_LAST_VALUE == pScan->cacheLastMode)) &&
-      (FUNCTION_TYPE_LAST != funcType ||
-       (FUNCTION_TYPE_LAST == funcType &&
-        (TSDB_CACHE_MODEL_LAST_ROW == pScan->cacheLastMode ||
-         QUERY_NODE_OPERATOR == nodeType(nodesListGetNode(pFunc->pParameterList, 0)) ||
-         QUERY_NODE_VALUE == nodeType(nodesListGetNode(pFunc->pParameterList, 0)) ||
-         COLUMN_TYPE_COLUMN != ((SColumnNode*)nodesListGetNode(pFunc->pParameterList, 0))->colType))) &&
-      FUNCTION_TYPE_SELECT_VALUE != funcType && FUNCTION_TYPE_GROUP_KEY != funcType) {
-    return true;
+  SNode*  pParam = nodesListGetNode(pFunc->pParameterList, 0);
+
+  if (funcType == FUNCTION_TYPE_LAST_ROW) {
+    return pScan->cacheLastMode == TSDB_CACHE_MODEL_LAST_VALUE;
  }
-  return false;
+
+  if (funcType == FUNCTION_TYPE_LAST) {
+    if (pScan->cacheLastMode == TSDB_CACHE_MODEL_LAST_ROW) {
+      return true;
+    }
+    if (pParam != NULL) {
+      int type = nodeType(pParam);
+      if (type == QUERY_NODE_OPERATOR || type == QUERY_NODE_VALUE) {
+        return true;
+      }
+      if (type == QUERY_NODE_COLUMN && ((SColumnNode*)pParam)->colType != COLUMN_TYPE_COLUMN) {
+        return true;
+      }
+    }
+    return false;
+  }
+
+  return funcType != FUNCTION_TYPE_SELECT_VALUE && funcType != FUNCTION_TYPE_GROUP_KEY &&
+         funcType != FUNCTION_TYPE_GROUP_CONST_VALUE;
 }


This refactoring of isNeedSplitCacheLastFunc is excellent. The new implementation is much more readable and easier to understand than the previous complex boolean expression. Breaking down the logic into separate if blocks for each function type significantly improves maintainability.

Additionally, the introduction of the pParam variable and the pParam != NULL check is a great improvement for both performance (avoids repeated function calls) and robustness, fixing a potential NULL pointer dereference from the original code.

Great job on improving the code quality here!

nothing to do here

enh(last): enable reading cache when select last and tags

d2af95e

Tony2h requested review from a team, dapan1121, guanshengliang and hzcheng as code owners October 13, 2025 11:18

taosdata-bot bot added the team engine label Oct 13, 2025

gemini-code-assist bot reviewed Oct 13, 2025

View reviewed changes

handle tag cols correctly when split last cache functions

2621b69

format pytest summary message

b342914

gemini-code-assist bot reviewed Oct 14, 2025

View reviewed changes

simplify the condition for splitting cache last functions

38f8f11

gemini-code-assist bot reviewed Oct 14, 2025

View reviewed changes

update pytest catalog

e18ab28

gemini-code-assist bot reviewed Oct 14, 2025

View reviewed changes

guanshengliang approved these changes Oct 14, 2025

View reviewed changes

guanshengliang merged commit 648a6d4 into 3.3.6 Oct 14, 2025
13 of 14 checks passed

Tony2h deleted the enh/3.3.6/TS-6146 branch October 17, 2025 09:39

		@@ -1,4 +1,4 @@
		from new_test_framework.utils import tdLog, tdSql, sc, clusterComCheck
		from new_test_framework.utils import tdLog, tdSql, sc, clusterComCheck, tdCom

enh(last): enable reading cache when select last and tags #33236

enh(last): enable reading cache when select last and tags #33236

Uh oh!

Conversation

Tony2h commented Oct 13, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Checklist

Uh oh!

gemini-code-assist bot commented Oct 13, 2025

Summary of Changes

Highlights

Footnotes

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist bot Oct 13, 2025

Choose a reason for hiding this comment

Uh oh!

Tony2h Oct 14, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Oct 13, 2025

Choose a reason for hiding this comment

Uh oh!

Tony2h Oct 14, 2025

Choose a reason for hiding this comment

Uh oh!

CLAassistant commented Oct 14, 2025

Uh oh!

Tony2h commented Oct 14, 2025

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist bot Oct 14, 2025

Choose a reason for hiding this comment

Uh oh!

Tony2h Oct 14, 2025

Choose a reason for hiding this comment

Uh oh!

Tony2h commented Oct 14, 2025

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist bot Oct 14, 2025

Choose a reason for hiding this comment

Uh oh!

Tony2h Oct 14, 2025

Choose a reason for hiding this comment

Uh oh!

Tony2h commented Oct 14, 2025

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist bot Oct 14, 2025

Choose a reason for hiding this comment

Uh oh!

Tony2h Oct 14, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Tony2h commented Oct 13, 2025 •

edited

Loading