branch-4.0: [Enhancement](explain) Display deleteFileNum for FileScanNode when explain verbose #60308 #60437

Merged
yiguolei merged 1 commit into branch-4.0 from auto-pick-60308-branch-4.0 on Feb 3, 2026
@github-actions github-actions bot commented Feb 2, 2026

Cherry-picked from #60308

…lain verbose (#60308)

### What problem does this PR solve?
Problem Summary:
This PR extends the `EXPLAIN VERBOSE` output for file scan nodes with
the following metrics:
`dataFileNum=xxx, deleteFileNum=xxx, deleteSplitNum=xxx`
These are especially useful for Iceberg, Paimon, and Hive ACID tables.

These metrics provide more visibility into the underlying file and split
layout, helping users better tune parameters and control query
performance.
Details:
`dataFileNum` : The number of distinct data files that need to be read.
This is not equivalent to the number of splits, since a single data file
can be divided into multiple splits.
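
The difference between split count and `dataFileNum` can be illustrated with a minimal Python sketch. The `Split` record and the sample paths are hypothetical and do not mirror the Doris FE classes; the only point is that distinct file paths are counted, not splits.

```python
from collections import namedtuple

# Hypothetical split record (path, start offset, length); not the Doris FE type.
Split = namedtuple("Split", ["path", "start", "length"])


def count_data_files(splits):
    """Return the number of distinct data file paths across all splits."""
    return len({s.path for s in splits})


# Illustrative data: one file cut into two splits, another read as one split.
splits = [
    Split("data/a.parquet", 0, 100),
    Split("data/a.parquet", 100, 100),
    Split("data/b.parquet", 0, 50),
]
print(len(splits), count_data_files(splits))  # prints: 3 2
```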

`deleteFileNum` : The number of distinct delete files that need to be
read.

`deleteSplitNum` : Added because the relationship between data files and
delete files is many-to-many:
- one data file may be associated with multiple delete files
- one delete file may apply to multiple data files

Using `deleteSplitNum / dataSplitNum`, users can estimate the average
number of delete splits that need to be read per data split.
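
As a rough sketch of how these two delete metrics relate, the Python below assumes that `deleteSplitNum` counts every (data split, delete file) pairing once, while `deleteFileNum` counts distinct delete files. That counting rule is an assumption inferred from the description above, not taken from the Doris source, and the function name and inputs are hypothetical.

```python
def delete_metrics(delete_files_per_split):
    """Compute (deleteFileNum, deleteSplitNum, average delete splits per data split).

    delete_files_per_split holds one entry per data split: the list of
    delete file paths attached to that split. Assumed counting rule:
    each (data split, delete file) pairing adds one delete split.
    """
    distinct_delete_files = set()
    delete_split_num = 0
    for delete_files in delete_files_per_split:
        distinct_delete_files.update(delete_files)
        delete_split_num += len(delete_files)
    data_split_num = len(delete_files_per_split)
    avg = delete_split_num / data_split_num if data_split_num else 0.0
    return len(distinct_delete_files), delete_split_num, avg


# Many-to-many: d1 applies to splits 0 and 1; split 1 also has d2; split 2 has none.
per_split = [["d1"], ["d1", "d2"], []]
print(delete_metrics(per_split))  # prints: (2, 3, 1.0)
```

Under the same assumption, 220 data splits that all reference one shared delete file give `delete_metrics([["dv"]] * 220) == (1, 220, 1.0)`, which matches the shape of the numbers in the example below.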

Example:
```
mysql> explain verbose select * from iceberg.format_v3.dv_test_1w;
+-----------------------------------------------------------------------------------------------------------------------------------------------+
| Explain String(Nereids Planner)                                                                                                               |
+-----------------------------------------------------------------------------------------------------------------------------------------------+
| PLAN FRAGMENT 0                                                                                                                               |
|   OUTPUT EXPRS:                                                                                                                               |
|     id[#0]                                                                                                                                    |
|     grp[#1]                                                                                                                                   |
|     value[#2]                                                                                                                                 |
|     ts[#3]                                                                                                                                    |
|   PARTITION: RANDOM                                                                                                                           |
|                                                                                                                                               |
|   HAS_COLO_PLAN_NODE: false                                                                                                                   |
|                                                                                                                                               |
|   VRESULT SINK                                                                                                                                |
|      MYSQL_PROTOCOL                                                                                                                           |
|                                                                                                                                               |
|   0:VICEBERG_SCAN_NODE(32)                                                                                                                    |
|      table: iceberg.format_v3.dv_test_1w                                                                                                      |
|      inputSplitNum=220, totalFileSize=720774, scanRanges=220                                                                                  |
|      partition=0/0                                                                                                                            |
|      backends:                                                                                                                                |
|        1769590309070                                                                                                                          |
|          s3://warehouse/wh/format_v3/dv_test_1w/data/00004-51-fc462f9a-d42a-404d-adfc-c8d2781c8d04-0-00001.parquet start: 4 length: 2672      |
|          s3://warehouse/wh/format_v3/dv_test_1w/data/00003-50-fc462f9a-d42a-404d-adfc-c8d2781c8d04-0-00001.parquet start: 4 length: 2852      |
|          s3://warehouse/wh/format_v3/dv_test_1w/data/00000-47-fc462f9a-d42a-404d-adfc-c8d2781c8d04-0-00001.parquet start: 4 length: 2894      |
|          ... other 216 files ...                                                                                                              |
|          s3://warehouse/wh/format_v3/dv_test_1w/data/00001-48-fc462f9a-d42a-404d-adfc-c8d2781c8d04-0-00001.parquet start: 58397 length: 13894 |
|          dataFileNum=10, deleteFileNum=1 deleteSplitNum=220                                                                               |
|      cardinality=33334, numNodes=1                                                                                                            |
|      pushdown agg=NONE                                                                                                                        |
|      tuple ids: 0                                                                                                                             |
|                                                                                                                                               |
| Tuples:                                                                                                                                       |
| TupleDescriptor{id=0, tbl=dv_test_1w}                                                                                                         |
|   SlotDescriptor{id=0, col=id, colUniqueId=1, type=bigint, nullable=true, isAutoIncrement=false, subColPath=null, virtualColumn=null}         |
|   SlotDescriptor{id=1, col=grp, colUniqueId=2, type=int, nullable=true, isAutoIncrement=false, subColPath=null, virtualColumn=null}           |
|   SlotDescriptor{id=2, col=value, colUniqueId=3, type=int, nullable=true, isAutoIncrement=false, subColPath=null, virtualColumn=null}         |
|   SlotDescriptor{id=3, col=ts, colUniqueId=4, type=datetimev2(6), nullable=true, isAutoIncrement=false, subColPath=null, virtualColumn=null}  |
|                                                                                                                                               |
|                                                                                                                                               |
|                                                                                                                                               |
|                                                                                                                                               |
| ========== STATISTICS ==========                                                                                                              |
+-----------------------------------------------------------------------------------------------------------------------------------------------+
```
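
For scripted checks, the new metrics can be pulled out of the plan text. The following Python sketch is illustrative only: the helper name is hypothetical, it assumes a plan with a single scan node (later matches overwrite earlier ones), and it treats `inputSplitNum` as the data split count, which is an assumption about how `dataSplitNum` maps onto the displayed fields.

```python
import re

# Metric names as they appear in the EXPLAIN VERBOSE output above.
KEYS = ("inputSplitNum", "dataFileNum", "deleteFileNum", "deleteSplitNum")
PATTERN = re.compile(r"\b(%s)=(\d+)" % "|".join(KEYS))


def parse_scan_metrics(explain_text):
    """Collect the integer metrics from the plan text into a dict."""
    return {key: int(value) for key, value in PATTERN.findall(explain_text)}


# Two lines copied from the example output, with table padding trimmed.
sample = """\
|      inputSplitNum=220, totalFileSize=720774, scanRanges=220 |
|          dataFileNum=10, deleteFileNum=1 deleteSplitNum=220 |
"""
metrics = parse_scan_metrics(sample)
splits_per_file = metrics["inputSplitNum"] / metrics["dataFileNum"]
deletes_per_split = metrics["deleteSplitNum"] / metrics["inputSplitNum"]
print(metrics)
print(splits_per_file, deletes_per_split)  # prints: 22.0 1.0
```

The pattern matches `key=value` pairs regardless of the separator between them, so it does not depend on whether the fields are comma-separated.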
@github-actions github-actions bot requested a review from yiguolei as a code owner February 2, 2026 08:55
Thearas commented Feb 2, 2026

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@dataroaring dataroaring closed this Feb 2, 2026
@dataroaring dataroaring reopened this Feb 2, 2026
Thearas commented Feb 2, 2026

run buildall

@hello-stephen

FE UT Coverage Report

Increment line coverage 0.00% (0/70) 🎉
Increment coverage report
Complete coverage report

yiguolei commented Feb 3, 2026

skip buildall

@yiguolei yiguolei merged commit 9c57150 into branch-4.0 Feb 3, 2026
28 of 31 checks passed
@github-actions github-actions bot deleted the auto-pick-60308-branch-4.0 branch February 3, 2026 02:49

5 participants
