[Doc] Add docs to OLAP_SCAN_NODE query profile #3808
Merged
morningman merged 4 commits into apache:master on Jun 13, 2020
Conversation
EmmyMiao87 requested changes on Jun 12, 2020
```
OLAP_SCAN_NODE (id=0): (Active: 4.050ms, non-child: 35.68%)
  - BitmapIndexFilterCount: 0 # Number of rows filtered by bitmap index
```
Contributor
You already changed this param name above.
```
OLAP_SCAN_NODE (id=0): (Active: 4.050ms, non-child: 35.68%)
  - BitmapIndexFilterCount: 0 # Number of rows filtered out by the bitmap index.
```
Contributor
You already changed this param name.
EmmyMiao87 reviewed on Jun 12, 2020
> From the metrics above, you can roughly gauge how many rows the storage engine processed and the size of the final result set after filtering. The `Rows***Filtered` group of metrics also shows whether query predicates were pushed down to the storage engine, and how effective each index was at filtering.
> A large gap between `RawRowsRead` and `RowsRead` means that many rows were merged by aggregation, which can be time-consuming. A large gap between `RowsRead` and `RowsReturned` means that many rows were filtered in the Scanner, which indicates that many highly selective predicates were not pushed down to the storage engine. Filtering in the Scanner is less efficient than filtering in the storage engine.
Contributor
Shouldn't this say that some highly selective predicates were not filtered through indexes in the storage engine? And that filtering in the Scanner is less efficient than index-based filtering in the storage engine.
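The relationship between the row counters discussed in this thread can be sketched in a few lines of Python. This is a purely hypothetical helper, not a Doris API: the profile dict, the field names as dict keys, and the 10x ratio thresholds are all illustrative assumptions.

```python
# Hypothetical sketch: interpret the gaps between the row-count metrics
# (RawRowsRead / RowsRead / RowsReturned) from an OLAP_SCAN_NODE profile.
# The dict layout and the 10x thresholds are assumptions for illustration.

def analyze_rows(profile):
    raw = profile["RawRowsRead"]        # rows read before aggregation
    read = profile["RowsRead"]          # rows after storage-engine aggregation
    returned = profile["RowsReturned"]  # rows left after Scanner filtering
    notes = []
    if read and raw / read > 10:
        notes.append("many rows merged by aggregation; aggregation may be costly")
    if returned and read / returned > 10:
        notes.append("many rows filtered in the Scanner; check whether "
                     "selective predicates could use storage-engine indexes")
    return notes

notes = analyze_rows(
    {"RawRowsRead": 1_000_000, "RowsRead": 50_000, "RowsReturned": 2_000}
)
print(notes)  # both gaps exceed 10x, so both notes are produced
```

With these sample numbers, both the aggregation gap (20x) and the Scanner-filtering gap (25x) trip the thresholds, which is exactly the situation the quoted paragraph describes.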
> * RowsBloomFilterFiltered
> * RowsStatsFiltered
> * RowsDelFiltered
> * RawRowsRead
Contributor
Author
V1 has these too, but V1 is rather messy, so I left it out.
> * RawRowsRead
> * RowsRead
> * RowsReturned
Contributor
Shouldn't this first give an overview: the storage engine filters data through its indexes first, and then the Scanner filters once more. Then, in the storage-engine part, describe V1 and V2 separately.
> The Profile of OlapScanNode is usually used to analyze the efficiency of data scanning. Besides inferring predicate pushdown and index usage from the row-count metrics introduced earlier, you can also do some simple analysis from the following angles.
> * First, many metrics, such as `IOTimer` and `BlockFetchTime`, are accumulated over all Scanner threads, so their values can be large. Moreover, because Scanner threads read data asynchronously, these accumulated metrics only reflect the total working time of the Scanners; they do not directly represent the elapsed time of the ScanNode. The ScanNode's share of time in the overall query plan is recorded in the `Active` field. Sometimes `IOTimer` shows tens of seconds while `Active` is only a few seconds. This usually happens because: 1. `IOTimer` is the accumulated time of multiple Scanners, and there are many of them. 2. The upper-level node is time-consuming. For example, if the upper node takes 100 seconds while the underlying ScanNode needs only 10 seconds, the value recorded in `Active` may be only a few milliseconds, because while the upper level is processing data, the ScanNode has already scanned and prepared the data asynchronously. When the upper node fetches data from the ScanNode, the data is already ready, so the Active time is very short.
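The `IOTimer`-versus-`Active` behavior quoted above can be illustrated with a small arithmetic sketch. All numbers and function names below are hypothetical; the point is only that a counter summed across threads grows with the number of Scanners, while wall-clock time shrinks with the degree of parallelism.

```python
# Hypothetical illustration of why an accumulated counter like IOTimer
# can be much larger than the wall-clock Active time of a ScanNode.

def accumulated_io_ms(per_scanner_io_ms, num_scanners):
    """IOTimer-style counters sum the IO time of every Scanner thread."""
    return per_scanner_io_ms * num_scanners

def wall_clock_estimate_ms(per_scanner_io_ms, num_scanners, parallelism):
    """With `parallelism` Scanner threads running concurrently, elapsed
    time is roughly the total work divided by the parallelism."""
    total = accumulated_io_ms(per_scanner_io_ms, num_scanners)
    return total / parallelism

# 48 Scanners, each spending 500 ms on IO, 16 running at a time:
total_ms = accumulated_io_ms(500, 48)              # 24000 ms reported as "IOTimer"
elapsed_ms = wall_clock_estimate_ms(500, 48, 16)   # 1500 ms of real elapsed time

print(total_ms, elapsed_ms)
```

This matches the first cause in the quoted paragraph: the accumulated value is large simply because many Scanner threads contribute to it, not because the scan itself took that long.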
> From the metrics above, you can roughly gauge how many rows the storage engine processed and the size of the final result set after filtering. The `Rows***Filtered` group of metrics also shows whether query predicates were pushed down to the storage engine, and how effective each index was at filtering.
>
> A large gap between `RawRowsRead` and `RowsRead` means that many rows were merged by aggregation, which can be time-consuming. A large gap between `RowsRead` and `RowsReturned` means that many rows were filtered in the Scanner, which indicates that many highly selective predicates were not pushed down to the storage engine. Filtering in the Scanner is less efficient than filtering in the storage engine.
Contributor
This paragraph also reads like simple analysis, doesn't it? How about going straight from the meaning of the metrics to the simple-analysis section?
Contributor
Author
The section above separately explains the filtered-row-count metrics; the simple-analysis section analyzes things from an overall perspective.
morningman added a commit to morningman/doris that referenced this pull request on Jun 22, 2020
csun5285 pushed a commit to csun5285/doris that referenced this pull request on Mar 28, 2025
ISSUE: #3365