[opt](topn)Optimize the time for topn to lazy materialized reading of external tables#52114
[opt](topn)Optimize the time for topn to lazy materialized reading of external tables#52114morningman merged 4 commits intoapache:masterfrom
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
run buildall |
|
run buildall |
TPC-H: Total hot run time: 35153 ms |
TPC-DS: Total hot run time: 186100 ms |
ClickBench: Total hot run time: 29.16 s |
|
run buildall |
TPC-H: Total hot run time: 33815 ms |
TPC-DS: Total hot run time: 184906 ms |
ClickBench: Total hot run time: 29.38 s |
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
|
run buildall |
Cloud UT Coverage ReportIncrement line coverage Increment coverage report
|
TPC-H: Total hot run time: 33885 ms |
TPC-DS: Total hot run time: 184507 ms |
ClickBench: Total hot run time: 30.09 s |
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
|
run buildall |
Cloud UT Coverage ReportIncrement line coverage Increment coverage report
|
TPC-H: Total hot run time: 34244 ms |
TPC-DS: Total hot run time: 184964 ms |
ClickBench: Total hot run time: 29.91 s |
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
|
PR approved by at least one committer and no changes requested. |
be/src/exec/rowid_fetcher.cpp
Outdated
| } | ||
|
|
||
| { | ||
| stringstream ss; |
There was a problem hiding this comment.
not use stringstream use fmt to get better performance
be/src/exec/rowid_fetcher.cpp
Outdated
| RETURN_IF_ERROR(vfile_scanner_ptr->read_one_line_from_range( | ||
| scan_range_desc, request_block_desc.row_id(j), &result_block, external_info, | ||
| init_reader_ms, get_block_ms)); | ||
| dst_col->insert_range_from(src_col, block_idx, 1); |
There was a problem hiding this comment.
length 1, can use insert_from to do the work。
if want speed up the call, also can call insert_from_multi_column
be/src/pipeline/dependency.cpp
Outdated
| .request = multi_get_request, | ||
| .callback = nullptr, | ||
| .rpc_timer = MonotonicStopWatch()}); | ||
| // if (profile != nullptr) { |
be/src/pipeline/dependency.cpp
Outdated
| return Status::OK(); | ||
| } | ||
|
|
||
| Status MaterializationSharedState::_update_profile_info(int64_t backend_id, |
There was a problem hiding this comment.
the func only return OK, should be void
be/src/pipeline/dependency.cpp
Outdated
| } | ||
| auto& info_map = backend_profile_info_string[backend_id]; | ||
|
|
||
| auto func = [&](const std::string& info_key) -> Status { |
| child.to_proto(proto_counters, child_counter_map); | ||
| } | ||
|
|
||
| // Recurse into children |
be/src/util/runtime_profile.h
Outdated
| } | ||
|
|
||
| PProfileCounter to_proto_peak(const std::string& name) const { | ||
| PProfileCounter counter; |
There was a problem hiding this comment.
same code like to_proto just call to_proto
|
run buildall |
Cloud UT Coverage ReportIncrement line coverage Increment coverage report
|
|
run buildall |
Cloud UT Coverage ReportIncrement line coverage Increment coverage report
|
TPC-H: Total hot run time: 32929 ms |
TPC-DS: Total hot run time: 186311 ms |
ClickBench: Total hot run time: 29.31 s |
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
|
PR approved by at least one committer and no changes requested. |
What problem does this PR solve?
Related PR: #51329
Problem Summary:
Topn lazy materialize was introduced in pr#51329 , but the implementation had performance issues when reading external tables. This pr is used for optimization.
RowIDFetchertoMATERIALIZATION_OPERATOR.The example is as follows:
1FE 2BE
sql :select * from ali_hive.tpch100_orc.lineitem order by l_partkey limit 10;
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)