
[Fix](TabletHotspot) Fix race in TabletHotspot::get_top_n_hot_partition #60607

Merged
dataroaring merged 1 commit into apache:master from bobhan1:fix-get_top_n_hot_partition
Feb 9, 2026

Conversation

@bobhan1
Contributor

@bobhan1 bobhan1 commented Feb 9, 2026

What problem does this PR solve?

 0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*) at /root/selectdb-core/be/src/common/signal_handler.h:421
 1# PosixSignals::chained_handler(int, siginfo_t*, void*) [clone .part.0] at src/hotspot/os/posix/signals_posix.cpp:454
 2# JVM_handle_linux_signal at src/hotspot/os/posix/signals_posix.cpp:641
 3# 0x00007F6D2FBC6520 in /lib/x86_64-linux-gnu/libc.so.6
 4# je_tcache_bin_flush_small at /root/selectdb-core/thirdparty/src/jemalloc-5.3.0/doris_build/../src/tcache.c:529
 5# je_free_default at /root/selectdb-core/thirdparty/src/jemalloc-5.3.0/doris_build/../src/jemalloc.c:3014
 6# std::_Hashtable<std::pair<long, long>, std::pair<std::pair<long, long> const, std::unordered_map<long, doris::TabletHotspotMapValue, std::hash<long>, std::equal_to<long>, std::allocator<std::pair<long const, doris::TabletHotspotMapValue> > > >, std::allocator<std::pair<std::pair<long, long> const, std::unordered_map<long, doris::TabletHotspotMapValue, std::hash<long>, std::equal_to<long>, std::allocator<std::pair<long const, doris::TabletHotspotMapValue> > > > >, std::__detail::_Select1st, std::equal_to<std::pair<long, long> >, doris::MapKeyHash, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::_M_move_assign(std::_Hashtable<std::pair<long, long>, std::pair<std::pair<long, long> const, std::unordered_map<long, doris::TabletHotspotMapValue, std::hash<long>, std::equal_to<long>, std::allocator<std::pair<long const, doris::TabletHotspotMapValue> > > >, std::allocator<std::pair<std::pair<long, long> const, std::unordered_map<long, doris::TabletHotspotMapValue, std::hash<long>, std::equal_to<long>, std::allocator<std::pair<long const, doris::TabletHotspotMapValue> > > > >, std::__detail::_Select1st, std::equal_to<std::pair<long, long> >, doris::MapKeyHash, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >&&, std::integral_constant<bool, true>) at /root/tools/ldb-16/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/hashtable.h:1342
 7# doris::TabletHotspot::get_top_n_hot_partition(std::vector<doris::THotTableMessage, std::allocator<doris::THotTableMessage> >*) in /opt/selectdb/be/lib/doris_be
 8# doris::CloudBackendService::get_top_n_hot_partitions(doris::TGetTopNHotPartitionsResponse&, doris::TGetTopNHotPartitionsRequest const&) at /root/selectdb-core/be/src/cloud/cloud_backend_service.cpp:86
 9# doris::BackendServiceProcessor::process_get_top_n_hot_partitions(int, apache::thrift::protocol::TProtocol*, apache::thrift::protocol::TProtocol*, void*) at /root/selectdb-core/gensrc/build/gen_cpp/BackendService.cpp:8102
10# doris::BackendServiceProcessor::dispatchCall(apache::thrift::protocol::TProtocol*, apache::thrift::protocol::TProtocol*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int, void*) in /opt/selectdb/be/lib/doris_be
11# apache::thrift::TDispatchProcessor::process(std::shared_ptr<apache::thrift::protocol::TProtocol>, std::shared_ptr<apache::thrift::protocol::TProtocol>, void*) in /opt/selectdb/be/lib/doris_be
12# apache::thrift::server::TConnectedClient::run() in /opt/selectdb/be/lib/doris_be
13# apache::thrift::server::TThreadedServer::TConnectedClientRunner::run() in /opt/selectdb/be/lib/doris_be
14# apache::thrift::concurrency::Thread::threadMain(std::shared_ptr<apache::thrift::concurrency::Thread>) in /opt/selectdb/be/lib/doris_be
15# void std::__invoke_impl<void, void (*)(std::shared_ptr<apache::thrift::concurrency::Thread>), std::shared_ptr<apache::thrift::concurrency::Thread> >(std::__invoke_other, void (*&&)(std::shared_ptr<apache::thrift::concurrency::Thread>), std::shared_ptr<apache::thrift::concurrency::Thread>&&) in /opt/selectdb/be/lib/doris_be
16# std::__invoke_result<void (*)(std::shared_ptr<apache::thrift::concurrency::Thread>), std::shared_ptr<apache::thrift::concurrency::Thread> >::type std::__invoke<void (*)(std::shared_ptr<apache::thrift::concurrency::Thread>), std::shared_ptr<apache::thrift::concurrency::Thread> >(void (*&&)(std::shared_ptr<apache::thrift::concurrency::Thread>), std::shared_ptr<apache::thrift::concurrency::Thread>&&) in /opt/selectdb/be/lib/doris_be
17# void std::thread::_Invoker<std::tuple<void (*)(std::shared_ptr<apache::thrift::concurrency::Thread>), std::shared_ptr<apache::thrift::concurrency::Thread> > >::_M_invoke<0ul, 1ul>(std::_Index_tuple<0ul, 1ul>) in /opt/selectdb/be/lib/doris_be
18# std::thread::_Invoker<std::tuple<void (*)(std::shared_ptr<apache::thrift::concurrency::Thread>), std::shared_ptr<apache::thrift::concurrency::Thread> > >::operator()() in /opt/selectdb/be/lib/doris_be
19# std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (*)(std::shared_ptr<apache::thrift::concurrency::Thread>), std::shared_ptr<apache::thrift::concurrency::Thread> > > >::_M_run() in /opt/selectdb/be/lib/doris_be
20# execute_native_thread_routine at ../../../../../libstdc++-v3/src/c++11/thread.cc:84
21# 0x00007F6D2FC18AC3 in /lib/x86_64-linux-gnu/libc.so.6
22# __clone in /lib/x86_64-linux-gnu/libc.so.6 
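
Reading the trace: frame 6 is the move assignment of a `std::unordered_map`, called from `TabletHotspot::get_top_n_hot_partition` (frame 7) on a Thrift connection thread, and the crash lands inside jemalloc's free path. One plausible reading, consistent with the PR title, is that two RPC threads move-assigned into the same map member at the same time. The sketch below is not the Doris code and not necessarily the fix in this PR; `HotPartitionSnapshot`, `publish`, and `run_concurrent_publishers` are hypothetical names used only to illustrate serializing such an assignment with a mutex.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <thread>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a hotspot tracker; this is NOT the Doris class.
class HotPartitionSnapshot {
public:
    using PartitionMap = std::unordered_map<int64_t, uint64_t>;

    // Publish a freshly built map under the lock.
    // Without the lock_guard, two threads could run the move assignment on
    // _last concurrently. That is a data race (undefined behavior) and could
    // free the old nodes twice, which tends to crash inside the allocator.
    void publish(PartitionMap fresh) {
        std::lock_guard<std::mutex> guard(_mtx);
        _last = std::move(fresh);
    }

    // Readers take the same lock so they never observe a half-moved map.
    std::size_t size() const {
        std::lock_guard<std::mutex> guard(_mtx);
        return _last.size();
    }

private:
    mutable std::mutex _mtx;
    PartitionMap _last;
};

// Drive publish() from several threads, similar to how per-connection RPC
// threads might call into a shared singleton.
std::size_t run_concurrent_publishers(int threads, int rounds, int entries) {
    HotPartitionSnapshot snapshot;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&snapshot, rounds, entries] {
            for (int r = 0; r < rounds; ++r) {
                HotPartitionSnapshot::PartitionMap fresh;
                for (int k = 0; k < entries; ++k) {
                    fresh[k] = static_cast<uint64_t>(r);
                }
                snapshot.publish(std::move(fresh));
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return snapshot.size();
}
```

Building the new map locally and holding the lock only for the move keeps the critical section short; whichever thread publishes last wins, and the replaced map is destroyed by exactly one thread.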

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@bobhan1
Contributor Author

bobhan1 commented Feb 9, 2026

run buildall

@doris-robot

TPC-H: Total hot run time: 30523 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit ea0d63ad330cb8fd944240d19d87f48ac773bba9, data reload: false

------ Round 1 ----------------------------------
query	run 1, cold (ms)	run 2 (ms)	run 3 (ms)	min of runs 2-3, hot (ms)
q1	17613	4437	4297	4297
q2	2061	358	246	246
q3	10108	1364	741	741
q4	10199	783	319	319
q5	7507	2219	1944	1944
q6	204	176	149	149
q7	924	751	618	618
q8	9286	1470	1171	1171
q9	4729	4664	4616	4616
q10	6826	1958	1558	1558
q11	521	316	292	292
q12	376	385	237	237
q13	17774	4097	3265	3265
q14	236	230	214	214
q15	898	841	799	799
q16	679	689	628	628
q17	726	846	531	531
q18	6440	5857	5818	5818
q19	1125	1007	628	628
q20	508	502	381	381
q21	2568	1876	1793	1793
q22	362	320	278	278
Total cold run time: 101670 ms
Total hot run time: 30523 ms

----- Round 2, with runtime_filter_mode=off -----
query	run 1, cold (ms)	run 2 (ms)	run 3 (ms)	min of runs 2-3, hot (ms)
q1	4417	4368	4389	4368
q2	260	346	273	273
q3	2110	2676	2175	2175
q4	1364	1731	1323	1323
q5	4367	4273	4290	4273
q6	222	185	139	139
q7	1893	1836	1671	1671
q8	2586	2708	2485	2485
q9	7653	7668	7432	7432
q10	2834	3116	2642	2642
q11	542	477	453	453
q12	705	747	610	610
q13	3953	4418	3729	3729
q14	305	340	300	300
q15	897	882	799	799
q16	681	733	708	708
q17	1177	1301	1314	1301
q18	8262	7992	8065	7992
q19	869	889	872	872
q20	2234	2133	1954	1954
q21	4925	4612	4707	4612
q22	609	589	543	543
Total cold run time: 52865 ms
Total hot run time: 50654 ms

@doris-robot

ClickBench: Total hot run time: 28.59 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit ea0d63ad330cb8fd944240d19d87f48ac773bba9, data reload: false

query	run 1, cold (s)	run 2 (s)	run 3 (s)
query1	0.06	0.05	0.05
query2	0.10	0.05	0.05
query3	0.26	0.08	0.08
query4	1.61	0.12	0.11
query5	0.27	0.26	0.25
query6	1.17	0.68	0.67
query7	0.03	0.04	0.03
query8	0.05	0.04	0.04
query9	0.58	0.50	0.52
query10	0.55	0.56	0.55
query11	0.15	0.10	0.10
query12	0.15	0.11	0.10
query13	0.63	0.62	0.60
query14	1.06	1.05	1.07
query15	0.89	0.87	0.89
query16	0.40	0.38	0.38
query17	1.13	1.15	1.15
query18	0.24	0.22	0.22
query19	2.11	2.01	2.06
query20	0.02	0.02	0.02
query21	15.44	0.26	0.15
query22	5.48	0.06	0.06
query23	16.27	0.29	0.11
query24	2.22	0.26	0.25
query25	0.06	0.08	0.08
query26	0.14	0.14	0.13
query27	0.07	0.05	0.09
query28	3.56	1.15	0.97
query29	12.54	3.93	3.18
query30	0.31	0.16	0.13
query31	2.83	0.63	0.41
query32	3.25	0.60	0.51
query33	3.23	3.33	3.22
query34	16.21	5.45	4.82
query35	4.86	4.84	4.77
query36	0.67	0.52	0.50
query37	0.12	0.07	0.08
query38	0.08	0.05	0.05
query39	0.05	0.04	0.04
query40	0.20	0.17	0.16
query41	0.09	0.04	0.03
query42	0.04	0.03	0.03
query43	0.05	0.04	0.04
Total cold run time: 99.23 s
Total hot run time: 28.59 s

Contributor

@dataroaring dataroaring left a comment

LGTM

@github-actions github-actions bot added the approved label (Indicates a PR has been approved by one committer.) Feb 9, 2026
@github-actions
Contributor

github-actions bot commented Feb 9, 2026

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

github-actions bot commented Feb 9, 2026

PR approved by anyone and no changes requested.

@doris-robot

BE UT Coverage Report

Increment line coverage 0.00% (0/1) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.72% (19441/36876)
Line Coverage 36.20% (180910/499705)
Region Coverage 32.60% (140507/431047)
Branch Coverage 33.61% (60820/180961)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (1/1) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 71.69% (25902/36132)
Line Coverage 54.28% (270574/498467)
Region Coverage 51.68% (225024/435429)
Branch Coverage 53.16% (96575/181665)

@wm1581066 wm1581066 added the usercase label (Important user case type label) Feb 9, 2026
@dataroaring dataroaring merged commit 78a7b22 into apache:master Feb 9, 2026
31 of 33 checks passed
github-actions bot pushed a commit that referenced this pull request Feb 9, 2026
…ion` (#60607)


Labels

approved (Indicates a PR has been approved by one committer.), dev/4.0.x, reviewed, usercase (Important user case type label)

6 participants