Description
Describe the bug
BE may intermittently hit a segmentation fault when it exits.
This bug does not affect functionality, but it can make later troubleshooting, such as heap profiling, more difficult.
Here is a core dump (master, debug build):
Core was generated by `/home/users/stdpain/opt/doris-deploy/be/lib/palo_be'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00007ff97bb7d09c in __gnu_cxx::__normal_iterator<doris::TabletManager::tablets_shard*, std::vector<doris::TabletManager::tablets_shard, std::allocator<doris::TabletManager::tablets_shard> > >::__normal_iterator (this=0x7ff904d442b8, __i=<error reading variable>)
at /ssd1/opt/stdpain/workspace/doris/workspace/doris-toolchain/gcc730/include/c++/7.3.0/bits/stl_iterator.h:780
780 : _M_current(__i) { }
[Current thread is 1 (LWP 38702)]
warning: File "/ssd1/opt/fenghaoasuch/workspace/doris/workspace/doris-toolchain/gcc730/lib64/libstdc++.so.6.0.24-gdb.py" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load".
(gdb) bt
#0 0x00007ff97bb7d09c in __gnu_cxx::__normal_iterator<doris::TabletManager::tablets_shard*, std::vector<doris::TabletManager::tablets_shard, std::allocator<doris::TabletManager::tablets_shard> > >::__normal_iterator (this=0x7ff904d442b8, __i=<error reading variable>)
at /ssd1/opt/stdpain/workspace/doris/workspace/doris-toolchain/gcc730/include/c++/7.3.0/bits/stl_iterator.h:780
#1 0x00007ff97bb7b3df in std::vector<doris::TabletManager::tablets_shard, std::allocator<doris::TabletManager::tablets_shard> >::begin (this=0x8)
at /ssd1/opt/stdpain/workspace/doris/workspace/doris-toolchain/gcc730/include/c++/7.3.0/bits/stl_vector.h:564
#2 0x00007ff97bb6ed37 in doris::TabletManager::find_best_tablet_to_compaction (this=0x0,
compaction_type=doris::CUMULATIVE_COMPACTION, data_dir=0x55fca00,
tablet_submitted_compaction=std::vector of length 0, capacity 0)
at /home/users/stdpain/doris/core/be/src/olap/tablet_manager.cpp:681
#3 0x00007ff97ba76a83 in doris::StorageEngine::_compaction_tasks_generator (this=0x558cc00,
compaction_type=doris::CUMULATIVE_COMPACTION,
data_dirs=std::vector of length 1, capacity 1 = {...})
at /home/users/stdpain/doris/core/be/src/olap/olap_server.cpp:397
#4 0x00007ff97ba764d5 in doris::StorageEngine::_compaction_tasks_producer_callback (this=0x558cc00)
at /home/users/stdpain/doris/core/be/src/olap/olap_server.cpp:337
#5 0x00007ff97ba73d39 in doris::StorageEngine::<lambda()>::operator()(void) const (
__closure=0x6fb8f18) at /home/users/stdpain/doris/core/be/src/olap/olap_server.cpp:78
#6 0x00007ff97ba77ae1 in std::_Function_handler<void(), doris::StorageEngine::start_bg_threads()::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...)
at /ssd1/opt/stdpain/workspace/doris/workspace/doris-toolchain/gcc730/include/c++/7.3.0/bits/std_function.h:316
#7 0x00007ff97cf14b7c in std::function<void ()>::operator()() const (this=0x6fb8f18)
at /ssd1/opt/stdpain/workspace/doris/workspace/doris-toolchain/gcc730/include/c++/7.3.0/bits/std_function.h:706
#8 0x00007ff97a7163ce in doris::Thread::supervise_thread (arg=0x6fb8f00)
at /home/users/stdpain/doris/core/be/src/util/thread.cpp:386
#9 0x00007ff978cd21c3 in start_thread () from /opt/compiler/gcc-4.8.2/lib64/libpthread.so.0
#10 0x00007ff9782f512d in clone () from /opt/compiler/gcc-4.8.2/lib64/libc.so.6
Here is be.out after rebuilding with ASAN:
=================================================================
==54102==ERROR: AddressSanitizer: heap-use-after-free on address 0x6190000cddc8 at pc 0x000001d36929 bp 0x7fcbbb572b70 sp 0x7fcbbb572b68
READ of size 8 at 0x6190000cddc8 thread T233 (compaction_task)
#0 0x1d36928 in std::_Rb_tree<doris::DataDir*, std::pair<doris::DataDir* const, std::vector<long, std::allocator<long> > >, std::_Select1st<std::pair<doris::DataDir* const, std::vector<long, std::allocator<long> > > >, std::less<doris::DataDir*>, std::allocator<std::pair<doris::DataDir* const, std::vector<long, std::allocator<long> > > > >::_M_begin() /ssd1/opt/stdpain/workspace/doris/workspace/doris-toolchain/gcc730/include/c++/7.3.0/bits/stl_tree.h:737
...
To Reproduce
The bug is hard to reproduce, but I found a way to trigger it reliably.
Modify be/service/doris_main.cpp as follows:
heartbeat_thrift_server = nullptr;
sleep(20); // modify here
doris::ExecEnv::destroy(exec_env);
return 0;
- Run ./bin/start_be.sh
- Kill the BE process
It seems that StorageEngine is deleted while the background threads are still running. When a background thread then tries to access StorageEngine, BE crashes.
Expected behavior
BE should exit without a segmentation fault.
Desktop (please complete the following information):
- OS: CentOS 6
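**Possible solutions**
Make StorageEngine inherit from std::enable_shared_from_this, so that background threads can hold a reference that keeps it alive
or
wait for the background threads to exit before StorageEngine is destroyed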
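
The second option could look roughly like the sketch below. This is only an illustration of the idea, not the actual Doris code: `Engine`, `stop()`, `_producer_loop()` and the other members are hypothetical stand-ins for StorageEngine and its compaction producer thread. The owner sets a stop flag, wakes the background thread, and joins it before any member is destroyed.

```cpp
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for StorageEngine; not the real Doris class.
class Engine {
public:
    Engine() : _shards(4, 0) {
        // All members are already constructed when the thread starts.
        _worker = std::thread([this] { _producer_loop(); });
    }

    // Join the background thread before any member is destroyed.
    ~Engine() { stop(); }

    // Ask the background thread to stop and wait for it to finish.
    // May be called more than once from the owning thread.
    void stop() {
        {
            std::lock_guard<std::mutex> lock(_mutex);
            _stopped = true;
        }
        _cv.notify_all();
        if (_worker.joinable()) {
            _worker.join();
        }
    }

    int rounds() const { return _rounds.load(); }
    bool worker_running() const { return _worker.joinable(); }

private:
    // Simulates the periodic producer: touches engine state, then sleeps
    // until the next round or until stop() wakes it up.
    void _producer_loop() {
        std::unique_lock<std::mutex> lock(_mutex);
        while (!_stopped) {
            for (int& s : _shards) {
                ++s;
            }
            ++_rounds;
            _cv.wait_for(lock, std::chrono::milliseconds(10),
                         [this] { return _stopped; });
        }
    }

    std::mutex _mutex;
    std::condition_variable _cv;
    bool _stopped = false;  // guarded by _mutex
    std::atomic<int> _rounds{0};
    std::vector<int> _shards;
    std::thread _worker;
};
```

With this shape, destroying an `Engine` (or calling `stop()` explicitly during shutdown) cannot return while `_producer_loop()` is still touching `_shards`, which is the kind of access the ASAN report above flags. The condition variable lets the thread react to the stop request immediately instead of finishing its sleep interval first.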