[fix](cloud) Fix SingleFlight deadlock in CloudTabletMgr::get_tablet during warmup#60622
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
run buildall |
There was a problem hiding this comment.
Pull request overview
Fixes a SingleFlight self-deadlock that can occur during cloud tablet warmup / initial tablet loading by preventing RowsetMeta::fs() from re-entering ExecEnv::get_tablet() for the same tablet.
Changes:
- Add a
RowsetMetaAPI to pre-initialize the cached encryption algorithm used byRowsetMeta::fs(). - Pre-set the encryption algorithm in
CloudTablet::_submit_segment_download_task()before callingrowset_meta->fs().
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| be/src/olap/rowset/rowset_meta.h | Adds an API to pre-initialize the encryption algorithm cached via DorisCallOnce. |
| be/src/cloud/cloud_tablet.cpp | Uses the new API before RowsetMeta::fs() in the segment warmup download path to avoid re-entrant tablet lookup. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
be/src/cloud/cloud_tablet.cpp
Outdated
| // Pre-set encryption algorithm to avoid re-entrant get_tablet() call inside | ||
| // RowsetMeta::fs() which causes SingleFlight deadlock when the tablet is not | ||
| // yet cached (during initial load_tablet). | ||
| rowset_meta->set_encryption_algorithm(_tablet_meta->encryption_algorithm()); | ||
| auto file_system = rowset_meta->fs(); |
There was a problem hiding this comment.
This pre-set is only done in _submit_segment_download_task(), but _add_rowsets_directly() can also submit inverted-index warmup tasks. When config::file_cache_enable_only_warm_up_idx is enabled (segments skipped), _submit_inverted_index_download_task() will still call rowset_meta->fs() without the pre-set and can re-enter ExecEnv::get_tablet() during initial load, so the SingleFlight deadlock can still occur. Apply the same pre-set for the inverted-index download path too (or centralize it before any warmup task calls RowsetMeta::fs()).
b8ce1b3 to
38d4442
Compare
|
run buildall |
…during warmup During initial tablet loading, the load_tablet lambda inside SingleFlight calls sync_tablet_rowsets -> _add_rowsets_directly -> _submit_segment_download_task -> rowset_meta->fs(), which internally calls ExecEnv::get_tablet() for the same tablet_id. Since the tablet hasn't been inserted into cache yet, this triggers a second SingleFlight lookup for the same key, causing event.wait() to deadlock (waiting for itself). Fix by pre-setting the encryption algorithm on RowsetMeta before calling fs() in _submit_segment_download_task, so _determine_encryption_once returns the cached value without re-entering get_tablet.
38d4442 to
00bf565
Compare
|
run buildall |
|
PR approved by at least one committer and no changes requested. |
|
PR approved by anyone and no changes requested. |
TPC-H: Total hot run time: 30325 ms |
ClickBench: Total hot run time: 28.29 s |
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary:
During initial tablet loading, the load_tablet lambda inside SingleFlight calls sync_tablet_rowsets -> _add_rowsets_directly -> _submit_segment_download_task -> rowset_meta->fs(), which internally calls ExecEnv::get_tablet() for the same tablet_id. Since the tablet hasn't been inserted into cache yet, this triggers a second SingleFlight lookup for the same key, causing event.wait() to deadlock (waiting for itself).
Fix by pre-setting the encryption algorithm on RowsetMeta before calling fs() in _submit_segment_download_task, so _determine_encryption_once returns the cached value without re-entering get_tablet.
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)