…ache#55630)
Currently, `MetaServiceImpl::get_rowset` uses `calc_sync_versions` to
eliminate unnecessary version ranges when the BE syncs rowset metas. One of
these optimizations is as follows:
```cpp
std::vector<std::pair<int64_t, int64_t>> calc_sync_versions(int64_t req_bc_cnt, int64_t bc_cnt,
                                                            int64_t req_cc_cnt, int64_t cc_cnt,
                                                            int64_t req_cp, int64_t cp,
                                                            int64_t req_start, int64_t req_end) {
    // ...
    if (req_cc_cnt < cc_cnt) {
        Version cc_version;
        if (req_cp < cp && req_cc_cnt + 1 == cc_cnt) {
            // * only one CC happened and CP changed
            // BE [=][=][=][=][=====][=][=]
            //                ^~~~~ req_cp
            // MS [=][=][=][=][xxxxxxxxxxxxxx][=======][=][=]
            //                                ^~~~~~~ ms_cp
            //                ^____________^ related_versions: [req_cp, ms_cp - 1]
            //
            cc_version = {req_cp, cp - 1};
        } else {
            // ...
        }
        // ...
    }
    // ...
}
```
This optimization relies on the assumption that only cumulative
compaction changes the cumulative point. However, full compaction
can also change the cumulative point, which breaks this assumption.
In a multi-cluster environment, this causes a data correctness problem,
because the tablet permanently fails to sync some rowset metas.
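To make the broken assumption concrete, here is a minimal sketch of the one-CC shortcut. The helpers `assumed_cc_output` and `contained` are hypothetical names, not Doris code, and the version numbers in the usage note are illustrative only.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical illustration, not Doris code: an inclusive version range [first, second].
using Range = std::pair<int64_t, int64_t>;

// The one-CC shortcut: if exactly one cumulative compaction happened and the
// cumulative point moved from req_cp to cp, assume that compaction's output is
// exactly [req_cp, cp - 1].
Range assumed_cc_output(int64_t req_cp, int64_t cp) {
    return {req_cp, cp - 1};
}

// Returns true when `actual` lies entirely inside `assumed`.
bool contained(const Range& actual, const Range& assumed) {
    return assumed.first <= actual.first && actual.second <= assumed.second;
}
```

With `req_cp = 10` and `cp = 15`, a CC-only history whose compaction output is `[10-14]` fits the assumed range `[10-14]`. If instead a full compaction moved CP to 15 and the following CC produced `[15-20]`, that output lies outside `[10-14]`, so a sync that trusts the shortcut never requests it.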
A data correctness problem has been observed in the following
situation:
1. For a certain tablet, base_compaction_cnt=14,
cumulative_compaction_cnt=804, cumu_point=7458.
On node A of the write cluster (cluster 0), a full compaction of
[2-7464] and a cumulative compaction of [7465-7486] were performed. The
stats then became base_compaction_cnt=15, cumulative_compaction_cnt=805,
cumu_point=7465.
2. On node B of the read cluster (cluster 1), during `sync_rowset`, the
   inputs are:
   - req_base_compaction_cnt=14, base_compaction_cnt=15
   - req_cumulative_compaction_cnt=804, cumulative_compaction_cnt=805
   - req_cp=7458, cp=7465
   - req_start=7487, req_end=int_max
3. `calc_sync_versions` computes that the rowsets to be pulled are [0-7464]
and [7487-int_max], but it misses the rowset [7465-7486] produced by
cumulative compaction.
4. Moreover, since the tablet's max_version on node B of cluster 1 has
already been updated, subsequent sync_rowset operations will not pull
rowset [7465-7486] either.
5. This causes duplicate keys in MOW (merge-on-write) tables, because
newer rowsets generate delete bitmap marks on [7465-7486].
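Steps 3 and 4 can be checked with a small sketch. This is a simplified model, not Doris code: `is_pulled` and `later_sync_ranges` are hypothetical helpers, a rowset counts as pulled only if one requested range fully contains it, and `int_max` is modeled as `INT64_MAX`.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <utility>
#include <vector>

// Hypothetical illustration, not Doris code: an inclusive version range [first, second].
using Range = std::pair<int64_t, int64_t>;

// Assumption: "int_max" from the scenario is modeled as INT64_MAX.
constexpr int64_t kMaxVersion = std::numeric_limits<int64_t>::max();

// Simplifying assumption: a rowset is pulled only if one requested range fully contains it.
bool is_pulled(const std::vector<Range>& requested, const Range& rowset) {
    for (const Range& r : requested) {
        if (r.first <= rowset.first && rowset.second <= r.second) {
            return true;
        }
    }
    return false;
}

// Step 3: the ranges computed for node B in the scenario.
const std::vector<Range> kStep3Ranges = {{0, 7464}, {7487, kMaxVersion}};

// Step 4 (assumption): once node B's compaction counts match the meta service,
// a later sync only asks for versions after its max_version, so the request
// starts at 7487 or later.
std::vector<Range> later_sync_ranges(int64_t next_start) {
    return {{next_start, kMaxVersion}};
}
```

The full compaction output `[2-7464]` is covered by the step 3 ranges, but `[7465-7486]` is not, and because every later request starts at 7487 or beyond, `later_sync_ranges` can never cover it either.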
---
This PR disables the above optimization when the full compaction count
has changed.
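A minimal sketch of the fix, assuming a simplified model of `calc_sync_versions` reconstructed from the excerpt and the scenario above. `sync_ranges_sketch`, `merge_if_touching`, and the `full_compaction_changed` flag are hypothetical names, not the Doris API, and the fallback range `[req_cp, max]` for the non-shortcut branch is an assumption. When a full compaction has happened, the one-CC shortcut is skipped. Passing `false` for the flag reproduces the pre-fix result, since the pre-fix shortcut ignores full compaction.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <utility>
#include <vector>

// Hypothetical, simplified model; not the Doris implementation.
// Versions are inclusive ranges [first, second] with non-negative starts.
using Range = std::pair<int64_t, int64_t>;

// Merge `b` into `a` when they overlap or are adjacent; returns true on success.
// Written as `x < y - 1` instead of `x + 1 < y` to avoid overflow at INT64_MAX.
bool merge_if_touching(Range& a, const Range& b) {
    if (a.second < b.first - 1 || b.second < a.first - 1) return false;
    a.first = std::min(a.first, b.first);
    a.second = std::max(a.second, b.second);
    return true;
}

std::vector<Range> sync_ranges_sketch(int64_t req_bc_cnt, int64_t bc_cnt,
                                      int64_t req_cc_cnt, int64_t cc_cnt,
                                      int64_t req_cp, int64_t cp,
                                      int64_t req_start, int64_t req_end,
                                      bool full_compaction_changed) {
    const int64_t kMax = std::numeric_limits<int64_t>::max();
    std::vector<Range> ranges;
    if (req_bc_cnt < bc_cnt) {
        ranges.emplace_back(0, cp - 1);  // everything below the new cumulative point
    }
    if (req_cc_cnt < cc_cnt) {
        // The fix: the shortcut is only trusted when no full compaction happened,
        // because a full compaction can also move the cumulative point.
        bool one_cc_shortcut =
                req_cp < cp && req_cc_cnt + 1 == cc_cnt && !full_compaction_changed;
        // Assumed fallback: request every version from req_cp onward.
        Range cc = one_cc_shortcut ? Range{req_cp, cp - 1} : Range{req_cp, kMax};
        if (ranges.empty() || !merge_if_touching(ranges.front(), cc)) {
            ranges.push_back(cc);
        }
    }
    Range query{req_start, req_end};
    bool merged = false;
    for (Range& r : ranges) {
        if (merge_if_touching(r, query)) {
            merged = true;
            break;
        }
    }
    if (!merged) ranges.push_back(query);
    std::sort(ranges.begin(), ranges.end());
    return ranges;
}
```

With the inputs from step 2 and `req_end = INT64_MAX`, `full_compaction_changed = false` yields `[0-7464]` and `[7487-INT64_MAX]`, matching step 3, while `true` yields the single range `[0-INT64_MAX]`, which includes `[7465-7486]`. The fallback requests more rowset metas than the shortcut, trading a larger sync for correctness.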
Release note: None
- Test
    - [x] Regression test
    - [x] Unit Test
    - [ ] Manual test (add detailed scripts or steps below)
    - [ ] No need to test or manual test. Explain why:
        - [ ] This is a refactor/code format and no logic has been changed.
        - [ ] Previous test can cover this change.
        - [ ] No code files have been changed.
        - [ ] Other reason
- Behavior changed:
    - [ ] No.
    - [ ] Yes.
- Does this need documentation?
    - [ ] No.
    - [ ] Yes.
- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label
pick #55630