[BUG] Fix Colocate table balance bug #4936
Merged
morningman merged 1 commit into apache:master from Nov 22, 2020
Fix bug #4935
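Current strategy:
Each group maintains backendSeq, the list of BEs that host each bucketId.
Every 20s, a thread does the following:
1. Checks whether any BE in backendSeq is unavailable. If so, it picks an available BE and substitutes it for the unavailable one in backendSeq.
2. Checks whether the tablets in the group match backendSeq. If they do not, it marks the group as unstable and runs the migration tasks.
3. Balances the groups that are in the stable state: it uses backendSeq to count the bucketIds on every BE, then moves bucketIds from BEs that hold many to BEs that hold few. This step only updates backendSeq; the actual migration is carried out in step 2.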
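Step 3 of the current strategy can be sketched roughly as follows. This is a minimal illustration and not the Doris implementation: the function name `balance_stable_group`, the list-of-lists layout of `backend_seq` (one list of BE ids per bucket), and the "stop when the spread is at most 1" rule are all assumptions made for the example.

```python
from collections import Counter

def balance_stable_group(backend_seq, backends):
    """Even out bucket counts by editing backend_seq in place (hypothetical helper).

    backend_seq: one list of BE ids per bucket (assumed layout).
    backends:    ids of all available BEs; every id in backend_seq is assumed available.
    """
    if not backends:
        return backend_seq
    # Count how many buckets each BE hosts according to backend_seq.
    counts = Counter({be: 0 for be in backends})
    for replicas in backend_seq:
        counts.update(replicas)
    while True:
        hi = max(backends, key=lambda be: counts[be])
        lo = min(backends, key=lambda be: counts[be])
        if counts[hi] - counts[lo] <= 1:
            return backend_seq  # already balanced
        # Pick a bucket hosted on the fullest BE that the emptiest BE does not host yet.
        bucket = next((r for r in backend_seq if hi in r and lo not in r), None)
        if bucket is None:
            return backend_seq  # no legal move; give up in this round
        bucket[bucket.index(hi)] = lo
        counts[hi] -= 1
        counts[lo] += 1

# Four buckets with two replicas each, all on BE 1 and BE 2; BE 3 is empty.
seq = balance_stable_group([[1, 2], [1, 2], [1, 2], [1, 2]], [1, 2, 3])
```

Only the sequence changes here. As in the description, moving the tablets themselves is left to step 2, which notices that the tablets no longer match the sequence.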
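Problem:
If many BEs go down at the same time, step 1 removes them from backendSeq, and step 2 then detects that the tablets no longer match backendSeq and marks the group as unstable. If the disks of the remaining BEs cannot hold all the tablets from the BEs that went down, the group stays unstable indefinitely. Adding new BEs does not help, because step 3 is never triggered: it only runs for groups in the stable state.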
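The stuck state can be reproduced with a toy model of steps 2 and 3. Everything in it is an assumption made for illustration: `old_round`, the `tablet_placement` list, and the `calls` log are hypothetical names, and a failed clone is modeled simply as the placement never changing.

```python
def old_round(backend_seq, tablet_placement, calls):
    """One 20s round of steps 2 and 3 under the old strategy (toy model)."""
    # Step 2: the group is stable only if the tablets already sit where backend_seq says.
    stable = tablet_placement == backend_seq
    # Step 3: balancing is gated on the group being stable.
    if stable:
        calls.append("balance")
    return stable

# BE 1 went down and step 1 rewrote the sequence to use BE 2 and BE 3,
# but the tablets could not be moved because the remaining disks are full.
backend_seq = [[2, 3], [2, 3]]
tablet_placement = [[1, 2], [1, 3]]
calls = []
# A new BE added between rounds is not visible to this loop at all.
states = [old_round(backend_seq, tablet_placement, calls) for _ in range(3)]
```

After three rounds `states` contains only `False` and `calls` is still empty: balancing, the only step that could hand buckets to a newly added BE, never runs.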
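Strategy change:
Merge steps 1 and 3 of the current strategy into a single step:
1. First check whether backendSeq contains any unavailable BE. When balancing, first migrate the bucketIds of unavailable BEs to the BEs that hold few bucketIds; after that, migrate bucketIds from BEs that hold many to BEs that hold few.
2. Same as in the current strategy.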
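The merged step might look like the sketch below. It is not the code from this PR: `relocate_and_balance`, the list-of-lists `backend_seq`, and the tie-breaking rules are assumptions chosen to keep the example small.

```python
from collections import Counter

def relocate_and_balance(backend_seq, available):
    """Merged step: evacuate unavailable BEs first, then balance (hypothetical helper)."""
    if not available:
        return backend_seq
    # Count buckets on available BEs only; unavailable BEs are not valid targets.
    counts = Counter({be: 0 for be in available})
    for replicas in backend_seq:
        counts.update(be for be in replicas if be in counts)
    # Phase 1: replace every unavailable BE with the least-loaded available BE
    # that does not already host a replica of the same bucket.
    for replicas in backend_seq:
        for i, be in enumerate(replicas):
            if be in counts:
                continue
            candidates = [b for b in available if b not in replicas]
            if not candidates:
                continue  # nowhere to go; retry in a later round
            target = min(candidates, key=lambda b: counts[b])
            replicas[i] = target
            counts[target] += 1
    # Phase 2: ordinary balancing from the fullest BE to the emptiest one.
    while True:
        hi = max(available, key=lambda b: counts[b])
        lo = min(available, key=lambda b: counts[b])
        if counts[hi] - counts[lo] <= 1:
            break
        bucket = next((r for r in backend_seq if hi in r and lo not in r), None)
        if bucket is None:
            break
        bucket[bucket.index(hi)] = lo
        counts[hi] -= 1
        counts[lo] += 1
    return backend_seq

# BE 1 is down and BE 4 has just been added.
seq = relocate_and_balance([[1, 2], [1, 3], [2, 3], [1, 2]], [2, 3, 4])
```

Because this step no longer depends on the group being stable, a newly added BE becomes a target as soon as it is available, which is what lets a group recover from the situation described above.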