Implement partition compaction grouper #6172
Conversation
Signed-off-by: Alex Le <leqiyue@amazon.com>
Overall looking good to me. Just a few comments. For users migrating: would it just work by changing the configuration and deploying? I would imagine yes, as we would not find partition data on any block and would treat them all as partitionId 0. Correct?
Yes. Switching back and forth between partitioning and non-partitioning should not cause any issue. At most, the largest time range blocks would be recompacted one more time.
How does it work while deployment is happening? We could have some compactors creating blocks with partitions and others creating blocks without, and they would see different visit markers. Would that create duplicate compaction while deployment is in progress?
If both are compacting the largest time range blocks, it would create duplicate blocks. Any lower-level blocks would be compacted into higher levels properly after deployment.
danielblando left a comment:
Thanks for this work
yeya24 left a comment:
Thanks. These are mostly nits.
We should also update https://cortexmetrics.io/docs/configuration/v1guarantees/ to mention that this is an experimental feature, but we can do that after you finish all the partition compactor PRs.
```go
func UpdatePartitionedGroupInfo(ctx context.Context, bkt objstore.InstrumentedBucket, logger log.Logger, partitionedGroupInfo PartitionedGroupInfo) (*PartitionedGroupInfo, error) {
	existingPartitionedGroup, _ := ReadPartitionedGroupInfo(ctx, bkt, logger, partitionedGroupInfo.PartitionedGroupID)
```
Is it fine to ignore the error here?
The error is ignored so that the partitioned group info is always updated. There is no harm in writing the latest version of the partitioned group info, which is supposed to be the correct grouping based on the latest bucket store. We only skip updating when the file already exists, as a best effort to finish the previously generated plan. Even if the previous partitioned group info got updated in the middle, the new file should take already compacted partitions into account.
Makes sense. Can you add a comment explaining the reason?
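Following up on that suggestion, the "ignore the read error, always be able to write the latest grouping" pattern from the diff can be sketched with an inline comment. This is only an illustration: the storage is modeled with an in-memory map instead of the real `objstore` bucket, and all field and helper names other than `PartitionedGroupInfo`/`ReadPartitionedGroupInfo` are assumptions, not the PR's actual code.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// PartitionedGroupInfo mirrors the shape discussed in the diff; the exact
// fields in the PR may differ (assumption for illustration).
type PartitionedGroupInfo struct {
	PartitionedGroupID uint32 `json:"partitionedGroupID"`
	PartitionCount     int    `json:"partitionCount"`
}

// store stands in for the object-store bucket used in the PR (assumption).
var store = map[uint32][]byte{}

var errNotFound = errors.New("partitioned group info not found")

func readPartitionedGroupInfo(id uint32) (*PartitionedGroupInfo, error) {
	raw, ok := store[id]
	if !ok {
		return nil, errNotFound
	}
	var info PartitionedGroupInfo
	if err := json.Unmarshal(raw, &info); err != nil {
		return nil, err
	}
	return &info, nil
}

// updatePartitionedGroupInfo illustrates the pattern from the diff.
// The read error is deliberately ignored: writing the newest partitioned
// group info is harmless because it reflects the current bucket state, and
// an existing file is only reused as a best effort to finish a previously
// generated plan.
func updatePartitionedGroupInfo(info PartitionedGroupInfo) (*PartitionedGroupInfo, error) {
	// Ignore the read error: a missing or unreadable file simply means we
	// cannot reuse the previous plan and should write a fresh one.
	if existing, _ := readPartitionedGroupInfo(info.PartitionedGroupID); existing != nil {
		return existing, nil // keep trying to finish the previous plan
	}
	raw, err := json.Marshal(info)
	if err != nil {
		return nil, err
	}
	store[info.PartitionedGroupID] = raw
	return &info, nil
}

func main() {
	first, _ := updatePartitionedGroupInfo(PartitionedGroupInfo{PartitionedGroupID: 7, PartitionCount: 4})
	second, _ := updatePartitionedGroupInfo(PartitionedGroupInfo{PartitionedGroupID: 7, PartitionCount: 8})
	fmt.Println(first.PartitionCount, second.PartitionCount) // prints "4 4": the existing file wins
}
```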
yeya24 left a comment:
Thanks. I think we need some documentation about how this works, but that can be done after everything is implemented.
What this PR does:
This PR implements partition compaction grouper.
Introduced new files for partition compaction. `partitionedGroupID` in the file is unique for a particular time range.

Here is the high-level algorithm of the partition compaction grouper:
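The algorithm's individual steps did not survive in this description, so as one illustration of how a partition compaction grouper can deterministically split a group's series across partitions, the sketch below assigns each series to a partition by hashing its labels and taking the result modulo the partition count. This is an assumption for illustration, not necessarily the exact scheme in this PR; the series key, hash function, and function name are all hypothetical.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// partitionIDFor deterministically maps a series (identified here by its
// label string, an assumption for illustration) to one of partitionCount
// partitions. Every compactor computing this independently arrives at the
// same assignment, which is what lets partitions be compacted in parallel.
func partitionIDFor(seriesLabels string, partitionCount uint64) uint64 {
	h := fnv.New64a()
	h.Write([]byte(seriesLabels))
	return h.Sum64() % partitionCount
}

func main() {
	for _, s := range []string{`{__name__="up",job="a"}`, `{__name__="up",job="b"}`} {
		fmt.Printf("%s -> partition %d\n", s, partitionIDFor(s, 4))
	}
}
```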
Introduced `meta_extensions` to save partition information of the result block in `meta.json`. This information can be used to better assign blocks to the proper partition in the next round of compaction.

Which issue(s) this PR fixes:
NA
Checklist

- [ ] `CHANGELOG.md` updated - the order of entries should be `[CHANGE]`, `[FEATURE]`, `[ENHANCEMENT]`, `[BUGFIX]`