Implement partition compaction grouper #6172
Conversation
Signed-off-by: Alex Le <leqiyue@amazon.com>
Overall looking good to me. Just a few comments. For users migrating: would it just work by changing the configuration and deploying? I would imagine yes, as we would not find partition data on any block and would treat them all as partitionId 0. Correct?
Yes. Switching back and forth between partitioning and non-partitioning should not cause any issue. At most, the largest time range blocks would be recompacted one more time.
How does it work while deployment is happening? We could have some compactors creating blocks with partitions and others creating blocks without, and they would see different visit markers. Would that create duplicate compaction while deployment is in progress?
If both are compacting the largest time range blocks, it would create duplicate blocks. Any lower-level blocks would be compacted into higher levels properly after deployment.
danielblando left a comment:
Thanks for this work
yeya24 left a comment:
Thanks. These are mostly nits.
We should also update https://cortexmetrics.io/docs/configuration/v1guarantees/ to mention that this is an experimental feature, but we can do that after you finish all the partition compactor PRs.
```go
func UpdatePartitionedGroupInfo(ctx context.Context, bkt objstore.InstrumentedBucket, logger log.Logger, partitionedGroupInfo PartitionedGroupInfo) (*PartitionedGroupInfo, error) {
	existingPartitionedGroup, _ := ReadPartitionedGroupInfo(ctx, bkt, logger, partitionedGroupInfo.PartitionedGroupID)
```
Is it fine to ignore the error here?
The error is ignored so that the partitioned group info is always updated. There is no harm in writing the latest version of the partitioned group info, which is supposed to be the correct grouping based on the latest bucket store. We only skip updating when the file already exists, as a best effort to finish the previously generated plan. Even if the previous partitioned group info got updated in the middle, the new file should take already compacted partitions into account.
Makes sense. Can you add a comment explaining the reason?
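Following up on that suggestion, the "ignore the read error, always be able to write the latest grouping" pattern from the diff can be sketched with an inline comment. This is only an illustration: the storage is modeled with an in-memory map instead of the real `objstore` bucket, and all field and helper names other than `PartitionedGroupInfo`/`ReadPartitionedGroupInfo` are assumptions, not the PR's actual code.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// PartitionedGroupInfo mirrors the shape discussed in the diff; the exact
// fields in the PR may differ (assumption for illustration).
type PartitionedGroupInfo struct {
	PartitionedGroupID uint32 `json:"partitionedGroupID"`
	PartitionCount     int    `json:"partitionCount"`
}

// store stands in for the object-store bucket used in the PR (assumption).
var store = map[uint32][]byte{}

var errNotFound = errors.New("partitioned group info not found")

func readPartitionedGroupInfo(id uint32) (*PartitionedGroupInfo, error) {
	raw, ok := store[id]
	if !ok {
		return nil, errNotFound
	}
	var info PartitionedGroupInfo
	if err := json.Unmarshal(raw, &info); err != nil {
		return nil, err
	}
	return &info, nil
}

// updatePartitionedGroupInfo illustrates the pattern from the diff.
// The read error is deliberately ignored: writing the newest partitioned
// group info is harmless because it reflects the current bucket state, and
// an existing file is only reused as a best effort to finish a previously
// generated plan.
func updatePartitionedGroupInfo(info PartitionedGroupInfo) (*PartitionedGroupInfo, error) {
	// Ignore the read error: a missing or unreadable file simply means we
	// cannot reuse the previous plan and should write a fresh one.
	if existing, _ := readPartitionedGroupInfo(info.PartitionedGroupID); existing != nil {
		return existing, nil // keep trying to finish the previous plan
	}
	raw, err := json.Marshal(info)
	if err != nil {
		return nil, err
	}
	store[info.PartitionedGroupID] = raw
	return &info, nil
}

func main() {
	first, _ := updatePartitionedGroupInfo(PartitionedGroupInfo{PartitionedGroupID: 7, PartitionCount: 4})
	second, _ := updatePartitionedGroupInfo(PartitionedGroupInfo{PartitionedGroupID: 7, PartitionCount: 8})
	fmt.Println(first.PartitionCount, second.PartitionCount) // prints "4 4": the existing file wins
}
```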
yeya24 left a comment:
Thanks. I think we need some documentation about how this works, but that can be done after everything is implemented.
What this PR does:
This PR implements partition compaction grouper.
Introduced new files for partition compaction. `partitionedGroupID` in the file is unique for a particular time range.

Here is the high-level algorithm of the partition compaction grouper:
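The algorithm's individual steps did not survive in this description, so as one illustration of how a partition compaction grouper can deterministically split a group's series across partitions, the sketch below assigns each series to a partition by hashing its labels and taking the result modulo the partition count. This is an assumption for illustration, not necessarily the exact scheme in this PR; the series key, hash function, and function name are all hypothetical.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// partitionIDFor deterministically maps a series (identified here by its
// label string, an assumption for illustration) to one of partitionCount
// partitions. Every compactor computing this independently arrives at the
// same assignment, which is what lets partitions be compacted in parallel.
func partitionIDFor(seriesLabels string, partitionCount uint64) uint64 {
	h := fnv.New64a()
	h.Write([]byte(seriesLabels))
	return h.Sum64() % partitionCount
}

func main() {
	for _, s := range []string{`{__name__="up",job="a"}`, `{__name__="up",job="b"}`} {
		fmt.Printf("%s -> partition %d\n", s, partitionIDFor(s, 4))
	}
}
```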
Introduced `meta_extensions` to save partition information of the result block in `meta.json`. This information can be used to better assign blocks to the proper partition in the next round of compaction.

Which issue(s) this PR fixes:
NA
Checklist

- [ ] `CHANGELOG.md` updated - the order of entries should be `[CHANGE]`, `[FEATURE]`, `[ENHANCEMENT]`, `[BUGFIX]`