Skip to content
This repository was archived by the owner on Sep 30, 2024. It is now read-only.

gitserver: SRC_ENABLE_GC_AUTO to enable & disable gc on fetch#47852

Merged
abeatrix merged 8 commits intomainfrom
bee-gc
Feb 21, 2023
Merged

gitserver: SRC_ENABLE_GC_AUTO to enable & disable gc on fetch#47852
abeatrix merged 8 commits intomainfrom
bee-gc

Conversation

@abeatrix
Copy link
Contributor

@abeatrix abeatrix commented Feb 17, 2023

Close https://github.com/sourcegraph/customer/issues/1955

Context: A customer is encountering an issue with one of their fast-moving repo that caused the repo unable to sync & index, making it unsearchable (context). We suspected the addition of --no-auto-gc in https://github.com/sourcegraph/sourcegraph/pull/36274 is the cause as the issue started after performing MVU, and the error went away when running git gc manually inside gitserver.

Included in this PR:

  • set SRC_ENABLE_GC_AUTO should enable and disable gc on fetch

Test plan

Ran locally to confirm setting SRC_ENABLE_GC_AUTO would toggle the command flag:

[    gitserver-1] 2023/02/17 20:54:11 Setting --auto-gc to fetch
[    gitserver-1] 2023/02/17 20:57:08 Setting --no-auto-gc to fetch

@sourcegraph-bot
Copy link
Contributor

sourcegraph-bot commented Feb 17, 2023

Codenotify: Notifying subscribers in CODENOTIFY files for diff a071a07...d41d313.

Notify File(s)
@indradhanush cmd/gitserver/server/cleanup.go
cmd/gitserver/server/refspecoverrides.go
cmd/gitserver/server/serverutil_test.go
cmd/gitserver/server/vcs_syncer_git.go
@sashaostrikov cmd/gitserver/server/cleanup.go
cmd/gitserver/server/refspecoverrides.go
cmd/gitserver/server/serverutil_test.go
cmd/gitserver/server/vcs_syncer_git.go

return exec.CommandContext(ctx, "git", append([]string{"fetch", "--no-auto-gc", "--progress", "--prune", remoteURL.String()}, refspecOverrides...)...)
// Perform automatic repository maintenance at the end of fetch
gc := "--auto-gc"
if e := os.Getenv("SRC_ENABLE_GC_AUTO"); e == "false" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe strconv.ParseBool would be better here? Especially because customers will set this, there might be a lot of variation in how they set it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we have to check the env var here again? Can't we check whether gitGCMode == gitGCModeGitAutoGC?

We already parse the env var here and already have some logic to set the GC mode here: https://github.com/sourcegraph/sourcegraph/blob/37458d2484d78de53765dfaea929be7c22e611cf/cmd/gitserver/server/cleanup.go#L56-L87

I think we should reuse that instead of adding another place where things can change.

} else {
// Perform automatic repository maintenance at the end of fetch
gc := "--auto-gc"
if e := os.Getenv("SRC_ENABLE_GC_AUTO"); e == "false" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

strconv.ParseBool?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above: let's reuse gitGCMode or at least the parsing of the env var or move this check over to the the others and use .envGet.

Copy link
Contributor

@indradhanush indradhanush left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a blocker, but could you please add a section to the end of this doc to document this new behaviour?

Can be a follow up PR as well. Leaving the comment so that it's tracked.

Copy link
Contributor

@mrnugget mrnugget left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for hopping on this! Requesting changes because I think we should reuse the GC mode variable OR move the parsing of the env var closer to the others.

return exec.CommandContext(ctx, "git", append([]string{"fetch", "--no-auto-gc", "--progress", "--prune", remoteURL.String()}, refspecOverrides...)...)
// Perform automatic repository maintenance at the end of fetch
gc := "--auto-gc"
if e := os.Getenv("SRC_ENABLE_GC_AUTO"); e == "false" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we have to check the env var here again? Can't we check whether gitGCMode == gitGCModeGitAutoGC?

We already parse the env var here and already have some logic to set the GC mode here: https://github.com/sourcegraph/sourcegraph/blob/37458d2484d78de53765dfaea929be7c22e611cf/cmd/gitserver/server/cleanup.go#L56-L87

I think we should reuse that instead of adding another place where things can change.

} else {
// Perform automatic repository maintenance at the end of fetch
gc := "--auto-gc"
if e := os.Getenv("SRC_ENABLE_GC_AUTO"); e == "false" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above: let's reuse gitGCMode or at least the parsing of the env var or move this check over to the the others and use .envGet.

cmd = refspecOverridesFetchCmd(ctx, remoteURL)
} else {
// Perform automatic repository maintenance at the end of fetch
gc := "--auto-gc"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we even need to set this explicitly? I kinda assumed that, depending on the GC mode we use, we set gc.auto = 1 or 0 in the git config. See: https://github.com/sourcegraph/sourcegraph/blob/37458d2484d78de53765dfaea929be7c22e611cf/cmd/gitserver/server/cleanup.go#L1402-L1420

@abeatrix
Copy link
Contributor Author

Thank you all for the reviews! I wasn't aware of gitGCMode but what @mrnugget suggested makes sense to me. I do have a follow-up question to make sure I understand the current implementations correctly. It looks like we disable gc for gitGCModeJanitorAutoGC & gitGCModeMaintenance even when enableGCAuto is set to true, can someone explain to me if this is intended (or is this the issue that we are trying to fix)?

@mrnugget
Copy link
Contributor

It looks like we disable gc for gitGCModeJanitorAutoGC & gitGCModeMaintenance even when enableGCAuto is set to true, can someone explain to me if this is intended (or is this the issue that we are trying to fix)?

I'll let @stefanhengl answer here (since he added it in https://github.com/sourcegraph/sourcegraph/commit/f461a4ae253faa2752d2bc64d049c90569de7a1f) but I think the thinking is this: if we use gitGCModeJanitorAutoGC or gitGCModeMaintenance we run our own garbage collection in janitor runs, every 1min, so we don't need to run auto-gc on commands. that's why we disable it, so a git fetch that triggers auto-gc can't clash with a janitor run in the background.

But: aren't we now discovering that even with janitor-git-gc-runs we still run into these problems and we should probably also use auto-gc even if gitGCModeJanitorAutoGC?

@abeatrix
Copy link
Contributor Author

abeatrix commented Feb 20, 2023

But: aren't we now discovering that even with janitor-git-gc-runs we still run into these problems and we should probably also use auto-gc even if gitGCModeJanitorAutoGC?

@mrnugget Yea that's what I believe is the issue as well and have updated to enable gc here for gitGCModeJanitorAutoGC, but want to make sure why it wasn't in the first place but what you explained clarified it for me :) Will wait for @stefanhengl to confirm

edit: @stefanhengl should we also remove the hard-coded "gc.auto=1" here?

@stefanhengl
Copy link
Member

edit: @stefanhengl should we also remove the hard-coded "gc.auto=1" here?

From the docs

gc.auto::
When there are approximately more than this many loose
objects in the repository, git gc --auto will pack them.
Some Porcelain commands use this command to perform a
light-weight garbage collection from time to time. The
default value is 6700.

This is an aggressive setting for packing but AFAIK we never had trouble with it. Changing it might blow up dir sizes on gitserver. I never changed it and unless we have specific reason to remove it I wouldn't touch it.

@stefanhengl
Copy link
Member

Thank you all for the reviews! I wasn't aware of gitGCMode but what @mrnugget suggested makes sense to me. I do have a follow-up question to make sure I understand the current implementations correctly. It looks like we disable gc for gitGCModeJanitorAutoGC & gitGCModeMaintenance even when enableGCAuto is set to true, can someone explain to me if this is intended (or is this the issue that we are trying to fix)?

Once it became obvious that sg maintenance caused repo corruption for customers with large repos I tried to lock down any unintentional garbage collection run by git porcelain commands IE I wanted to make sure our janitor job is the only one running gc. This whole approach failed and I recommend to take the original implementation (before I added sg maintenance) as baseline. The original implementation didn't set any repo config options like we do in gitSetAutoGC.

@abeatrix abeatrix requested a review from mrnugget February 21, 2023 14:28
@abeatrix abeatrix merged commit 382b481 into main Feb 21, 2023
@abeatrix abeatrix deleted the bee-gc branch February 21, 2023 19:43
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants