fix: move licensing checks to worker process#54854
Conversation
Since worker is a singleton, this way we make sure that only one license check goroutine is running.
| }) | ||
| goroutine.Go(func() { | ||
| productsubscription.StartCheckForAnomalousLicenseUsage(logger, db) | ||
| }) |
There was a problem hiding this comment.
The producsubscription package is enterprise/internal, did not want to get deep into that can of worms.
There was a problem hiding this comment.
Tbh its probably fine to move that to internal/, we're moving loads of stuff already so you could do a precursor PR to move it before this one
There was a problem hiding this comment.
I think there are more licensing changes upcoming, which would be a good opportunity to deal with the enterprise code split. I don't think spending additional effort here right now is worth it.
|
Codenotify: Notifying subscribers in CODENOTIFY files for diff d37d8d9...faf06a3.
|
Worker isnt a singleton, its designed to be able to have multiple running if you want to scale out other jobs. Cant remember how jobs are chosen though, @efritz do you remember how its configured and what the default is e.g. run all jobs by default |
|
See |
|
Why are we moving this to worker btw? (i could search slack but itd be better for prosperity to have it mentioned here 😄) |
Because it ran in a goroutine in frontend in |
The main reason was, that today if there are multiple instances of frontend, we fire concurrent requests to license check API. There is some logic on the dotcom instance that relies on these requests to come at roughly 12 hour intervals. Receiving multiple requests causes this logic to break. There is other logic on the instance itself that relies on these calls to be done in 12 hour intervals. All of this can lead to concurrent DB/Redis writes, so there is a potential for race conditions, although I did not yet come up with a scenario where the concurrency itself would break the business logic. All in all, doing the goroutine logic in a singleton would avoid the concurrency problem and seems safer. We can achieve it with a DB backed worker, but I wanted to try some quicker approaches first, because refactoring to DB backed worker would take more time and effort. |
|
As far as I know, customers don't just duplicate Just look through these: https://github.com/sourcegraph/sourcegraph/blob/d58eccf2cdfd792c979d2d57130535316669aff3/enterprise/cmd/worker/shared/shared.go#L34-L77 I think it's fine if you add a doc entry for your worker to https://docs.sourcegraph.com/admin/workers with the note that this should not be scaled out. |
|
Agree with @mrnugget, worker can either be scaled vertically (non-issue anyways), or horizontally, by splitting out jobs across multiple machines. That does by no means mean that three workers with the same set of jobs should exist, it would be misconfiguration. |
|
@kopancek what's the status here? We kinda forgot this in sprint planning |
the worker uses database.DB, but it is forbidden to import that from cmd/ directories. Because the worker lived in the licensing package before, licensing package started being dependent on db, which it previously was not. This caused dev/check/go-dbconn-import.sh to fail Moving the files to a different package within worker space has helped without affecting the functionality.
|
The backport to To backport manually, run these commands in your terminal: # Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-5.1 5.1
# Navigate to the new working tree
cd .worktrees/backport-5.1
# Create a new branch
git switch --create backport-54854-to-5.1
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 6732a5fe357e77fd52854e8b44738c6557243f4b
# Push it to GitHub
git push --set-upstream origin backport-54854-to-5.1
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-5.1Then, create a pull request where the |
Moving licensing checks to a singleton worker job. This way we make sure that only one license check API call is being made on each 12-hour tick. Previously this lived in a frontend service. If multiple frontend services were running, this would fire multiple requests, which is undesirable from many points of view - waste of resources, waste of bandwidth, potential concurrency issues. Having it fire multiple times would also break an assumption that we have in the license check handler on dotcom. For checking if same site_id is used on multiple instances, we assume that the time difference between license check calls would be roughly 10-12 hours. Calling twice in the same second breaks this assumption. Tested locally that the worker starts (cherry picked from commit 6732a5f)
Update changelog with #54854 ## Test plan N/A, just changelog update
Backport #54854 into 5.1 ## Test plan backport PR, N/A
Moving licensing checks to a singleton worker job. This way we make sure that only one license check API call is being made on each 12-hour tick. Previously this lived in a frontend service. If multiple frontend services were running, this would fire multiple requests, which is undesirable from many points of view - waste of resources, waste of bandwidth, potential concurrency issues. Having it fire multiple times would also break an assumption that we have in the license check handler on dotcom. For checking if same site_id is used on multiple instances, we assume that the time difference between license check calls would be roughly 10-12 hours. Calling twice in the same second breaks this assumption. ## Test plan Tested locally that the worker starts
Update changelog with #54854 ## Test plan N/A, just changelog update
Update changelog with #54854 ## Test plan N/A, just changelog update
Moving licensing checks to a singleton worker job. This way we make sure that only one license check API call is being made on each 12-hour tick.
Previously this lived in a frontend service. If multiple frontend services were running, this would fire multiple requests, which is undesirable from many points of view - waste of resources, waste of bandwidth, potential concurrency issues.
Having it fire multiple times would also break an assumption that we have in the license check handler on dotcom. For checking if same site_id is used on multiple instances, we assume that the time difference between license check calls would be roughly 10-12 hours. Calling twice in the same second breaks this assumption.
Test plan
Tested locally that the worker starts