PermsSyncer: track recent job states in redis#44258
Conversation
|
Current dependencies on/for this PR:
This comment was auto-generated by Graphite. |
unknwon
left a comment
There was a problem hiding this comment.
The approach does seem to make sense to me, will do a thorough review once marked as ready 👍
|
I have reviewed all the PRs. Will pushing the Provider Status to Redis be a very high load task that we need to do it via a separate routine? Can't it be done at each point where we are recording the provider status in first place? We do push many items to prometheus during the process and pushing things to redis should be quick. I am worried about the many places and functions this touches as it is passed up to so many levels making it a bit cumbersome to refactor later on. |
I may be reading this wrong, but
note that generating prometheus metrics does not push items to prometheus - prometheus will scrape the service for aggregated metrics on a regular time interval, every 30 seconds - and is not a comparable process to making a network request to set a record in Redis (though it is comparable to a DB write as I mentioned above) |
IMO some of these are improvements (e.g. https://github.com/sourcegraph/sourcegraph/pull/44251 reducing number of return values), but happy to discuss the other changes further on their respective PRs if you have any ideas or thoughts on the need for this! The core issue is outlined in https://github.com/sourcegraph/sourcegraph/pull/44257 (we have no way of telling the health of a provider programmatically) |
aa65fb7 to
764cca8
Compare
b08b48f to
13fbda3
Compare
764cca8 to
1e27ce1
Compare
6e3a7db to
512228b
Compare
512228b to
d0e7fca
Compare
|
Codenotify: Notifying subscribers in CODENOTIFY files for diff 2bdefd3...0d4cdd2.
|
638f992 to
63c8511
Compare
c96b9b1 to
ba45511
Compare
ba45511 to
3cde488
Compare
18c457a to
c408d9c
Compare
3cde488 to
fa813d6
Compare
|
This feels like a temporary solution to get a quick win, IMHO. Why not build the proper thing? Can't we migrate the sync jobs to be actual database-backed worker jobs? That would be the ideal outcome. And if that's our goal then this seems like we make it harder for ourselves in the future to migrate to the proper solution. |
Unfortunately, I don't have the capacity to drive such an extensive migration yet, but we do need an endpoint available as part of https://github.com/sourcegraph/customer/issues/1534 Additionally, I'm not very convinced a database-backed job framework is a good choice here. Perms sync seems designed to queue and run ephemerally, such that if you just run it enough times it'll eventually get you to the right state, unless there's a major issue, in which case those will be surfaced in the last X jobs. There appears to be no real need to go back in time, nor for the jobs to be resilient because they get queued all the time and the point in time in which they get queued isn't very important as long as they run eventually |
09bc56d to
e6bf445
Compare
|
Codenotify: Notifying subscribers in OWNERS files for diff 2bdefd3...0d4cdd2. No notifications. |
Adds a new permissionsSyncJobs query that returns the most recent outcomes of permissions sync jobs, with per-provider details included (see #44258). This will be used to validate permissions syncing is starting correctly and the desired providers are being involved and in a healthy state. We don't implement pagination since the endpoint is intended as a stream of recent events and retention is low, but the query schema is set up so that it can be extended to support pagination in the future. Co-authored-by: Erik Seliger <erikseliger@me.com>
Disagree. We want to display in the UI: last sync at, next sync at. We want to enqueue a next sync. We want to see the errors of the last sync. We want to have different priorities for syncs. All of that can be done with a database and a database-backed worker. Querying the list of last sync jobs is another argument for that, IMHO. |
|
True, some of those components are in place already - for example we track the outcome of the last sync for a particular entity already - but next sync, priority, state of the queue etc isn't transparent at all. Only discussion I've found is https://github.com/sourcegraph/sourcegraph/issues/16197 - I'll open up an issue for IAM |

Tracks perms sync job outcomes, notably per-provider states (https://github.com/sourcegraph/sourcegraph/pull/44257), in Redis. We set an aggressive TTL of 5 minutes by default because in practice, only the details for the last few sync jobs are relevant, and this prevents us from putting too much pressure on Redis on larger instances.
This can be extended in the future to be database-backed, but for now I think Redis is the right choice - the volume of entries can be quite high when accumulated so we want something easy to expire, and realistically only the details for the last few sync jobs are relevant. Additionally, the records store and reader in package
synclogscan easily be rewritten without too much impact to write to a database in the future.The goal is to eventually allow us to render this data via the GraphQL API similarly to the external service sync status API (https://github.com/sourcegraph/sourcegraph/pull/44387), hence the placement of the struct definitions in a shared package
Part of https://github.com/sourcegraph/customer/issues/1534
Stacked on https://github.com/sourcegraph/sourcegraph/pull/44257
Test plan
Add a GitHub code host with
"authorization": {}, and add a private repo to the listWait a few minutes