Fix Github public repos to properly tag archived repos#41461
Fix Github public repos to properly tag archived repos#41461varsanojidan merged 11 commits intomainfrom
Conversation
|
Codenotify: Notifying subscribers in CODENOTIFY files for diff 49e3285...2535f9f.
|
sashaostrikov
left a comment
There was a problem hiding this comment.
LGTM, approved in advance, just have a question about s.excludeArchived case
|
cc: @keegancsmith @mrnugget tagging both of you in to see if this bug fix is good to go in for 4.0 |
mrnugget
left a comment
There was a problem hiding this comment.
I'm pretty sure we need to handle the goroutine being blocked and leaking, as pointed out by @ryanslade: https://github.com/sourcegraph/sourcegraph/pull/41461/files#r965618434
As for getting this into 4.0: are you confident in this change? does the additional request here add significant load to GitHub hosts? did you test this manually and trigger the code paths here manually?
Im open to thoughts/suggestions, lmk what you think. |
|
So in case the customer doesn't have any archived public repositories, we'd only have to do a single search request more when syncing the external service, right? Sounds like you're confident in this, so: sounds good to me. |
Yep exactly, for example, in the case of our dogfood instance, since we only had 2 public-archived repos, it was only a single extra API call. Thanks for all the feedback! |
keegancsmith
left a comment
There was a problem hiding this comment.
cc: @keegancsmith @mrnugget tagging both of you in to see if this bug fix is good to go in for 4.0
How long has this been a bug? If I am not mistaken this is a very old issue, which makes me unsure of the safety of cherry-picking this in. GitHub API acts differently on github.com vs enterprise vs enterprise versions. I worry that changing something core (like listing all GitHub repos on an enterprise instance) is risky.
To further spell out the risk in practical terms, all instances which only ever synced all Github repos from an instance will now hit the GitHub search API for the first time. This should be fine, but is a risk. Normally we aren't as risk averse, except this is a major version increase.
The main benefit of this change is those instances now will correctly have the archived filter working. This has not been working for a long time though for them?
Now that I have been a grumpy release engineer, let me say I've reviewed the code and I feel comfortable with this change. Just wanted to be explicit about risks and how release engineering would think about this. The funny part is if you had just merged this in before the 4.0 branch cut we wouldn't of had any of these discussions :P
|
Yeah this was my concern as well, though the scope is a bit smaller, the change to hitting the search API would just be for if the "public" keyword is specified in repositoryQuery.
Yeah, it's been quite a pain in the butt for a few customers for a long time now unfortunately 😅. I think its probably fine to also go in the release after 4.0, I'll double check with PM. |
Fixes https://github.com/sourcegraph/customer/issues/1285
The gist of this change is that the Github API for listing public repos doesn't include an archived field (see this issue), so we have no way of knowing which repo is archived.
I had made a change to this previously where if
{"exclude":[{"archived": true}]}was set, that it would use the Github search API instead which could tell us all of the public non-archived repos.My oversight here was that its still important to know whether a repo is archived or not even if we aren't just outright excluding them (see linked issue up top), therefore I have changed how list all public repos works for the Github code host.
First it will use the search API to query for all of the public repos that are archived. It will use this to generate a hashmap. Then it will query the regular Github API to list all of the public repos, and before sending the result back, check each repo against the hashmap to get it's true archived state.
Test plan
Tested locally against dogfood github to validate search functionality properly filters Github public archived repos now.
Config: