-
Notifications
You must be signed in to change notification settings - Fork 6.8k
Improved token refresh handling to address "Re-connecting" behavior #6231
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@codex review |
|
Codex Review: Didn't find any major issues. Nice work! ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
| RefreshTokenError::Permanent(failed) => { | ||
| StreamAttemptError::Fatal(CodexErr::RefreshTokenFailed(failed)) | ||
| } | ||
| RefreshTokenError::Transient(other) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why don't we retry these in manager.refresh_token()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we want to implement retry logic in multiple places. To do it properly, you need to apply backoff, have a hard-coded or configurable limit, etc. All of this logic is already in place in stream_responses. Unless there's a good reason to replicate it, let's leverage the existing retry logic.
| } | ||
|
|
||
| fn classify_refresh_token_failure(body: &str) -> RefreshTokenFailedError { | ||
| let code = extract_refresh_token_error_code(body); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we extend try_parse_error_message to also return code? seem to be parsing the same style of json object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function covers shapes that try_parse_error_message doesn't handle.
pakrym-oai
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add tests for any of this?
Currently, when the access token expires, we attempt to use the refresh token to acquire a new access token. This works most of the time. However, there are situations where the refresh token is expired, exhausted (already used to perform a refresh), or revoked. In those cases, the current logic treats the error as transient and attempts to retry it repeatedly.
This PR changes the token refresh logic to differentiate between permanent and transient errors. It also changes callers to treat the permanent errors as fatal rather than retrying them. And it provides better error messages to users so they understand how to address the problem. These error messages should also help us further understand why we're seeing examples of refresh token exhaustion.
Here is the error message in the CLI. The same text appears within the extension.
I also correct the spelling of "Re-connecting", which shouldn't have a hyphen in it.
Testing: I manually tested these code paths by adding temporary code to programmatically cause my refresh token to be exhausted (by calling the token refresh endpoint in a tight loop more than 50 times). I then simulated an access token expiration, which caused the token refresh logic to be invoked. I confirmed that the updated logic properly handled the error condition.
Note: We earlier discussed the idea of forcefully logging out the user at the point where token refresh failed. I made several attempts to do this, and all of them resulted in a bad UX. It's important to surface this error to users in a way that explains the problem and tells them that they need to log in again. We also previously discussed deleting the auth.json file when this condition is detected. That also creates problems because it effectively changes the auth status from logged in to logged out, and this causes odd failures and inconsistent UX. I think it's therefore better not to delete auth.json in this case. If the user closes the CLI or VSCE and starts it again, we properly detect that the access token is expired and the refresh token is "dead", and we force the user to go through the login flow at that time.
This should address aspects of #6191, #5679, and #5505