Skip to content

Conversation

@jhult
Copy link

@jhult jhult commented Jan 13, 2026

Add ContextLimitError class to detect when upstream API returns context window limit errors. The error response includes a suggest_compact flag that clients can use to trigger their own conversation compaction.

When a context window limit error is received from the upstream API, the proxy now automatically compacts and retries for non-streaming requests:

  1. Generates a structured summary with 5 sections: Task Overview, Current State, Important Discoveries, Next Steps, Context to Preserve
  2. Places summary as a single assistant message (replacing entire message history)
  3. Increases max_tokens to 2000 for detailed summaries
  4. Requests <summary></summary> tag wrapping
  5. Retries the original request with the compacted conversation

This mimics client-side /compact command behavior, allowing conversations to continue seamlessly when hitting token limits.

Auto-compact retry only applies to non-streaming requests since streaming responses cannot be retried mid-stream.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant