Speed up Databricks provider init by removing fetch of supported models #6616
Conversation
Force-pushed from c848b8c to c67af95
Thanks! I tested the desktop app locally and it's working well and does seem faster to load a chat. I also tested the e2e changes and there are still failing tests for me, but overall it's an improvement, so I don't think it should block this PR and we can follow up with fixing the remaining tests. A major change is that it launches the Electron app from scratch for each test, whereas before it would reuse the same running Electron app, so that has trade-offs like taking longer to run the entire suite. Ultimately I think we should pare these tests down to only what we need for a release smoke test from our checklist, so we can automate them on releases next. The rest of the functionality can be covered by our vitest unit/functional tests.
Yeah, one advantage of the isolation is that we could potentially allow Playwright to run the tests in parallel without them interfering with each other. The performance test needs a fresh instance to give repeatable results (so that it isn't sometimes measuring a cold launch and sometimes a warm instance), but we could allow the other tests to reuse instances if desired.
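For a sense of what that parallelism would involve, here is a minimal sketch of how parallel execution could be switched on in `playwright.config.ts`, assuming the fixture gives every test its own Electron instance. The worker counts, paths, and timeout are illustrative assumptions, not values from this PR.

```ts
// playwright.config.ts — illustrative values only, not taken from this PR
import { defineConfig } from '@playwright/test';

export default defineConfig({
  testDir: './tests/e2e',
  // Safe only because the fixture launches a fresh Electron instance per test
  fullyParallel: true,
  // Keep workers modest: each one spawns a full Electron app
  workers: process.env.CI ? 2 : 4,
  // Cold launches are slower, so allow generous per-test time
  timeout: 120_000,
});
```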
DOsinga left a comment
I'd give the LLM-generated comments a look over and drop some, but otherwise looks good.
try {
      // Assign a unique debug port for this test to enable parallel execution
      // Base port 9222, offset by worker index * 100 + parallel slot
      const debugPort = 9222 + (testInfo.parallelIndex * 10);
can we not just find a free port here? we're in node land, right?
We're not the ones binding the port; that happens in Electron after we spawn it (but we need to choose it here so that we know which port to connect to later).
So even if we wrote the code here to find a free port, there'd be a race condition, since someone else could bind it before Electron starts up and DevTools binds it. I'm not sure it would be better overall.
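To make that sequencing concrete, here is a hedged sketch of the flow being described: the test process only chooses the number, and the port is first bound once Electron's DevTools starts listening. The flag name, URL, and helper function are assumptions about a typical setup, not code quoted from this PR.

```ts
// Illustrative sketch only — the flag name and connection URL are assumptions, not quoted from the PR.
import { chromium, _electron as electron, type Browser, type ElectronApplication } from 'playwright';

async function launchWithDebugPort(
  parallelIndex: number
): Promise<{ app: ElectronApplication; cdp: Browser }> {
  // The test process only *chooses* the number here; nothing is bound yet.
  const debugPort = 9222 + parallelIndex * 10;

  // The port is first bound when Electron starts and DevTools begins listening,
  // which is why probing for a "free" port up front could still race with another process.
  const app = await electron.launch({
    args: ['.', `--remote-debugging-port=${debugPort}`],
  });

  // Anything that needs CDP can attach later, using the port chosen in advance.
  const cdp = await chromium.connectOverCDP(`http://127.0.0.1:${debugPort}`);
  return { app, cdp };
}
```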
I wanted to use Playwright to take some performance measurements as part of making the time to first token faster on first use. The existing tests were stale, so this updates them to work with the UI as it currently is. I also extracted the setup and teardown into a fixture.
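As a rough illustration of that extraction, a fixture along these lines centralises the launch and close logic so it isn't repeated in every spec file; the fixture names and launch arguments here are assumptions, not the PR's actual code.

```ts
// A minimal sketch of setup/teardown as a Playwright fixture; names and launch args are assumptions.
import { test as base } from '@playwright/test';
import { _electron as electron, type ElectronApplication, type Page } from 'playwright';

type AppFixtures = {
  electronApp: ElectronApplication;
  mainWindow: Page;
};

export const test = base.extend<AppFixtures>({
  // Setup and teardown live here instead of being repeated in every spec file.
  electronApp: async ({}, use) => {
    const app = await electron.launch({ args: ['.'] }); // fresh instance per test => repeatable cold-start timings
    await use(app);     // the test body runs here
    await app.close();  // teardown runs even if the test fails
  },
  mainWindow: async ({ electronApp }, use) => {
    // firstWindow() resolves once the app has opened its first BrowserWindow
    await use(await electronApp.firstWindow());
  },
});

export { expect } from '@playwright/test';
```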
This takes >1s and we are only using it to determine if the "fast model" used for compacting and for generating the titles for sessions (currently Gemini 2.5 Flash) is available. We can instead assume that it's available, because there is already a fallback behaviour implemented if we try to use it and it's not available.
Force-pushed from c67af95 to a90e0b0
* origin/main:
  fix: dispatch ADD_ACTIVE_SESSION event before navigating from "View All" (#6679)
  Speed up Databricks provider init by removing fetch of supported models (#6616)
  fix: correct typos in documentation and Justfile (#6686)
  docs: frameDomains and baseUriDomains for mcp apps (#6684)
  docs: add Remotion video creation tutorial (#6675)
  docs: export recipe and copy yaml (#6680)
  Test against fastmcp (#6666)
  docs: mid-session changes (#6672)
  Fix MCP elicitation deadlock and improve UX (#6650)
  chore: upgrade to rmcp 0.14.0 (#6674)
  [docs] add MCP-UI to MCP Apps blog (#6664)
  ACP get working dir from args.cwd (#6653)
  Optimise load config in UI (#6662)

# Conflicts:
#   ui/desktop/src/components/Layout/AppLayout.tsx
…upport

* origin/main: (79 commits)
  fix[format/openai]: return error on empty msg. (#6511)
  Fix: ElevenLabs API Key Not Persisting (#6557)
  Logging uplift for model training purposes (command injection model) [Small change] (#6330)
  fix(goose): only send agent-session-id when a session exists (#6657)
  BERT-based command injection detection in tool calls (#6599)
  chore: [CONTRIBUTING.md] add Hermit to instructions (#6518)
  fix: update Gemini context limits (#6536)
  Document r slash command (#6724)
  Upgrade GitHub Actions to latest versions (#6700)
  fix: Manual compaction does not update context window. (#6682)
  Removed the Acceptable Usage Policy (#6204)
  Document spellcheck toggle (#6721)
  fix: docs workflow cleanup and prevent cancellations (#6713)
  Docs: file bug directly (#6718)
  fix: dispatch ADD_ACTIVE_SESSION event before navigating from "View All" (#6679)
  Speed up Databricks provider init by removing fetch of supported models (#6616)
  fix: correct typos in documentation and Justfile (#6686)
  docs: frameDomains and baseUriDomains for mcp apps (#6684)
  docs: add Remotion video creation tutorial (#6675)
  docs: export recipe and copy yaml (#6680)
  ...

# Conflicts:
#   ui/desktop/src/hooks/useChatStream.ts
…ovider

* 'main' of github.com:block/goose:
  fix: Manual compaction does not update context window. (#6682)
  Removed the Acceptable Usage Policy (#6204)
  Document spellcheck toggle (#6721)
  fix: docs workflow cleanup and prevent cancellations (#6713)
  Docs: file bug directly (#6718)
  fix: dispatch ADD_ACTIVE_SESSION event before navigating from "View All" (#6679)
  Speed up Databricks provider init by removing fetch of supported models (#6616)
  fix: correct typos in documentation and Justfile (#6686)
  docs: frameDomains and baseUriDomains for mcp apps (#6684)
Summary
The list of supported models was being fetched from Databricks every time the provider was initialised, taking around 1.6s on my machine. To the user, this shows up as a ~1.6s delay before the first model response can start to arrive.
We only needed this list to determine whether our preferred "fast model", used for compacting and for generating session titles (currently Gemini 2.5 Flash), was supported; if it isn't, we fall back to the main model. But we already have a fallback to the main model implemented if calling the fast model fails (e.g. because it's not found), so there's no need to check upfront.
This PR removes the fetch of supported models and assumes the fast model is available, relying on the already implemented fallback to the main model if it turns out not to be. On my machine, it reduces the time to first token after sending the first prompt from 1.6s to around 200ms.
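The change itself lives in the provider code rather than the UI; the sketch below only illustrates the before/after shape of the logic in TypeScript-style pseudocode, with hypothetical names throughout (the real provider implementation differs).

```ts
// Hypothetical names throughout — this only illustrates the pattern, not the actual provider code.

// Before: provider init blocked on a network round trip just to confirm the fast model exists.
async function pickFastModelBefore(client: { listModels(): Promise<string[]> }) {
  const supported = await client.listModels(); // the ~1.6s fetch this PR removes
  return supported.includes('gemini-2.5-flash') ? 'gemini-2.5-flash' : undefined;
}

// After: assume the fast model is available and rely on the per-call fallback that already exists.
async function completeWithFastModel(
  complete: (model: string, prompt: string) => Promise<string>,
  mainModel: string,
  prompt: string
): Promise<string> {
  try {
    return await complete('gemini-2.5-flash', prompt);
  } catch {
    // Existing fallback: if the fast model is unavailable (e.g. not found), use the main model.
    return await complete(mainModel, prompt);
  }
}
```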
I also updated the Playwright tests (which were failing due to drift between the actual UI structure and what the tests expected) and added some performance tests, which are what surfaced the issue fixed in this PR.
It may be easier to review by commit.
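For a sense of how such a time-to-first-token measurement can be scripted, here is a hedged Playwright sketch; the selectors, prompt, and timing budget are placeholders, not the assertions used in this PR's tests.

```ts
// Selectors, prompt, and timing budget here are placeholders, not this PR's actual test code.
import { expect, test } from '@playwright/test';
import { _electron as electron } from 'playwright';

test('time to first token after the first prompt', async () => {
  const app = await electron.launch({ args: ['.'] }); // fresh instance => measures a cold start
  const window = await app.firstWindow();

  await window.getByRole('textbox').fill('Say hello');
  const start = Date.now();
  await window.keyboard.press('Enter');

  // Wait for any assistant output to appear; the test id is a hypothetical placeholder.
  await expect(window.getByTestId('assistant-message')).toBeVisible({ timeout: 15_000 });
  const timeToFirstToken = Date.now() - start;

  console.log(`time to first token: ${timeToFirstToken}ms`);
  expect(timeToFirstToken).toBeLessThan(1_000); // illustrative budget only
  await app.close();
});
```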
Type of Change
AI Assistance
Testing
I manually tested to verify the performance improvement.