Hey!
We built an ADK agent that acts as a chatbot and uses SQL tools to answer the user based on information it finds in BigQuery.
We deploy on GKE, but we use the session and memory features from Vertex AI Agent Engine.
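For reference, here is roughly how things are wired up. This is a simplified sketch, not our exact code: the project/location/agent-engine IDs, the MCP URL, and the instruction text are placeholders, and the import paths and connection-param class names can differ between google-adk versions.

```python
# Simplified sketch (placeholders everywhere; import paths may vary by ADK version).
from google.adk.agents import Agent
from google.adk.runners import Runner
from google.adk.sessions import VertexAiSessionService
from google.adk.memory import VertexAiMemoryBankService
from google.adk.tools.mcp_tool.mcp_toolset import MCPToolset, SseServerParams

PROJECT = "my-project"                      # placeholder
LOCATION = "us-central1"                    # placeholder
AGENT_ENGINE_ID = "1234567890"              # placeholder: Agent Engine used only for sessions/memory
MCP_SERVER_URL = "http://mcp-sql:8080/sse"  # placeholder: the MCP server that runs the SQL

# The agent only orchestrates; the BigQuery SQL runs behind the MCP server.
root_agent = Agent(
    name="bq_chatbot",
    model="gemini-2.0-flash",
    instruction="Answer user questions using the SQL tools.",
    tools=[MCPToolset(connection_params=SseServerParams(url=MCP_SERVER_URL))],
)

# Sessions and memory are delegated to Vertex AI Agent Engine, not kept in the pod.
session_service = VertexAiSessionService(project=PROJECT, location=LOCATION)
memory_service = VertexAiMemoryBankService(
    project=PROJECT, location=LOCATION, agent_engine_id=AGENT_ENGINE_ID
)

runner = Runner(
    agent=root_agent,
    app_name=AGENT_ENGINE_ID,
    session_service=session_service,
    memory_service=memory_service,
)
```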
It works fine in our tests and when we try it by hand, but when it comes to load tests everything breaks.
We set a memory limit of 1Gi per pod, but we noticed that at startup the pod already uses ~500Mi, and then every call to /run_sse adds ~14MiB of RSS that is never reclaimed (and these are plain "Hello" prompts, there's no full context involved).
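This is roughly how we observe the per-call growth: hammer /run_sse with the same tiny prompt and watch the server process RSS after each call. The script below is a sketch, not our actual harness; the PID, URLs, and the /run_sse payload shape are approximations (and it assumes `requests` and `psutil` are available and a session already exists).

```python
# Rough reproduction sketch: call /run_sse in a loop and log the server's RSS.
import time
import psutil
import requests

SERVER_PID = 12345                  # placeholder: PID of the ADK server process
BASE_URL = "http://localhost:8080"  # placeholder: where the ADK API server listens
proc = psutil.Process(SERVER_PID)

payload = {
    "app_name": "bq_chatbot",            # placeholder
    "user_id": "load-test-user",
    "session_id": "load-test-session",   # assumes the session was created beforehand
    "new_message": {"role": "user", "parts": [{"text": "Hello"}]},
    "streaming": True,
}

rss_before = proc.memory_info().rss
for i in range(100):
    requests.post(f"{BASE_URL}/run_sse", json=payload, timeout=60)
    rss_now = proc.memory_info().rss
    print(f"call {i:3d}: RSS = {rss_now / 2**20:.1f} MiB "
          f"(+{(rss_now - rss_before) / 2**20:.1f} MiB total)")
    time.sleep(0.5)
```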
This completely breaks the feature, since it will never scale. Yes, we could deploy on Cloud Run or Agent Engine instead, but that becomes a cost issue over time: we don't want to pay for big instances for a service that basically just makes HTTP calls (the agent pod doesn't run the SQL queries itself, we use an MCP server for that).
I guess we're doing something wrong, but I have no clue what exactly. We could work around it with Kubernetes probes that restart the pod when memory pressure gets too high (see the sketch below), but we'd like to address the root cause.
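For completeness, the workaround we have in mind looks like this: expose a health endpoint that starts failing once the process RSS crosses a threshold, and point the pod's livenessProbe at it so the kubelet restarts the container. This is a sketch only (it assumes a FastAPI app like the one ADK serves, plus `psutil`), and again it just masks the leak rather than fixing it.

```python
# Sketch of the probe-based workaround (not a root-cause fix).
import os
import psutil
from fastapi import FastAPI, Response

RSS_LIMIT_BYTES = 900 * 2**20   # restart a bit below the 1Gi pod memory limit

app = FastAPI()  # in our case this would be the ADK FastAPI app

@app.get("/healthz/memory")
def memory_health() -> Response:
    rss = psutil.Process(os.getpid()).memory_info().rss
    if rss > RSS_LIMIT_BYTES:
        # A 503 makes the liveness probe fail, so Kubernetes restarts the pod.
        return Response(content=f"RSS {rss} over limit", status_code=503)
    return Response(content="ok", status_code=200)
```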