Increased DB Load and 502s After upgrading v4.72.0->v4.73.1 #33147
Closed
Labels: #g-orchestration, :release, P2, bug, customer-numa, ~assisting g-software, ~released bug
Project status: Done
Fleet version: v4.73.1
Web browser and operating system: N/A
💥 Actual behavior
The following routes had long processing times resulting in 502 errors after upgrading to v4.73.1:
The following happened to the DB load when these routes were hit:
This also appeared to cause high disk IO/IOPS on the database at the same time:
Note:
/api/latest/fleet/os_versions was hit on a team with over 60k hosts, and the behavior was not present in v4.72.0. I am currently trying to determine whether this request is the cause, or whether the software installer URIs that were hit 15 minutes earlier, at the time of high IO, were the underlying problem. It's possible something else caused this as well. During the time of high impact, this was the query using the most load:
Query summary:
During the issue, container CPU looked fairly normal, but container memory increased by an abnormal amount, though not to problematic levels:
🛠️ To fix
TBD
Product designer: _________________________
🧑💻 Steps to reproduce
It looks like one or both of these requests (the os_versions route and the software installer URIs) could be causes, or they could just be symptoms.
🕯️ More info (optional)
N/A