
Increased DB Load and 502s After upgrading v4.72.0->v4.73.1 #33147

@rfairburn

Description

Fleet version: v4.73.1

Web browser and operating system: N/A


💥  Actual behavior

After upgrading to v4.73.1, the following routes had long processing times, resulting in 502 errors:

/api/latest/fleet/os_versions
/api/fleet/orbit/software_install/result
/api/latest/fleet/device/<uuid>/software/install/532769

This is what happened to the DB load while these routes were being hit:

[Image: DB load graph]

This also appeared to cause high disk I/O and IOPS on the database at the same time:

[Images: database disk I/O and IOPS graphs]

Note: /api/latest/fleet/os_versions was requested for a team with over 60k hosts, and this behavior was not present in v4.72.0. I am currently trying to determine whether this route is the cause, or whether the software installer URIs, which were hit 15 minutes earlier during the period of high I/O, were the underlying problem.

It is also possible that something else caused this. During the period of highest impact, this was the query generating the most load:

SELECT `osv` . `cve` , `cm` . `cvss_score` , `cm` . `epss_probability` , `cm` . `cisa_known_exploit` , `cm` . `published` AS `cve_published` , `cm` . `description` , `osv` . `resolved_in_version` , `osv` . `created_at` FROM ( SELECT `v` . `cve` , MIN ( `v` . `created_at` ) `created_at` , GROUP_CONCAT ( DISTINCTROW `v` . `resolved_in_version` SEPARATOR ? ) `resolved_in_version` FROM `operating_system_vulnerabilities` `v` JOIN `operating_systems` `os` ON `os` . `id` = `v` . `operating_system_id` A...

Query summary:

[Image: query summary]

During the issue, container CPU looked fairly normal, but container memory increased significantly; not to problematic levels, but by an abnormal amount:

[Image: container memory graph]

🛠️ To fix

TBD

Product designer: _________________________

🧑‍💻  Steps to reproduce

  1. Perform a software install on v4.73.1, ideally on an instance with a large number of historical installs, as the high install ID (532769) in the route above suggests.
  2. Look at os_versions for a team with 60k+ hosts.

Either or both of these could be the cause, or they could just be symptoms.

🕯️ More info (optional)

N/A

Metadata

Labels

#g-orchestration (Orchestration product group)
:release (Ready to write code. Scheduled in a release. See "Making changes" in handbook.)
P2 (Urgent: Supported workflow not functioning as intended, newly drafted feature with urgent Fleet need)
bug (Something isn't working as documented)
customer-numa
~assisting g-software (This is a #g-software issue that another product group is assisting)
~released bug (This bug was found in a stable release.)

Projects

Status
Done