Persistence of VM stats #5984
plugins/metrics/src/main/java/org/apache/cloudstack/api/ListVMsUsageHistoryCmd.java
api/src/test/java/org/apache/cloudstack/api/response/StatsResponseTest.java
plugins/metrics/src/test/java/org/apache/cloudstack/metrics/MetricsServiceImplTest.java
|
Looks good @joseflauzino . Can you confirm it is ready for review and testing, please? |
|
@DaanHoogland Yes, it's ready. |
|
@blueorangutan package |
|
@blueorangutan package |
|
@joseflauzino a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress. |
|
Packaging result: ✔️ el7 ✔️ el8 ✔️ debian ✔️ suse15. SL-JID 2685 |
|
@blueorangutan test |
|
@DaanHoogland a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests |
|
Trillian test result (tid-3412)
|
DaanHoogland
left a comment
clgtm, needs some testing by a third party though
|
@borisstoyanov @rohityadavcloud care to review? |
GutoVeronezi
left a comment
CLGTM, I pointed out some minor improvements though.
I'll test it in my local lab and I'll post the results.
api/src/main/java/org/apache/cloudstack/api/ApiArgValidator.java
api/src/main/java/org/apache/cloudstack/api/response/StatsResponse.java
plugins/metrics/src/main/java/org/apache/cloudstack/metrics/MetricsServiceImpl.java
|
@blueorangutan package |
|
@GutoVeronezi a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress. |
|
Packaging result: ✔️ el7 ✔️ el8 ✔️ debian ✔️ suse15. SL-JID 2787 |
|
The tests were performed based on the code changes and the spec (#5935). I tested each situation in an environment without the changes (ENV-1) and in another one with the changes (ENV-2). I separated the CloudMonkey results into files to avoid posting a big comment.
Situations
✔️ I created a VM, waited for management to gather the information and restarted it. After restarting it, ENV-1 had lost the information; ENV-2 still had the last collected information.
✔️ I created a VM and checked it on two management nodes. ENV-1 presented divergent information. ENV-2 presented the same information on both nodes.
✔️ I tested some combinations of the global configs:
✔️ UI not showing stopped VMs when seeing metrics unless we filter only stopped:
❌ Analyzing the database, the column
@joseflauzino Analyzing the new API's return, I think it would be better if the element |
|
@blueorangutan package |
|
@GutoVeronezi a Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress. |
|
Packaging result: ✔️ el7 ✔️ el8 ✔️ debian ✔️ suse15. SL-JID 3052 |
|
@blueorangutan package |
|
@joseflauzino a Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress. |
|
Packaging result: ✔️ el7 ✔️ el8 ✔️ debian ✔️ suse15. SL-JID 3071 |
|
I re-ran the manual tests (one of the results is below). The feature seems to be working fine to me.
Test with an interval of 15 seconds and a retention of 1 minute, with 2 MGMT:
Note: the stats are ordered by the field
The column
MariaDB [cloud]> select * from mshost;
+----+----------------+---------------+-----------------------------+--------------------------------------+-------+-------------------+-----------------+--------------+---------------------+---------+-------------+
| id | msid | runid | name | uuid | state | version | service_ip | service_port | last_update | removed | alert_count |
+----+----------------+---------------+-----------------------------+--------------------------------------+-------+-------------------+-----------------+--------------+---------------------+---------+-------------+
| 1 | 90520745551922 | 1649179172554 | cloudstack-lab-management-1 | cd30ed1b-daed-4f23-bd5a-439ed609bf13 | Up | 4.17.0.0-SNAPSHOT | 192.168.201.150 | 9090 | 2022-04-05 17:38:12 | NULL | 0 |
| 3 | 90520746808830 | 1649179414017 | cloudstack-lab-management-2 | ea6b28f8-8511-43f8-bff7-03a9c3fc01d8 | Up | 4.17.0.0-SNAPSHOT | 192.168.201.151 | 9090 | 2022-04-05 17:38:11 | NULL | 0 |
+----+----------------+---------------+-----------------------------+--------------------------------------+-------+-------------------+-----------------+--------------+---------------------+---------+-------------+
2 rows in set (0.001 sec)
MariaDB [cloud]> select * from vm_stats;
+-----+-------+----------------+---------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| id | vm_id | mgmt_server_id | timestamp | vm_stats_data |
+-----+-------+----------------+---------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 136 | 113 | 1 | 2022-04-05 17:37:26 | {"vmId":113,"cpuUtilization":15.784281486353175,"networkReadKBs":0.0,"networkWriteKBs":0.0,"diskReadIOs":0.0,"diskWriteIOs":0.0,"diskReadKBs":0.0,"diskWriteKBs":0.0,"memoryKBs":524288.0,"intFreeMemoryKBs":524288.0,"targetMemoryKBs":524288.0,"numCPUs":1,"entityType":"vm"} |
| 137 | 113 | 3 | 2022-04-05 17:37:29 | {"vmId":113,"cpuUtilization":16.801075268817204,"networkReadKBs":0.0,"networkWriteKBs":0.0,"diskReadIOs":0.0,"diskWriteIOs":0.0,"diskReadKBs":0.0,"diskWriteKBs":0.0,"memoryKBs":524288.0,"intFreeMemoryKBs":524288.0,"targetMemoryKBs":524288.0,"numCPUs":1,"entityType":"vm"} |
| 138 | 113 | 1 | 2022-04-05 17:37:41 | {"vmId":113,"cpuUtilization":16.078753076292042,"networkReadKBs":0.0,"networkWriteKBs":0.0,"diskReadIOs":0.0,"diskWriteIOs":0.0,"diskReadKBs":0.0,"diskWriteKBs":0.0,"memoryKBs":524288.0,"intFreeMemoryKBs":524288.0,"targetMemoryKBs":524288.0,"numCPUs":1,"entityType":"vm"} |
| 139 | 113 | 3 | 2022-04-05 17:37:44 | {"vmId":113,"cpuUtilization":16.183412002697235,"networkReadKBs":0.0,"networkWriteKBs":0.0,"diskReadIOs":0.0,"diskWriteIOs":0.0,"diskReadKBs":0.0,"diskWriteKBs":0.0,"memoryKBs":524288.0,"intFreeMemoryKBs":524288.0,"targetMemoryKBs":524288.0,"numCPUs":1,"entityType":"vm"} |
| 140 | 113 | 1 | 2022-04-05 17:37:57 | {"vmId":113,"cpuUtilization":16.203322914953116,"networkReadKBs":0.0,"networkWriteKBs":0.0,"diskReadIOs":0.0,"diskWriteIOs":0.0,"diskReadKBs":0.0,"diskWriteKBs":0.0,"memoryKBs":524288.0,"intFreeMemoryKBs":524288.0,"targetMemoryKBs":524288.0,"numCPUs":1,"entityType":"vm"} |
| 141 | 113 | 3 | 2022-04-05 17:38:00 | {"vmId":113,"cpuUtilization":16.87624090006618,"networkReadKBs":0.0,"networkWriteKBs":0.0,"diskReadIOs":0.0,"diskWriteIOs":0.0,"diskReadKBs":0.0,"diskWriteKBs":0.0,"memoryKBs":524288.0,"intFreeMemoryKBs":524288.0,"targetMemoryKBs":524288.0,"numCPUs":1,"entityType":"vm"} |
| 142 | 113 | 1 | 2022-04-05 17:38:12 | {"vmId":113,"cpuUtilization":16.1276982879828,"networkReadKBs":0.0,"networkWriteKBs":0.0,"diskReadIOs":0.0,"diskWriteIOs":0.0,"diskReadKBs":0.0,"diskWriteKBs":0.0,"memoryKBs":524288.0,"intFreeMemoryKBs":524288.0,"targetMemoryKBs":524288.0,"numCPUs":1,"entityType":"vm"} |
| 143 | 113 | 3 | 2022-04-05 17:38:15 | {"vmId":113,"cpuUtilization":14.468924695823743,"networkReadKBs":0.0,"networkWriteKBs":0.0,"diskReadIOs":0.0,"diskWriteIOs":0.0,"diskReadKBs":0.0,"diskWriteKBs":0.0,"memoryKBs":524288.0,"intFreeMemoryKBs":524288.0,"targetMemoryKBs":524288.0,"numCPUs":1,"entityType":"vm"} |
+-----+-------+----------------+---------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
8 rows in set (0.001 sec)
@nvazquez can we re-run the smoke tests for this one? |
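The row count in the `vm_stats` output above is consistent with the tested settings: with a 15-second interval and a 1-minute retention, each management server would keep about four samples per VM, so two servers and one VM give about eight rows. A minimal sketch of that arithmetic follows. The class and method names are hypothetical and not part of CloudStack, and it assumes that every management server stores its own samples for every VM, as the `mgmt_server_id` column suggests. The real count may differ by a row or two depending on when cleanup runs.

```java
import java.time.Duration;

// Hypothetical helper for illustration only; it is not part of CloudStack.
public class VmStatsRetentionSketch {

    // Samples one management server keeps per VM inside the retention window.
    static long samplesPerServer(Duration interval, Duration retention) {
        if (interval.isZero() || interval.isNegative()) {
            throw new IllegalArgumentException("interval must be positive");
        }
        return retention.toMillis() / interval.toMillis();
    }

    // Assumption: every management server collects and stores stats for every VM.
    static long expectedRows(int managementServers, int vms,
                             Duration interval, Duration retention) {
        return (long) managementServers * vms * samplesPerServer(interval, retention);
    }

    public static void main(String[] args) {
        // 2 management servers, 1 VM, 15 s interval, 1 min retention.
        long rows = expectedRows(2, 1, Duration.ofSeconds(15), Duration.ofMinutes(1));
        System.out.println(rows); // prints 8
    }
}
```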
|
Thanks @GutoVeronezi, sure |
|
@nvazquez a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests |
|
Trillian test result (tid-3826)
|
|
@GutoVeronezi @nvazquez @DaanHoogland |
|
At this stage, I'd leave that with the RM, @nvazquez?
|
|
@joseflauzino sorry, I have deleted my previous comment - I have seen it's been manually tested - can be merged |
|
Hi @joseflauzino - does the new API return the exact same API output format/keys as the deprecated listVirtualMachinesMetrics with this PR? |
- Changes behaviour of details param handling for:
- listVirtualMachines API: when the detail param is not provided, it
uses `all` details except `stats`
- listVirtualMachinesMetrics API: when the detail param is not
provided, it uses `all` details including `stats`
- Remove ConfigKey vm.stats.increment.metrics.in.memory which was
renamed to `vm.stats.increment.metrics` in apache#5984
- Changes default value of VM stats accumulation setting
`vm.stats.increment.metrics` to false until a better solution emerges.
Since apache#5984, this is true and during the execution of listVM APIs the
stats are clubbed/calculated which can immensely slow down list VM API
calls.
- Fix UI that uses listVirtualMachinesMetrics to not call `stats` detail
when in list view without metrics selected.
These changes yield 2-4x gains in listVirtualMachines API calls and in
the UI. For environments where code changes are not possible, disabling
the `vm.stats.increment.metrics` global setting gave a 2-4x speed gain in
the list API calls.
Signed-off-by: Rohit Yadav <[email protected]>
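The default handling of the details param described in this commit message can be sketched as follows. This is only an illustration under stated assumptions: the `Detail` enum is a hypothetical, shortened stand-in for the real detail values, and `resolve` is not a CloudStack method. It shows that an explicit request is honoured as given, while a missing param falls back to all details, with `stats` dropped for the plain list API and kept for the metrics API.

```java
import java.util.EnumSet;

// Hypothetical sketch; the names below are not taken from the CloudStack code base.
public class DefaultDetailsSketch {

    // Hypothetical, shortened list of detail values; the real API has more.
    enum Detail { NICS, VOLUME, STATS }

    // requested: details passed by the caller, or null/empty when the param is absent.
    // metricsApi: true for the metrics listing, false for the plain VM listing.
    static EnumSet<Detail> resolve(EnumSet<Detail> requested, boolean metricsApi) {
        if (requested != null && !requested.isEmpty()) {
            return EnumSet.copyOf(requested); // explicit request wins
        }
        EnumSet<Detail> defaults = EnumSet.allOf(Detail.class);
        if (!metricsApi) {
            defaults.remove(Detail.STATS); // plain listing skips the costly stats
        }
        return defaults;
    }

    public static void main(String[] args) {
        System.out.println(resolve(null, false)); // prints [NICS, VOLUME]
        System.out.println(resolve(null, true));  // prints [NICS, VOLUME, STATS]
    }
}
```

A caller that still wants stats from the plain listing would pass them explicitly, for example `resolve(EnumSet.of(Detail.STATS), false)`.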
- Changes behaviour of details param handling via global setting:
- listVirtualMachines API: when the details param is not provided, it returns `all` details excluding/including `stats` which is controllable by a new global setting `list.vm.default.details.stats`
- listVirtualMachinesMetrics API: when the details param is not provided, it uses `all` details including `stats`
- Users who are affected by the stats related change, can have backward compatibility at the higher-cost of listVirtualMachines API response time by setting `list.vm.default.details.stats` to true
- Remove ConfigKey vm.stats.increment.metrics.in.memory which was renamed to `vm.stats.increment.metrics` in apache#5984 and also remove unused/unnecessary global settings via upgrade path
- Changes default value of VM stats accumulation setting `vm.stats.increment.metrics` to false until a better solution emerges. Since apache#5984, this is true and during the execution of listVM APIs the stats are clubbed/calculated which can immensely slow down list VM API calls.
- Fix UI that uses listVirtualMachinesMetrics to not call `stats` detail when in list view without metrics selected.
Signed-off-by: Rohit Yadav <[email protected]>
#9177)
- Changes behaviour of details param handling via global setting:
- listVirtualMachines API: when the details param is not provided, whether stats are returned is controlled by a new global setting `list.vm.default.details.stats`
- listVirtualMachinesMetrics API: when the details param is not provided, it uses `all` details including `stats`
- Users who are affected by the slow response time of the listVirtualMachines API can set `list.vm.default.details.stats` to `false`
- Remove ConfigKey vm.stats.increment.metrics.in.memory which was renamed to `vm.stats.increment.metrics` in #5984 and also remove unused/unnecessary global settings via upgrade path
- Changes default value of VM stats accumulation setting `vm.stats.increment.metrics` to false until a better solution emerges. Since #5984, this is true and during the execution of listVM APIs the stats are clubbed/calculated which can immensely slow down list VM API calls. Any costly operations such as summing of stats shouldn't be done during the course of a synchronous API, such as the list VM API.
- Fix UI that uses listVirtualMachinesMetrics to not call `stats` detail when in list view without metrics selected.
Signed-off-by: Rohit Yadav <[email protected]>


Description
This PR addresses issue #5935.
In summary, this PR makes ACS persist VM stats in the database to provide more consistency and make it possible to obtain historical data.
For more information, please refer to the spec contained in issue #5935.
Types of changes
Feature/Enhancement Scale or Bug Severity
Feature/Enhancement Scale
Screenshots
New API
Getting all available stats:

Filtering by start and end date:
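As a rough illustration of the date-range filter, the sketch below keeps only the samples whose timestamp falls between a start and an end date. The `Sample` class and the `between` method are hypothetical and are not the API's implementation. Treating both bounds as inclusive and a null bound as "no limit" are assumptions made for the example, and the sample values are rounded from the `vm_stats` rows posted in the conversation.

```java
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a start/end date filter; not the CloudStack implementation.
public class StatsDateFilterSketch {

    // Minimal stand-in for one persisted stats entry.
    static final class Sample {
        final LocalDateTime timestamp;
        final double cpuUtilization;

        Sample(LocalDateTime timestamp, double cpuUtilization) {
            this.timestamp = timestamp;
            this.cpuUtilization = cpuUtilization;
        }
    }

    // Assumption: both bounds are inclusive and a null bound means "unbounded".
    static List<Sample> between(List<Sample> samples, LocalDateTime start, LocalDateTime end) {
        List<Sample> result = new ArrayList<>();
        for (Sample s : samples) {
            boolean notBeforeStart = start == null || !s.timestamp.isBefore(start);
            boolean notAfterEnd = end == null || !s.timestamp.isAfter(end);
            if (notBeforeStart && notAfterEnd) {
                result.add(s);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Sample> samples = List.of(
                new Sample(LocalDateTime.of(2022, 4, 5, 17, 37, 26), 15.78),
                new Sample(LocalDateTime.of(2022, 4, 5, 17, 37, 41), 16.08),
                new Sample(LocalDateTime.of(2022, 4, 5, 17, 37, 57), 16.20),
                new Sample(LocalDateTime.of(2022, 4, 5, 17, 38, 12), 16.13));
        List<Sample> filtered = between(samples,
                LocalDateTime.of(2022, 4, 5, 17, 37, 30),
                LocalDateTime.of(2022, 4, 5, 17, 38, 0));
        System.out.println(filtered.size()); // prints 2
    }
}
```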

UI changes
Metrics disabled. It shows all VMs:

Metrics enabled. It shows only running VMs and their current stats:

Metrics enabled and filtering by stopped VMs. It forces the display of stopped VMs and their latest stats collected (not reflecting their current stats):

How Has This Been Tested?
I added unit tests to several methods.
Also, in a local lab, I performed the following tests:
vm.stats.increment.metrics, vm.stats.max.retention.time, and vm.stats.interval. All tests resulted in the expected behavior, as described in section 2.2 of the spec.
accumulate parameter with the global configuration vm.stats.increment.metrics, and everything worked as expected. See sections 2.2 and 3.3 of the spec for more details.