Report rollbacks as Omaha error codes #6
When the system failed to boot into an update, a rollback to the previous partition took place. However, an update success was reported via Omaha. Add a new error code that is reported for rollbacks. The detection is done by comparing the currently running version number with the previous version number, which is already stored for reporting and also serves as a way to detect whether a reboot is needed.
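The detection described above comes down to one comparison on the first boot after an update. A minimal sketch, assuming a hypothetical helper `IsRollback` and the convention that an empty stored previous version means no update is waiting for its first boot (this is not the PR's actual code):

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: decide whether the first boot after an update was
// actually a rollback to the old partition.
//
// prev_version:    version that was running when the update was applied
//                  (assumed empty when no update is pending).
// current_version: version of the OS that is running right now.
bool IsRollback(const std::string& prev_version,
                const std::string& current_version) {
  assert(!current_version.empty());
  // No update was applied, so there is nothing to compare against.
  if (prev_version.empty()) {
    return false;
  }
  // After a successful update the running version differs from the stored
  // one. If they are equal, the bootloader fell back to the old partition.
  return prev_version == current_version;
}
```

For example, with a stored previous version "1.0.0", booting "1.1.0" counts as a successful update, while booting "1.0.0" again counts as a rollback.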
Would it make sense to add a test for this to https://github.com/flatcar-linux/mantle/blob/flatcar-master/kola/tests/update/update.go ?
Yes, the update tests should be expanded, too.
krnowak left a comment
Some questions about the code. I also checked whether nebraska will actually register this error, because it only accepts certain pairs of event type and event result (see the nebraska DB entries and the event registration query). Looks like it will: we send type 3, result 0, which is inserted into the database.
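For reference, the pair mentioned above corresponds to the Omaha protocol's `eventtype` and `eventresult` attributes: type 3 means the update completed, result 0 means error and result 1 means success. The sketch below builds such an event and checks it against an allowlist of accepted pairs. The function names, the allowlist contents, and the error code value are hypothetical and are not taken from nebraska or update_engine:

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>

// Omaha protocol values referenced in the review.
constexpr int kEventTypeUpdateComplete = 3;
constexpr int kEventResultError = 0;
constexpr int kEventResultSuccess = 1;

// Builds the <event> element a client could send for a failed update.
// The numeric error code for a rollback is a placeholder argument.
std::string BuildUpdateErrorEvent(int error_code) {
  assert(error_code >= 0);
  return "<event eventtype=\"" + std::to_string(kEventTypeUpdateComplete) +
         "\" eventresult=\"" + std::to_string(kEventResultError) +
         "\" errorcode=\"" + std::to_string(error_code) + "\"></event>";
}

// Hypothetical server-side filter that only stores known (type, result)
// pairs. The allowlist is illustrative, not nebraska's real table.
bool IsAcceptedEvent(int type, int result) {
  static const std::set<std::pair<int, int>> kAccepted = {
      {kEventTypeUpdateComplete, kEventResultSuccess},
      {kEventTypeUpdateComplete, kEventResultError},
  };
  return kAccepted.count({type, result}) > 0;
}
```

A rollback report therefore only reaches the database if the server knows the (3, 0) pair; an unknown pair would be dropped silently.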
// a rollback took place.
// We do not need to check for UPDATE_STATUS_UPDATED_NEED_REBOOT because a new update won't
// be attempted until the system rebooted.
if (!prev_version.empty() && prev_version.compare(current_version) == 0) {
Is prev_version ever going to be empty? When we fail to get it from preferences, we set it to "initial".
Also, will this branch be executed every time I restart the machine (for whatever reason, not only because of the update) and boot the partition with the previous version? Should we check that an update is in progress?
"initial" should be the value when no file exists, while "" should be the value for normal operation.
Maybe I should introduce another boolean to take care of this? For now I relied on the assumption that "initial" won't be the current version.
The previous version file is only created when an update was applied. This should not alert a rollback in the case of normal reboots with selecting a different partition manually. (And even if it would fire I think it would be up for discussion if we want to distinguish between automatic and manual rollbacks. Currently I wanted to focus on the first reboot after an update.)
"initial" should be the value when no file exists, while "" should be the value for normal operation.
Maybe I should introduce another boolean to take care of this? For now I relied on the assumption that "initial" won't be the current version.
Nah, I'm just not familiar with the code. I was just assuming that this code would also run every time you reboot. And apparently that was a wrong assumption, so let's leave it as is.
The previous version file is only created when an update was applied. This should not alert a rollback in the case of normal reboots with selecting a different partition manually.
Thanks, I did not know that.
(And even if it would fire I think it would be up for discussion if we want to distinguish between automatic and manual rollbacks. Currently I wanted to focus on the first reboot after an update.)
Let's discuss it some other time.
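The thread above weighs relying on the sentinel string "initial" against introducing a separate flag. As a sketch of the second option, assuming the stored preference can be read into a `std::optional<std::string>` (the struct and function names are hypothetical), a missing file becomes an empty optional that can never collide with a real version string:

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical model of the stored "previous version" preference.
struct PrevVersionPref {
  // std::nullopt: the file was never created (no update applied yet).
  // "":           the file exists but no update is awaiting its first boot.
  // otherwise:    the version that was running when the update was applied.
  std::optional<std::string> value;
};

bool RollbackDetected(const PrevVersionPref& pref,
                      const std::string& current_version) {
  assert(!current_version.empty());
  // Missing file or cleared value: a normal boot, nothing to compare.
  if (!pref.value.has_value() || pref.value->empty()) {
    return false;
  }
  return *pref.value == current_version;
}
```

With this representation, a missing file is rejected even if some running version happened to be the literal string "initial", so the check no longer depends on that assumption.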
This pulls in flatcar/update_engine#6 for rollback detection.
When the system failed to boot into an update, a rollback to the
previous partition took place. However, an update success was reported
via Omaha.
Add a new error code that is reported for rollbacks. The detection is
done by comparing the currently running version number with the
previous version number, which is already stored for reporting and also
serves as a way to detect whether a reboot is needed.
How to use
Build an image with this patch. Then run the following after booting it as a VM:
Now force an update:
When done, run poweroff and start the VM again, but be quick to select the A partition in GRUB instead of the default. Now the rollback should be detected and reported instead of an update success (wait 45 seconds).
Review the output of journalctl --all -u update-engine -e.
Testing done
The above, plus some variations such as restarting the service during an update download or after the update is prepared, to make sure that no false alarm is triggered.
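The testing notes do not spell out how a service restart after the update is prepared is kept from looking like a rollback, even though the old version is still running at that point. One possible guard, assumed here purely for illustration, is to also record the kernel boot ID (readable from /proc/sys/kernel/random/boot_id on Linux) when the update is applied, and to compare versions only once the boot ID has changed. All names below are hypothetical:

```cpp
#include <cassert>
#include <string>

// Hypothetical state recorded when an update is applied.
struct AppliedUpdate {
  std::string prev_version;  // version running when the update was applied
  std::string boot_id;       // boot ID at that time
};

enum class BootOutcome { kNoUpdatePending, kWaitingForReboot, kUpdated, kRolledBack };

BootOutcome Classify(const AppliedUpdate& applied,
                     const std::string& current_version,
                     const std::string& current_boot_id) {
  assert(!current_version.empty() && !current_boot_id.empty());
  if (applied.prev_version.empty()) {
    return BootOutcome::kNoUpdatePending;
  }
  // Same boot ID: only the service was restarted, so the old version is
  // expected to still be running and no rollback can be inferred yet.
  if (applied.boot_id == current_boot_id) {
    return BootOutcome::kWaitingForReboot;
  }
  // The machine rebooted: equal versions mean the old partition was booted.
  return applied.prev_version == current_version ? BootOutcome::kRolledBack
                                                 : BootOutcome::kUpdated;
}
```

Keying the decision on the boot ID keeps it independent of how often the daemon itself is restarted between applying the update and the next reboot.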