Report rollbacks as Omaha error codes #6

Merged
pothos merged 1 commit into flatcar-master from kai/report-rollback
Sep 7, 2020

pothos (Member) commented Aug 27, 2020

When the system failed to boot into an update, a rollback to the
previous partition took place. However, an update success was reported
via Omaha.
Add a new error code that is reported for rollbacks. The detection
compares the currently running version number with the previous
version number, which is already stored for reporting and also serves
as a way to detect whether a reboot is needed.
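The detection described above can be sketched as a three-way decision made on the first start after an update. This is a minimal illustration, not the PR's code: the enum PostRebootReport, the function ClassifyBoot, and the convention that an empty previous version means "no update was applied" are assumptions of this sketch; the real service keeps the previous version in its prefs and uses its own status and error-code types.

```cpp
#include <cassert>
#include <string>

// Hypothetical result type for this sketch only.
enum class PostRebootReport {
  kNothingToReport,  // no update was applied before this boot
  kUpdateSuccess,    // running version differs from the pre-update version
  kRollback,         // still on the pre-update version: the new partition failed
};

// prev_version: version that was running when the update was applied
//               (assumed empty when no update is pending).
// current_version: version of the partition that actually booted.
PostRebootReport ClassifyBoot(const std::string& prev_version,
                              const std::string& current_version) {
  if (prev_version.empty()) {
    return PostRebootReport::kNothingToReport;
  }
  if (prev_version == current_version) {
    // Same version as before the update: report an error instead of success.
    return PostRebootReport::kRollback;
  }
  return PostRebootReport::kUpdateSuccess;
}
```

Only the kRollback outcome would lead to the new error code being sent; kUpdateSuccess keeps the existing success report.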

How to use

Build an image with this patch. Then boot it as a VM and run the following:

sudo -s
echo "GROUP=stable" > /etc/flatcar/update.conf
# optional: use your own Nebraska instance:
# echo "SERVER=http://192.168.1.5:8000/v1/update" >> /etc/flatcar/update.conf
curl -L -o /tmp/key https://raw.githubusercontent.com/flatcar-linux/coreos-overlay/flatcar-master/coreos-base/coreos-au-key/files/official-v2.pub.pem
sudo mount --bind /tmp/key /usr/share/update_engine/update-payload-key.pub.pem
cp /usr/share/flatcar/release /tmp/release
sed -i "/FLATCAR_RELEASE_VERSION=.*/d" /tmp/release
echo FLATCAR_RELEASE_VERSION=0.0.1 >> /tmp/release
mount --bind /tmp/release /usr/share/flatcar/release
systemctl restart update-engine
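The steps above fake an old running version: the release file is copied, its FLATCAR_RELEASE_VERSION line is replaced with 0.0.1, and the copy is bind-mounted over /usr/share/flatcar/release, so after the restart update-engine presumably sees 0.0.1 and treats the stable release as an update. A minimal sketch of reading that KEY=VALUE field, assuming the service looks it up roughly like this (ReadReleaseVersion is a hypothetical name, not the service's API):

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Returns the value of FLATCAR_RELEASE_VERSION from a KEY=VALUE stream,
// or an empty string if the key is absent. If the key appears more than
// once, the last occurrence wins (a choice made for this sketch).
std::string ReadReleaseVersion(std::istream& in) {
  const std::string key = "FLATCAR_RELEASE_VERSION=";
  std::string line;
  std::string version;
  while (std::getline(in, line)) {
    // Match lines that start with the key prefix.
    if (line.compare(0, key.size(), key) == 0) {
      version = line.substr(key.size());
    }
  }
  return version;
}
```

With the modified file, this would return "0.0.1"; after the next reboot the bind mount is gone and the real file is read again.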

Now force an update:

update_engine_client -update

When done, run poweroff and start the VM again, but quickly select the A partition in GRUB instead of the default entry.
Wait about 45 seconds; the rollback should then be detected and reported instead of an update success.
Review the output of journalctl --all -u update-engine -e.

Testing done

I ran the steps above plus some variations, such as restarting the service during an update download or after the update was prepared, to make sure that no false alarm is triggered.

pothos requested a review from a team August 27, 2020 17:02
pothos force-pushed the kai/report-rollback branch from 5422422 to aa44262 August 27, 2020 17:13
t-lo (Member) commented Aug 31, 2020

Would it make sense to add a test for this to https://github.com/flatcar-linux/mantle/blob/flatcar-master/kola/tests/update/update.go ?

pothos (Member, Author) commented Aug 31, 2020

Yes, the update tests should be expanded, too.

krnowak (Member) left a comment

Some questions about the code. I also checked whether Nebraska will actually register this error, because it only accepts certain pairs of event type and event result (see the Nebraska DB entries and the event registration query). It looks like it will: we send type 3, result 0, which is inserted into the database.
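For reference, the type 3 / result 0 pair corresponds to an Omaha event element: in the Omaha protocol, eventtype 3 means "update complete" and eventresult 0 means "error", while the errorcode attribute carries the specific code. The sketch below only formats such an element; the constant names and the function BuildOmahaEvent are hypothetical, and the error code 42 used in the usage note is a placeholder, since the numeric value added by this PR is not stated here.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Omaha protocol values for the pair mentioned above (names are ours).
constexpr int kEventTypeUpdateComplete = 3;
constexpr int kEventResultError = 0;

// Formats an Omaha <event> element. Only numeric attributes are emitted,
// so no XML escaping is needed.
std::string BuildOmahaEvent(int event_type, int event_result, int error_code) {
  std::ostringstream xml;
  xml << "<event eventtype=\"" << event_type << "\" eventresult=\""
      << event_result << "\" errorcode=\"" << error_code << "\"></event>";
  return xml.str();
}
```

For example, BuildOmahaEvent(kEventTypeUpdateComplete, kEventResultError, 42) yields <event eventtype="3" eventresult="0" errorcode="42"></event>.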

// a rollback took place.
// We do not need to check for UPDATE_STATUS_UPDATED_NEED_REBOOT because a new update won't
// be attempted until the system rebooted.
if (!prev_version.empty() && prev_version.compare(current_version) == 0) {
Member

Is prev_version ever going to be empty? When we fail to get it from preferences, we set it to "initial".

Member

Also, will this branch be executed every time I restart the machine (for whatever reason, not only because of an update) and boot the partition with the previous version? Should we check that an update is in progress?

Member Author

"initial" should be the value when no file exists, while "" should be the value during normal operation.
Maybe I should introduce another boolean to take care of this? For now, I rely on the assumption that "initial" won't be the current version.
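One way to avoid relying on the "initial" sentinel, which is what the extra boolean would achieve, is to model the missing file separately from the empty value. A hypothetical C++17 sketch using std::optional (IsRollback is an illustrative name, not the service's API):

```cpp
#include <cassert>
#include <optional>
#include <string>

// prev_version == std::nullopt: the prefs file does not exist.
// prev_version == "":          normal operation, no update pending.
// Otherwise:                   version running when an update was applied.
bool IsRollback(const std::optional<std::string>& prev_version,
                const std::string& current_version) {
  if (!prev_version || prev_version->empty()) {
    return false;  // nothing to compare against
  }
  // Still on the pre-update version after the reboot: the update did not stick.
  return *prev_version == current_version;
}
```

The trade-off is an extra state to persist, in exchange for no longer assuming that "initial" can never be a real version string.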

Member Author

The previous version file is only created when an update has been applied, so this should not report a rollback for normal reboots where a different partition is selected manually. (And even if it did fire, I think it would be open to discussion whether we want to distinguish between automatic and manual rollbacks. For now, I wanted to focus on the first reboot after an update.)
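The point that the previous version file only exists after an update was applied can be sketched as a small lifecycle: record the version when the update is applied, evaluate it on the next start, then clear it so later reboots report nothing. The in-memory store, the clearing step, and the boot ID comparison (which keeps a plain service restart before the reboot, one of the cases listed under Testing done, from being treated as a rollback) are assumptions of this sketch, not necessarily how the PR does it. On Linux, the current boot ID can be read from /proc/sys/kernel/random/boot_id.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical record written when an update is applied.
struct PendingUpdateMarker {
  std::string prev_version;  // version running when the update was applied
  std::string boot_id;       // boot during which the update was applied
};

// Hypothetical in-memory stand-in for the on-disk prefs file.
struct PrevVersionStore {
  std::optional<PendingUpdateMarker> marker;  // empty until an update is applied
};

void OnUpdateApplied(PrevVersionStore& store, const std::string& running_version,
                     const std::string& boot_id) {
  store.marker = PendingUpdateMarker{running_version, boot_id};
}

// Returns true if a rollback should be reported for this start of the service.
bool CheckOnStartup(PrevVersionStore& store, const std::string& running_version,
                    const std::string& boot_id) {
  if (!store.marker) {
    return false;  // no update applied: ordinary reboot, nothing to report
  }
  if (store.marker->boot_id == boot_id) {
    return false;  // service restarted before the reboot: keep the marker
  }
  const bool rolled_back = store.marker->prev_version == running_version;
  store.marker.reset();  // report only once, on the first boot after the update
  return rolled_back;
}
```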

Member

> "initial" should be the value when no file exists, while "" should be the value during normal operation.
> Maybe I should introduce another boolean to take care of this? For now, I rely on the assumption that "initial" won't be the current version.

Nah, I'm just not familiar with the code. I was assuming that this code would also run every time you reboot. Apparently that was a wrong assumption, so let's leave it as is.

> The previous version file is only created when an update has been applied, so this should not report a rollback for normal reboots where a different partition is selected manually.

Thanks, I did not know that.

> (And even if it did fire, I think it would be open to discussion whether we want to distinguish between automatic and manual rollbacks. For now, I wanted to focus on the first reboot after an update.)

Let's discuss that some other time.

pothos merged commit ae63e23 into flatcar-master Sep 7, 2020
pothos deleted the kai/report-rollback branch September 7, 2020 16:47
pothos added a commit to flatcar-archive/coreos-overlay that referenced this pull request Sep 7, 2020
pothos added a commit to flatcar-archive/coreos-overlay that referenced this pull request Sep 7, 2020

3 participants