internal/exec/stages/disks: prevent races with udev #1319
Conversation
FYI, Flatcar has already been using this for a while now.
jlebon
left a comment
Thanks for upstreaming this and writing a detailed commit message! Always fun to debug udev races. :)
Had a minor comment, but this looks sane to me overall.
One concern is that we're calling this on every separate filesystem/disk/LUKS/RAID device. Would it be more efficient to instead do it only once per section for the last device in each associated list?
848f20a to 716cec5
That would rely on inotify for the other devices, and I don't think there is any guarantee that the inotify event goes into the queue and gets fully processed if we only trigger a tagged event for the last device and wait for that event's completion. Maybe, after checking all the implementation details, we could convince ourselves that it works for the current udev implementation, but I would rather stay on the safe side, since the absence of the race condition is not verifiable through testing…
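To make the per-device approach concrete, here is a minimal sketch. It is not the actual code from this PR; the helper name and the exact udevadm invocation are illustrative, assuming we shell out to `udevadm trigger --settle` for each device we modified:

```go
package disks

import (
	"context"
	"fmt"
	"os/exec"
)

// triggerAndSettle issues a synthetic "change" uevent for the given device
// and blocks until udev has finished processing it, so that the recreated
// entries under /dev are guaranteed to exist before we continue.
// Illustrative only: the real implementation may tag the event and wait for
// the tag rather than relying on `udevadm trigger --settle`.
func triggerAndSettle(ctx context.Context, dev string) error {
	out, err := exec.CommandContext(ctx, "udevadm", "trigger", "--settle", dev).CombinedOutput()
	if err != nil {
		return fmt.Errorf("waiting for udev to process %q failed: %v: %s", dev, err, out)
	}
	return nil
}
```

Calling something like this once per created filesystem/partition/LUKS device avoids depending on inotify event ordering for the devices that were not explicitly triggered.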
d14a76a to fba4ff4
I won't be able to look at this for some weeks, unfortunately, but it's still on my list and I'll circle back as soon as I can.
Your reasoning makes sense to me, but I haven't thought it through carefully. I'll defer to re-review from @jlebon, which actually I should have done months ago. 😳 Thanks for your patience. PR needs a rebase to pick up CI changes.
d12147c to 3c6297c
I added a release note entry, as required.
@jlebon Can you do the approval (actually, you already did, so I'm not sure you need to submit again) and hit the merge button?
The "udevadm settle" command used to wait for udev to process the disk changes and recreate the entries under /dev was still prone to races where udev didn't get notified yet of the final event to wait for. This caused the boot with a btrfs root filesystem created by Ignition to fail almost every time on certain hardware. Issue tagged events and wait for them to be processed by udev. This is actually meanigful in all stages not only for the other parts of the initramfs which may be surprised by sudden device nodes disappearing shortly like the case was with systemd's fsck service but also for the inter-stage dependencies which currently are using the waiter for systemd device units but that doesn't really prevent from races with udev device node recreation. Thus, these changes are complementary to the existing waiter which mainly has the purpose to wait for unmodified devices. For newly created RAIDs we can wait for the new node to be available as udev will not recreate it.
3c6297c to 3f78465
jlebon
left a comment
Thanks for this. It (still) makes sense to me, and in fact we do something similar in other parts of the stack for similar reasons. I'm pretty sure we should be able to drop some of those with this.
I just tweaked the error messages and the release note item. Let me know if you disagree with some of them.
The "udevadm settle" command used to wait for udev to process the disk
changes and recreate the entries under /dev was still prone to races
where udev didn't get notified yet of the final event to wait for.
This caused the boot with a btrfs root filesystem created by Ignition
to fail almost every time on certain hardware.
Issue tagged events and wait for them to be processed by udev. This is
actually meanigful in all stages not only for the other parts of the
initramfs which may be surprised by sudden device nodes disappearing
shortly like the case was with systemd's fsck service but also for the
inter-stage dependencies which currently are using the waiter for
systemd device units but that doesn't really prevent from races with
udev device node recreation. Thus, these changes are complementary to
the existing waiter which mainly has the purpose to wait for unmodified
devices. For newly created RAIDs we can wait for the new node to be
available as udev will not recreate it.
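For the RAID case mentioned in the last sentence, a rough sketch (a hypothetical helper, not the PR's actual code) would simply poll for the freshly created node, since udev will not recreate it:

```go
package disks

import (
	"context"
	"fmt"
	"os"
	"time"
)

// waitForNode polls until the newly created device node (e.g. a new RAID
// node under /dev/md/) appears. Since udev does not recreate the node for a
// fresh RAID, its existence is enough; no trigger/settle round-trip needed.
func waitForNode(ctx context.Context, node string) error {
	for {
		if _, err := os.Stat(node); err == nil {
			return nil
		} else if !os.IsNotExist(err) {
			return err
		}
		select {
		case <-ctx.Done():
			return fmt.Errorf("giving up waiting for %q: %w", node, ctx.Err())
		case <-time.After(100 * time.Millisecond):
		}
	}
}
```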