Composefs deleting deployment#1685
Conversation
de10dec to
883110d
Compare
crates/lib/src/cli.rs
Outdated
| Opt::ConfigDiff => get_etc_diff().await, | ||
|
|
||
| #[cfg(feature = "composefs-backend")] | ||
| Opt::DeleteDeployment { depl_id, delete } => { |
| pub bootloader: Bootloader, | ||
| /// The sha256sum of vmlinuz + initrd | ||
| /// Only `Some` for Type1 boot entries | ||
| pub boot_digest: Option<String>, |
There was a problem hiding this comment.
Instead of String let's use say https://docs.rs/oci-spec/latest/oci_spec/image/struct.Sha256Digest.html for stuff like this?
Though that might be a bit harder with the jsonschema
There was a problem hiding this comment.
Yeah, I think that's a good idea
There was a problem hiding this comment.
With the jsonschema it's a bit tricky
| let Some(depl_to_del) = depl_to_del else { | ||
| anyhow::bail!("Deployment {deployment_id} not found"); | ||
| }; | ||
|
|
There was a problem hiding this comment.
I think there's a missing check here that refuses to delete the booted one?
There was a problem hiding this comment.
I have a delete param passed in to this function for debugging purposes. Might be helpful to just see what all will be deleted
There was a problem hiding this comment.
I removed the param, as it doesn't really make much sense
| grub_dir | ||
| .atomic_write(USER_CFG_STAGED, buffer) | ||
| .with_context(|| format!("Writing to {USER_CFG_STAGED}"))?; | ||
|
|
||
| rustix::fs::fsync(grub_dir.reopen_as_ownedfd().context("Reopening")?).context("fsync")?; |
There was a problem hiding this comment.
We could make an atomic_write_fsync helper
There was a problem hiding this comment.
Also btw there's https://docs.rs/cap-std-ext/latest/cap_std_ext/dirext/trait.CapStdExtDirExt.html#tymethod.atomic_replace_with that supports writing-as-you-go instead of building up the in memory buffer.
There was a problem hiding this comment.
Updated to use atomic_replace_with
We could make an atomic_write_fsync helper
looking at it again, we don't have any one show write + fsync. It's mostly a couple of writes then a final fsync
| .remove_dir_all(&state_dir) | ||
| .with_context(|| format!("Removing dir {state_dir:?}"))?; | ||
|
|
||
| for sha in diff { |
There was a problem hiding this comment.
This should really be an api in composefs-rs right?
Also bigger picture...I think we want to very clearly separate two things (as libostree does): managing the "GC roots" vs a GC operation.
Deleting a deployment starts with unlinking its GC root: the bootloader entry.
But thereafter we should just invoke a generic "GC" operation which traverses all active roots.
The idea here is we must support being interrupted - and we want to be idempotent.
There was a problem hiding this comment.
Agreed. There are a few failure cases that need to be handled
This should really be an api in composefs-rs right?
The deletion of unreferenced objects? It does make sense for it to be in composefs-rs. There's even a command for it, but right now it's basically a no-op.
Add a command to delete a composefs native deployment Deleting a deployment would mean, deleting the EROFS image, the bootloader entries for that deployment and deleting any objects in the composefs repository that are only referenced by said deployment. Also refactor some functions and add error contexts in some places Signed-off-by: Pragyan Poudyal <pragyanpoudyal41999@gmail.com> composefs-backend: Deleting staged deployment Signed-off-by: Pragyan Poudyal <pragyanpoudyal41999@gmail.com>
Delete the boot entries first, the image second and everything else afterwards. If we fail to delete the boot entry, then there's no point in deleting the image as the boot entry will still show, but there will be no image. We delete the objects at the end, as when we later perform a gc operation and don't find the image that references these objects, we can remove them then. The state directory shouldn't have any effect on boot if the image associated to it doesn't exist. If the staged file /run/composefs/staged-deployment does exist, but we have already deleted the staged image, the finalize service would fail but that wouldn't break anything Signed-off-by: Pragyan Poudyal <pragyanpoudyal41999@gmail.com>
883110d to
66d163e
Compare
Update the deletion of deployment to only simply delete the bootloader entries related to the deployment and then call a `gc` function, which will just get the difference between the states represented by the bootloader entries and the repository then try to reconcile everything by performing GC operation on the repository. Signed-off-by: Pragyan Poudyal <pragyanpoudyal41999@gmail.com>
66d163e to
90b88f9
Compare
Add debug logs for whatever is being deleted Remove the `delete` param from `delete_deployment` function Use `atomic_replace_with` instead of writing to a buffer then writing Signed-off-by: Pragyan Poudyal <pragyanpoudyal41999@gmail.com>
Signed-off-by: Pragyan Poudyal <pragyanpoudyal41999@gmail.com>
fce95be to
152ef25
Compare
|
Had some discussions with @allisonkarlitskaya on Matrix on how we'd want to handle gc for a composefs repo. The |
cgwalters
left a comment
There was a problem hiding this comment.
Looks generally OK, nothing blocking.
| "" | ||
| }; | ||
|
|
||
| tracing::info!("Deleting {kind}deployment '{deployment_id}'"); |
There was a problem hiding this comment.
Minor but let's use structured logging for this instead like tracing::info!("Deleting deployment: {deployment_id}", kind = kind) or so
|
|
||
| delete_depl_boot_entries(&depl_to_del, deleting_staged)?; | ||
|
|
||
| composefs_gc().await?; |
There was a problem hiding this comment.
I think the GC should be a clearly separate step in general so we can amortize its cost (e.g. "delete 3 deployments, then gc") and also skip the gc ("lazy prune storage later")
| #[fn_error_context::context("Listing EROFS images")] | ||
| fn list_erofs_images(sysroot: &Dir) -> Result<Vec<String>> { | ||
| let images_dir = sysroot | ||
| .open_dir("composefs/images") |
There was a problem hiding this comment.
Hmm there isn't an API for this in Repository?
I think what would help here is if this API actually took a Repository instance for now at least - we can reference its fd and read things directly. Same elsewehere
|
Hmm it looks like we had a timeout during install in this run only for c9s. https://github.com/bootc-dev/bootc/actions/runs/18873507377/job/53857419924?pr=1685 |
Add a command to delete a composefs native deployment
Deleting a deployment would mean, deleting the EROFS image, the
bootloader entries for that deployment and deleting any objects in the
composefs repository that are only referenced by said deployment.
Also refactor some functions and add error contexts in some places
Draft PR for now as it requires some more adjustments