fs: Add read_container_root() and copy metadata from /usr -> /#209
fs: Add read_container_root() and copy metadata from /usr -> /#209cgwalters merged 6 commits intocomposefs:mainfrom
Conversation
ddcf1ce to
6b05097
Compare
allisonkarlitskaya
left a comment
There was a problem hiding this comment.
Didn't look at the code but this definitely requires an update to https://github.com/containers/composefs-rs/blob/main/doc/oci.md
6b05097 to
3d33e70
Compare
|
Thanks, sorry I forgot about that. I've now ensured that's cross-referenced more strongly in the code too. I added a followup commit here to make even more clear that the "processing OCI layers" is the weird case; from a dumpfile we construct the fs with the root stat metadata that we know is there from the start. That said...I was thinking about this more and realized we still have at least one other big issue in this area that's representative: Right now if SELinux is enabled at build time, we'll pick up the More generally I think there are also corner cases where host-side xattrs can visibly leak into the container runtime just because of how overlayfs works. This one relates to containers/storage#1608 (comment) pretty strongly. I guess for now, the simplest fix might be:
|
3d33e70 to
ff67094
Compare
|
Added another change to take a much stronger stance around xattrs. |
Add a method to visit every Stat in the filesystem tree. This provides a reusable primitive for operations that need to traverse all nodes, such as filtering xattrs or stripping SELinux labels. Assisted-by: OpenCode (Opus 4.5) Signed-off-by: Colin Walters <walters@verbum.org>
OCI container layer tars often don't include a root directory entry, and when they do, container runtimes (Podman, Docker) ignore it and use hardcoded defaults. This makes root metadata non-deterministic, causing composefs digest mismatches. For detailed analysis of this issue, see: https://gist.github.com/cgwalters/c18c9337aa9345d763aa446cc95c7847 containers/storage#743 Since the metadata of `/` is ill-defined in the OCI spec, we default to copying it from `/usr` - no one should care if these differ, except for SELinux labels which we handle separately in the boot path. Add new APIs: - copy_root_metadata_from_usr(): copies mode, uid, gid, mtime, xattrs from /usr to root directory - read_container_root(): wrapper that calls read_filesystem() then copy_root_metadata_from_usr() Remove the incomplete have_root_stat/ensure_root_stat() mechanism which only handled mtime, not permissions or other metadata. Update cfsctl commands (compute-id, create-image, create-dumpfile) to use read_container_root() by default. Add --no-propagate-usr-to-root flag to use raw read_filesystem() when root metadata is well-defined. Assisted-by: OpenCode (Opus 4.5) Signed-off-by: Colin Walters <walters@verbum.org>
Followup to the previous commit. In most use cases we are starting from a fully populated filesystem - there's no need to hardcode permissions. The main exception to this is when we're processing OCI tar streams. Instead of using mode 0o555 for this and arbitrarily match podman (but not docker) let's hardcode mode 0 instead which is even more obviously uninitialized. We again now expect OCI use cases to do the "propagate attrs from /usr" approach. Assisted-by: OpenCode (Opus 4.5) Signed-off-by: Colin Walters <walters@verbum.org>
…SELinux targets When reading a container filesystem from a mounted root, host-side xattrs can leak into the image. This is particularly problematic for SELinux labels like `container_t` that come from the build host's overlayfs, breaking reproducibility when targeting non-SELinux systems. Add xattr filtering with a predicate-based approach: - `read_filesystem_filtered()` takes a closure to filter xattrs - `read_container_root()` uses `is_allowed_container_xattr()` which checks against `CONTAINER_XATTR_ALLOWLIST` (currently `security.capability`) - `FileSystem::filter_xattrs()` provides the underlying filtering using the new `for_each_stat()` helper Additionally, fix `transform_for_boot()` SELinux handling: - If no SELinux policy is found in the target filesystem, strip all `security.selinux` xattrs instead of leaving build-time labels in place - `selabel()` now returns `bool` indicating whether labeling was performed This ensures: - Build-time SELinux labels don't leak into non-SELinux targets - SELinux-enabled targets get correct labels from their own policy - Other host xattrs (overlayfs internals, etc.) don't pollute the image See: containers/storage#1608 (comment) Assisted-by: OpenCode (Opus 4.5) Signed-off-by: Colin Walters <walters@verbum.org>
Assisted-by: OpenCode (Opus 4.5) Signed-off-by: Colin Walters <walters@verbum.org>
ff67094 to
e1dd239
Compare
See composefs/composefs-rs#209 Signed-off-by: Colin Walters <walters@verbum.org> Assisted-by: OpenCode (Opus 4.5)
See composefs/composefs-rs#209 Signed-off-by: Colin Walters <walters@verbum.org> Assisted-by: OpenCode (Opus 4.5) Signed-off-by: Colin Walters <walters@verbum.org>
Empty /run in create_filesystem() when processing OCI images. This is
a tmpfs at runtime and should always be empty in images.
Emptying /run also works around podman/buildah's RUN --mount issue where
cache mounts can leave incomplete directory entries in OCI tar layers,
causing mtime inconsistencies. Container authors can now safely use:
RUN rm -rf /var/cache/dnf && ln -sr /run/dnfcache /var/cache/dnf
RUN --mount=type=cache,target=/run/dnfcache dnf install -y ...
The /run directory is emptied for all OCI images, not just bootable ones,
since it should always be empty regardless of boot configuration.
Also change emptied directories in transform_for_boot() (/boot, /sysroot)
to use /usr's mtime instead of 0, for consistency with how root directory
metadata is handled via copy_root_metadata_from_usr().
Closes: composefs#132
Assisted-by: OpenCode (Opus 4.5)
Signed-off-by: Colin Walters <walters@verbum.org>
e1dd239 to
c7257e9
Compare
|
Now on I don't have a really strong opinion on erroring versus emptying, the main reason I chose emptying is just because we already were doing that for |
| `container_t` that come from the build host, not from the target system's | ||
| policy. | ||
|
|
||
| To ensure reproducibility, `read_container_root()` filters xattrs to only |
There was a problem hiding this comment.
From live chat: a TODO is to expose configurability for this on the CLI in case anyone wants to allowlist other xattrs for $reasons.
jeckersb
left a comment
There was a problem hiding this comment.
👍 LGTM, we went over this in moderate detail in a live chat
|
@allisonkarlitskaya this needs signoff on your requested update to |
The composefs-rs PR 209 has been merged to main. This updates bootc to use the containers/composefs-rs repository at the merge commit. Key API changes: - Directory::default() -> Directory::new(Stat::uninitialized()) - read_filesystem() no longer takes stat_root parameter - New read_container_root() for OCI containers (propagates /usr metadata to root) - stat_root CLI flag renamed to no_propagate_usr_to_root with inverted logic See composefs/composefs-rs#209 Signed-off-by: Colin Walters <walters@verbum.org>
The composefs-rs PR 209 has been merged to main. This updates bootc to use the containers/composefs-rs repository at the merge commit. Key API changes: - Directory::default() -> Directory::new(Stat::uninitialized()) - read_filesystem() no longer takes stat_root parameter - New read_container_root() for OCI containers (propagates /usr metadata to root) - stat_root CLI flag renamed to no_propagate_usr_to_root with inverted logic See composefs/composefs-rs#209 Signed-off-by: Colin Walters <walters@verbum.org>
The composefs-rs PR 209 has been merged to main. This updates bootc to use the containers/composefs-rs repository at the merge commit. Key API changes: - Directory::default() -> Directory::new(Stat::uninitialized()) - read_filesystem() no longer takes stat_root parameter - New read_container_root() for OCI containers (propagates /usr metadata to root) - stat_root CLI flag renamed to no_propagate_usr_to_root with inverted logic See composefs/composefs-rs#209 Signed-off-by: Colin Walters <walters@verbum.org>
The composefs-rs PR 209 has been merged to main. This updates bootc to use the containers/composefs-rs repository at the merge commit. Key API changes: - Directory::default() -> Directory::new(Stat::uninitialized()) - read_filesystem() no longer takes stat_root parameter - New read_container_root() for OCI containers (propagates /usr metadata to root) - stat_root CLI flag renamed to no_propagate_usr_to_root with inverted logic See composefs/composefs-rs#209 Signed-off-by: Colin Walters <walters@verbum.org>
The composefs-rs PR 209 has been merged to main. This updates bootc to use the containers/composefs-rs repository at the merge commit. Key API changes: - Directory::default() -> Directory::new(Stat::uninitialized()) - read_filesystem() no longer takes stat_root parameter - New read_container_root() for OCI containers (propagates /usr metadata to root) - stat_root CLI flag renamed to no_propagate_usr_to_root with inverted logic See composefs/composefs-rs#209 Signed-off-by: Colin Walters <walters@verbum.org>
The composefs-rs PR 209 has been merged to main. This updates bootc to use the containers/composefs-rs repository at the merge commit. Key API changes: - Directory::default() -> Directory::new(Stat::uninitialized()) - read_filesystem() no longer takes stat_root parameter - New read_container_root() for OCI containers (propagates /usr metadata to root) - stat_root CLI flag renamed to no_propagate_usr_to_root with inverted logic See composefs/composefs-rs#209 Signed-off-by: Colin Walters <walters@verbum.org>
OCI container layer tars often don't include a root directory entry, and when they do, container runtimes (Podman, Docker) ignore it and use hardcoded defaults. We had some attempt to handle this with our "stat root" stuff but that only handled mtime - i.e. not permissions or other metadata.
Since the contents of
/are ill defined, let's just default to defining them based on/usrsince I am not aware of anyone who cares for those to be different (except for the selinux label but we handle that already).Add new APIs to handle this:
copy_root_metadata_from_usr(): copies mode, uid, gid, mtime, xattrs from /usr to root directory for deterministic metadata
read_container_root(): convenience wrapper that calls read_filesystem() then copy_root_metadata_from_usr()
Remove the incomplete have_root_stat/ensure_root_stat() mechanism which only handled mtime. Callers should now explicitly use the new helpers.
Update cfsctl commands (compute-id, create-image, create-dumpfile) to use read_container_root() by default. Add --no-propagate-usr-to-root flag to use raw read_filesystem() for testing.
Assisted-by: OpenCode (Opus 4.5)