Skip to content

Conversation

@marquiz
Copy link
Contributor

@marquiz marquiz commented Jul 31, 2025

Error out --l3-cache-schema and --mem-bw-schema if the original spec didn't specify intelRdt which also means that no CLOS (resctrl group) was created for the container.

This prevents serious issues in this corner case.

First, a CLOS was created but the schemata of the CLOS was not correctly updated. Confusingly, calling runc update twice did the job: the first call created the resctrl group and the seccond call was able to update the schemata. This issue would be relatively easily fixable, though.

Second, more severe issue is that creating new CLOSes this way caused them to be orphaned, not being removed when the container exists. This is caused by runc not capturing the updated state (original spec was intelRdt=nil -> no CLOS but after update this is not the case).

The most severe problem is that the update only move (or tried to move) the original init process pid but all children escaped the update. Doing this (i.e. migrating all processes of a container from CLOS to another CLOS) reliably, race-free, would probably require freezing the container.

@marquiz
Copy link
Contributor Author

marquiz commented Aug 1, 2025

ping @kolyshkin @AkihiroSuda

@marquiz marquiz force-pushed the devel/runc-update-rdt-empty-conf branch from fb6195f to 2fd2204 Compare August 1, 2025 11:36
Error out --l3-cache-schema and --mem-bw-schema if the original
spec didn't specify intelRdt which also means that no CLOS (resctrl
group) was created for the container.

This prevents serious issues in this corner case.

First, a CLOS was created but the schemata of the CLOS was not
correctly updated. Confusingly, calling runc update twice
did the job: the first call created the resctrl group and the seccond
call was able to update the schemata. This issue would be relatively
easily fixable, though.

Second, more severe issue is that creating new CLOSes this way caused
them to be orphaned, not being removed when the container exists. This
is caused by runc not capturing the updated state (original spec was
intelRdt=nil -> no CLOS but after update this is not the case).

The most severe problem is that the update only move (or tried to move)
the original init process pid but all children escaped the update. Doing
this (i.e. migrating all processes of a container from CLOS to another
CLOS) reliably, race-free, would probably require freezing the
container.

Signed-off-by: Markus Lehtonen <[email protected]>
@marquiz marquiz force-pushed the devel/runc-update-rdt-empty-conf branch from 2fd2204 to 85801e8 Compare August 1, 2025 11:37
Copy link
Contributor

@kolyshkin kolyshkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

I think we need to backport it, too.

@kolyshkin kolyshkin added area/intelrdt backport/1.2-todo A PR in main branch which needs to be backported to release-1.2 backport/1.3-todo A PR in main branch which needs to be backported to release-1.3 labels Aug 4, 2025
@kolyshkin
Copy link
Contributor

@marquiz can you create backport PRs to release-1.3 and release-1.2 branches?

@marquiz
Copy link
Contributor Author

marquiz commented Aug 5, 2025

@marquiz can you create backport PRs to release-1.3 and release-1.2 branches?

Will do, thx

EDIT: will do after this is merged

@AkihiroSuda
Copy link
Member

I think we need to backport it, too.

Isn't this a breaking change?

@AkihiroSuda AkihiroSuda merged commit 9902a3d into opencontainers:main Aug 5, 2025
31 checks passed
@marquiz marquiz deleted the devel/runc-update-rdt-empty-conf branch August 5, 2025 06:59
@marquiz
Copy link
Contributor Author

marquiz commented Aug 5, 2025

Isn't this a breaking change?

Strictly, yes. Because in v1.2 and v1.3 these problematic scenarios are allowed.

@kolyshkin
Copy link
Contributor

Isn't this a breaking change?

Strictly, yes. Because in v1.2 and v1.3 these problematic scenarios are allowed.

Ah, in this case maybe we don't want to do backports.

@kolyshkin kolyshkin removed backport/1.2-todo A PR in main branch which needs to be backported to release-1.2 backport/1.3-todo A PR in main branch which needs to be backported to release-1.3 labels Aug 6, 2025
@lifubang
Copy link
Member

lifubang commented Aug 9, 2025

Let's add a comment to CHANGELOG, for example:

Breaking change: The options l3-cache-schema and mem-bw-schema are no longer allowed in runc update if the original container spec did not include intelRdt. This is because the absence of intelRdt means no CLOS (resctrl group) was created for the container. (#4827)

@marquiz
Copy link
Contributor Author

marquiz commented Aug 11, 2025

Let's add a comment to CHANGELOG

Check. See #4848 (@lifubang added you as a co-author)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants