sysext: port AWS OEM to systemd sysext image #1083
Build action triggered: https://github.com/flatcar/scripts/actions/runs/6297834375
| cat > "${rootfs}/usr/lib/systemd/system/setup-oem.service" <<-'EOF' | ||
| [Unit] | ||
| Description=Setup OEM | ||
There was a problem hiding this comment.
Should this run before amazon-ssm-agent.service?
Also, would symlinks work, too?
There was a problem hiding this comment.
I'm curious: what would be the benefit of using a symlink here? If the user wants to edit /etc/amazon/ssm/amazon-ssm-agent.json, for example, they won't be able to, as /usr/... is read-only.
There was a problem hiding this comment.
With cp, the user's edits also get lost. With a symlink, we could check whether a custom target is set and, if so, leave it untouched. We could then document how to use that to opt out of auto-updates for that file.
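A minimal sketch of the opt-out idea described above, assuming a hypothetical helper named `link_default_config` (it is not part of this PR). It creates the symlink only when nothing exists at the target, and leaves any custom symlink or regular file alone:

```sh
#!/bin/sh
# Sketch only: link_default_config is a hypothetical helper, not code from this PR.
# Usage: link_default_config <default-file-under-/usr> <path-under-/etc>
link_default_config() {
    default=$1
    target=$2

    if [ -L "$target" ]; then
        # A symlink already exists: report whether it is ours or a custom one.
        if [ "$(readlink "$target")" = "$default" ]; then
            echo "already-default"
        else
            echo "custom-link-untouched"
        fi
    elif [ -e "$target" ]; then
        # A regular file means the user replaced the link: do not touch it.
        echo "user-file-untouched"
    else
        # Nothing there yet: create the default symlink.
        mkdir -p "$(dirname "$target")"
        ln -s "$default" "$target"
        echo "linked"
    fi
}
```

With the paths from the diff, a call could look like `link_default_config /usr/share/amazon/ssm/amazon-ssm-agent.json /etc/amazon/ssm/amazon-ssm-agent.json`.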
| EOF | ||
| mkdir -p "${rootfs}/usr/lib/systemd/system/multi-user.target.d" | ||
| { echo "[Unit]"; echo "Upholds=amazon-ssm-agent.service coreos-metadata-sshkeys@.service setup-oem.service"; } > "${rootfs}/usr/lib/systemd/system/multi-user.target.d/10-oem-ami.conf" |
There was a problem hiding this comment.
Enabling coreos-metadata-sshkeys@.service is something we should do in the base image.
| [Service] | ||
| Type=oneshot | ||
| RemainAfterExit=yes | ||
| ExecStartPre=/usr/bin/cp /usr/share/amazon/ssm/amazon-ssm-agent.json /etc/amazon/ssm/amazon-ssm-agent.json.template |
There was a problem hiding this comment.
If that file is only used by the service unit, we could also use BindPaths= to provide it under /etc. @krnowak, that could also be an option for the waagent, right?
| /etc/eks/bootstrap.sh | ||
| ) | ||
| rm -rf "${to_delete[@]/#/${rootfs}}" |
There was a problem hiding this comment.
I don't understand this line, what creates the files under /etc?
There was a problem hiding this comment.
Sorry, I was confused about the manglefs script: I thought it was running on the host, so I wanted to clean up the old OEM files from there.
EDIT: OK, it's done there: flatcar/update_engine#24 (comment)
There was a problem hiding this comment.
The list of old OEM files should now go to the misc-files package: #1016
We should boot an instance and check the old contents of /oem/ (the list of files for /etc looks good and can be easily seen in the base.ign).
There was a problem hiding this comment.
So, the above can be deleted here, right?
| mkdir -p "${rootfs}/usr/lib/systemd/system/amazon-ssm-agent.service.d" | ||
| cat > "${rootfs}/usr/lib/systemd/system/amazon-ssm-agent.service.d/10-bindpaths.conf" <<-'EOF' | ||
| [Service] | ||
| BindPaths=/usr/share/amazon/ssm/:/etc/amazon/ssm/ /usr/share/amazon/eks/bootstrap.sh:/etc/eks/bootstrap.sh |
There was a problem hiding this comment.
Are users expected to be able to run the CLI themselves, and does it also read from /etc? (In that case we would have to have the symlinks from /etc anyway, right?)
There was a problem hiding this comment.
Actually, from my understanding, the bootstrap script is executed by user-data (https://kinvolk.io/blog/2021/02/deploying-an-eks-cluster-with-flatcar-workers/), so it does not even need to be shared.
There was a problem hiding this comment.
My question was more about ssm-cli and whether this is used by users and needs access to the files in /etc.
There was a problem hiding this comment.
It seems that ssm-cli consumes the amazon-ssm-agent directly, but I think it might be wiser to copy the files directly into /etc rather than bind-mount them.
| EOF | ||
| mkdir -p "${rootfs}/usr/lib/systemd/system/multi-user.target.d" | ||
| { echo "[Unit]"; echo "Upholds=amazon-ssm-agent.service coreos-metadata-sshkeys@core.service"; } > "${rootfs}/usr/lib/systemd/system/multi-user.target.d/10-oem-ec2.conf" |
There was a problem hiding this comment.
Can we start coreos-metadata-sshkeys@core.service in the base image? We also start coreos-cloudinit in the base image and could do it similarly (have a unit that has a condition for the OEM kernel cmdline argument and then uses Upholds= to start it).
There was a problem hiding this comment.
Something like this
[Unit]
ConditionKernelCommandLine=|ignition.platform.id=packet
ConditionKernelCommandLine=|flatcar.oem.id=packet
ConditionKernelCommandLine=|coreos.oem.id=packet
ConditionKernelCommandLine=|ignition.platform.id=ec2
ConditionKernelCommandLine=|flatcar.oem.id=ec2
ConditionKernelCommandLine=|coreos.oem.id=ec2
ConditionKernelCommandLine=|ignition.platform.id=digitalocean
ConditionKernelCommandLine=|flatcar.oem.id=digitalocean
ConditionKernelCommandLine=|coreos.oem.id=digitalocean
ConditionKernelCommandLine=|ignition.platform.id=gce
ConditionKernelCommandLine=|flatcar.oem.id=gce
ConditionKernelCommandLine=|coreos.oem.id=gce
Upholds=coreos-metadata-sshkeys@core.service
[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/bin/true
[Install]
WantedBy=multi-user.target
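For readers less familiar with the `|` prefix: it marks a condition as "triggering", so the unit's conditions are met when at least one of the listed kernel command line assignments is present. A rough shell equivalent of that matching, assuming a hypothetical helper `cmdline_has_platform` (a sketch for illustration, not code from this PR or from systemd):

```sh
#!/bin/sh
# Sketch only: cmdline_has_platform is a hypothetical helper.
# Usage: cmdline_has_platform "<kernel cmdline>" <platform>...
# Succeeds if any word on the command line is exactly one of the three
# OEM ID assignments for one of the given platforms.
# The function body runs in a subshell so that `set -f` (disable globbing
# while word-splitting the command line) does not leak to the caller.
cmdline_has_platform() (
    set -f
    cmdline=$1
    shift
    for word in $cmdline; do
        for platform in "$@"; do
            case "$word" in
                "ignition.platform.id=$platform"|"flatcar.oem.id=$platform"|"coreos.oem.id=$platform")
                    exit 0 ;;
            esac
        done
    done
    exit 1
)
```

On a running system this could be called as `cmdline_has_platform "$(cat /proc/cmdline)" packet ec2 digitalocean gce`.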
There was a problem hiding this comment.
| { echo "[Unit]"; echo "Upholds=amazon-ssm-agent.service coreos-metadata-sshkeys@core.service"; } > "${rootfs}/usr/lib/systemd/system/multi-user.target.d/10-oem-ec2.conf" | |
| { echo "[Unit]"; echo "Upholds=amazon-ssm-agent.service"; } > "${rootfs}/usr/lib/systemd/system/multi-user.target.d/10-oem-ec2.conf" |
There was a problem hiding this comment.
Interesting - the Upholds units are executed unconditionally:
$ systemctl status sshkeys@core.service
○ sshkeys@core.service
Loaded: loaded (/usr/lib/systemd/system/sshkeys@.service; static)
Active: inactive (dead)
Condition: start condition failed at Fri 2023-09-08 12:14:19 UTC; 17min ago
Sep 08 12:14:19 localhost systemd[1]: sshkeys@core.service was skipped because no trigger condition checks were met.
$ systemctl status coreos-metadata-sshkeys@core.service
● coreos-metadata-sshkeys@core.service - Flatcar Metadata Agent (SSH Keys)
Loaded: loaded (/usr/lib/systemd/system/coreos-metadata-sshkeys@.service; disabled; preset: disabled)
Active: activating (auto-restart) (Result: exit-code) since Fri 2023-09-08 12:34:50 UTC; 7s ago
Process: 1799 ExecStart=/usr/bin/coreos-metadata ${COREOS_METADATA_OPT_PROVIDER} --ssh-keys=core (code=exited, status=1/FAILURE)
Main PID: 1799 (code=exited, status=1/FAILURE)
CPU: 11ms
As a result, the whole qemu test suite is failing. I guess we can go back to ExecStart=systemctl start coreos-metadata-sshkeys@core.service.
There was a problem hiding this comment.
What are the contents of /usr/lib/systemd/system/multi-user.target.d/10-oem-ec2.conf? Is this really the most recent state?
There was a problem hiding this comment.
Ah, understood, so it seems the condition is only for the [Service] section? And Upholds= still gets used all the time. Then yes, ExecStart=systemctl start coreos-metadata-sshkeys@core.service instead of ExecStart=/bin/true sounds good!
There was a problem hiding this comment.
> What are the contents of /usr/lib/systemd/system/multi-user.target.d/10-oem-ec2.conf? Is this really the most recent state?
It's for qemu so the sysext image is not even present.
> so it seems the condition is only for the [Service] section?
That's my conclusion too.
There was a problem hiding this comment.
Ok, so more follow-up for the init PR… Sorry for the misleading suggestion
There was a problem hiding this comment.
No worries - I fixed this right after in flatcar/init#105. Now the CI is 🟢
Force-pushed from 9016250 to 8841684.
| ExecStartPre=/usr/bin/ln --symbolic /usr/share/amazon/ssm/amazon-ssm-agent.json.template /etc/amazon/ssm/amazon-ssm-agent.json | ||
| ExecStartPre=/usr/bin/ln --symbolic /usr/share/amazon/ssm/seelog.xml.template /etc/amazon/ssm/seelog.xml | ||
| ExecStart=/usr/bin/ln --symbolic /usr/share/amazon/eks/bootstrap.sh /etc/eks/bootstrap.sh |
There was a problem hiding this comment.
When the link already exists this will fail, do you want it to be skipped then? This would be possible with ExecStartPre=-.
There was a problem hiding this comment.
I think we should talk about this: how we manage the update of /etc. While redoing this section, I was thinking about using cp --backup to a) update the /etc/ files and b) keep any previous configuration.
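A minimal sketch of the `cp --backup` idea, assuming GNU coreutils `cp` and a hypothetical helper named `update_etc_file` (not part of this PR). With `--backup=numbered`, an existing destination is renamed to `<name>.~N~` before the new file is copied over it, so the previous configuration is kept:

```sh
#!/bin/sh
# Sketch only: update_etc_file is a hypothetical helper.
# Requires GNU cp (for --backup).
# Usage: update_etc_file <shipped-file-under-/usr> <path-under-/etc>
update_etc_file() {
    src=$1
    dest=$2
    mkdir -p "$(dirname "$dest")"
    # An existing $dest is first renamed to $dest.~1~, $dest.~2~, and so on.
    cp --backup=numbered "$src" "$dest"
}
```

One design consequence worth noting: a backup is created on every run in which the destination already exists, even if its content is unchanged, so backups would accumulate if this ran on each boot.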
There was a problem hiding this comment.
Since these files are likely touched by the user, I would exclude them from the migration step. This means they will stay as regular files and will not be updated, unless we can identify them as untouched with a checksum and create the symlink. For new instances the symlink is the default, so we don't need any update logic: this is covered by the sysext content. The user could still overwrite the symlink or replace it with a file if we use ExecStartPre=-.
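The checksum idea could look roughly like the sketch below. It assumes a hypothetical helper `migrate_if_untouched` and that the SHA-256 of the previously shipped default is known; neither is part of this PR. A regular file is replaced by the symlink only if it still matches that known default:

```sh
#!/bin/sh
# Sketch only: migrate_if_untouched is a hypothetical helper.
# Usage: migrate_if_untouched <shipped-file-under-/usr> <path-under-/etc> <sha256-of-old-default>
migrate_if_untouched() {
    shipped=$1
    target=$2
    old_sum=$3

    # Only regular files are candidates; symlinks and missing files are skipped.
    if [ -L "$target" ] || [ ! -f "$target" ]; then
        echo "skipped"
        return 0
    fi

    current_sum=$(sha256sum "$target" | cut -d' ' -f1)
    if [ "$current_sum" = "$old_sum" ]; then
        # Unmodified default: replace the file with a symlink to the shipped copy.
        ln -sf "$shipped" "$target"
        echo "migrated"
    else
        # The user edited the file: keep it as a regular file.
        echo "kept-user-file"
    fi
}
```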
Force-pushed from 8841684 to 957cd43.
The oem release ID would only be used for the update payload name and the migration file. No translation to |
For this one yes, no translation but we still need one for the kernel command line parameter (see: 9df7e19). Otherwise |
| - AWS OEM images now use a systemd-sysext image for layering additional platform-specific software on top of `/usr`. The OEM software is still not updated but this will be added soon. | ||
| - The AWS OEM ID kernel command line parameter changed to `ami` |
There was a problem hiding this comment.
| - AWS OEM images now use a systemd-sysext image for layering additional platform-specific software on top of `/usr`. The OEM software is still not updated but this will be added soon. | |
| - The AWS OEM ID kernel command line parameter changed to `ami` | |
| - AWS OEM images now use a systemd-sysext image for layering additional platform-specific software on top of `/usr` |
| @@ -0,0 +1,2 @@ | |||
| [Unit] | |||
| Upholds=amazon-ssm-agent.service | |||
There was a problem hiding this comment.
| Upholds=amazon-ssm-agent.service | |
| Upholds=amazon-ssm-agent.service setup-oem.service |
| src_install() { | ||
| systemd_dounit "${FILESDIR}/setup-oem.service" | ||
| systemd_install_serviced "${FILESDIR}/10-oem-ami.conf" multi-user.target |
There was a problem hiding this comment.
Is this line correct? I don't see the service running
There was a problem hiding this comment.
Even after starting manually I think it has some problems:
Sep 25 13:01:42 ip-172-31-31-92.us-west-2.compute.internal amazon-ssm-agent[2368]: 2023-09-25 13:01:41 WARN [ssm-agent-worker] failed to read runtime config 'identity_config.json': open /var/lib/amazon/ssm/runtimeconfig/identity_config.json: no such file or directory
Sep 25 13:01:42 ip-172-31-31-92.us-west-2.compute.internal amazon-ssm-agent[2368]: 2023-09-25 13:01:41 INFO [ssm-agent-worker] Checking if agent identity type CustomIdentity can be assumed
Sep 25 13:01:42 ip-172-31-31-92.us-west-2.compute.internal amazon-ssm-agent[2368]: 2023-09-25 13:01:41 WARN [ssm-agent-worker] failed to read runtime config 'identity_config.json': open /var/lib/amazon/ssm/runtimeconfig/identity_config.json: no such file or directory
Sep 25 13:01:42 ip-172-31-31-92.us-west-2.compute.internal amazon-ssm-agent[2368]: 2023-09-25 13:01:41 ERROR [ssm-agent-worker] Agent failed to assume any identity
Sep 25 13:01:42 ip-172-31-31-92.us-west-2.compute.internal amazon-ssm-agent[2368]: 2023-09-25 13:01:41 ERROR [ssm-agent-worker] failed to find identity, retrying: failed to find agent identity
Sep 25 13:01:42 ip-172-31-31-92.us-west-2.compute.internal amazon-ssm-agent[2368]: 2023-09-25 13:01:42 INFO [ssm-agent-worker] Checking if agent identity type OnPrem can be assumed
Sep 25 13:01:42 ip-172-31-31-92.us-west-2.compute.internal amazon-ssm-agent[2368]: 2023-09-25 13:01:42 WARN [ssm-agent-worker] failed to read runtime config 'identity_config.json': open /var/lib/amazon/ssm/runtimeconfig/identity_config.json: no such file or directory
Sep 25 13:01:42 ip-172-31-31-92.us-west-2.compute.internal amazon-ssm-agent[2368]: 2023-09-25 13:01:42 INFO [ssm-agent-worker] Checking if agent identity type EC2 can be assumed
Sep 25 13:01:42 ip-172-31-31-92.us-west-2.compute.internal amazon-ssm-agent[2368]: 2023-09-25 13:01:42 WARN [ssm-agent-worker] failed to read runtime config 'identity_config.json': open /var/lib/amazon/ssm/runtimeconfig/identity_config.json: no such file or directory
Sep 25 13:01:42 ip-172-31-31-92.us-west-2.compute.internal amazon-ssm-agent[2368]: 2023-09-25 13:01:42 INFO [ssm-agent-worker] Checking if agent identity type CustomIdentity can be assumed
Sep 25 13:01:42 ip-172-31-31-92.us-west-2.compute.internal amazon-ssm-agent[2368]: 2023-09-25 13:01:42 WARN [ssm-agent-worker] failed to read runtime config 'identity_config.json': open /var/lib/amazon/ssm/runtimeconfig/identity_config.json: no such file or directory
There was a problem hiding this comment.
If the termination also happens on the latest Alpha it can be ignored
There was a problem hiding this comment.
> Is this line correct? I don't see the service running
The file is actually installed in /etc/systemd/system/multi-user.target.d/ so it's not packaged.
Force-pushed from ec04512 to 57718ba.
Force-pushed from 57718ba to d703e00.
krnowak left a comment
There was a problem hiding this comment.
Just remembered about revision bumping in overlay.
- drop the OEM mention
- install things under /usr/share/amazon/ssm
- add systemd unit from the upstream
Signed-off-by: Mathieu Tortuyaux <mtortuyaux@microsoft.com>
Signed-off-by: Mathieu Tortuyaux <mtortuyaux@microsoft.com>
Signed-off-by: Mathieu Tortuyaux <mtortuyaux@microsoft.com>
Signed-off-by: Mathieu Tortuyaux <mtortuyaux@microsoft.com>
While this ebuild will be dropped in the near future, we still need to maintain the openstack ebuild. `flatcar-eks` was a runtime dependency of openstack/brightbox too. I think it was a mistake?
Signed-off-by: Mathieu Tortuyaux <mtortuyaux@microsoft.com>
Signed-off-by: Mathieu Tortuyaux <mtortuyaux@microsoft.com>
Found by booting stable on AWS: `find /usr/share/oem` + checking the content of files created by base Ignition.
Signed-off-by: Mathieu Tortuyaux <mtortuyaux@microsoft.com>
Force-pushed from d703e00 to 678c8fc.
For this vendor, the OEM ID from the oem-release file is different from the oem.id kernel command line parameter.
Signed-off-by: Mathieu Tortuyaux <mtortuyaux@microsoft.com>
Force-pushed from 678c8fc to bfaea38.
krnowak left a comment
There was a problem hiding this comment.
Cool, looks good from my side.
In this PR, we port the current AWS OEM to a systemd system extension (sysext) image. It allows us to not rely on the `base-ec2.ign` configuration file and to remove specific OEM bits from the two related ebuilds: `flatcar-eks` and `amazon-ssm-agent`.
Testing done
- `changelog/` directory (user-facing change, bug fix, security fix, update)
- `/boot` and `/usr` size, packages, list files for any missing binaries, kernel modules, config files, etc.
related to: flatcar/Flatcar#1145