Conversation
3f7a39b to
37447a2
Compare
37447a2 to
c98a999
Compare
|
@Ronitsabhaya75 Retrying around undiagnosed problems is not the answer, and in this case you can retry a million times and it still won't work. |
let me see what can potential issue for installing-kernel can be. thats where CI has been failing all time. |
|
I am committing a likely fix. The problem has nothing to do with installing the kernel. |
yeah time out issue (wording mistake) |
16eb03e to
7534596
Compare
|
@jglogan do you think this is good idea adding endpoint that captures the system state like running containers, then calling it automatically before operation when CI is failing ( like install kernel stage). this can give us snapshot of before and after for comparision. |
|
Hey @jglogan, I was thinking about this last night after our conversation. You're right that retries won't fix this but I think I might have an idea about what's actually going wrong. I'm wondering if the install-kernel timeout is actually a race condition. Like, we're trying to install the kernel before the API server is fully ready to accept XPC connections. The service might be "started" but not fully initialized yet. Instead of retrying or just increasing the timeout, what if we add a proper readiness check? And actually, your new JSON status output makes this really easy to implement: install-kernel: start-system
@echo "Waiting for system to be ready..."
@for i in 1 2 3 4 5; do \
if $(CONTAINER) system status --output json | jq -e '.apiServerRunning == true' > /dev/null 2>&1; then \
echo "API server ready"; \
break; \
fi; \
echo "Waiting for API server (attempt $$i/5)..."; \
sleep 2; \
done
$(CONTAINER) kernel installsample output $ make install-kernel
Waiting for system to be ready...
Waiting for API server (attempt 1/5)...
Waiting for API server (attempt 2/5)...
Waiting for API server (attempt 3/5)...
Waiting for API server (attempt 4/5)...
Waiting for API server (attempt 5/5)...
Warning: API server did not become ready, attempting install anyway...
Installing kernel extension...
Error: Failed to connect to API server (timeout) |
cf477f4 to
e661893
Compare
Perhaps. But the API server accepted an XPC connection at |
c7d5c6b to
c75e290
Compare
| let log = RuntimeLinuxHelper.setupLogger(debug: debug, metadata: ["uuid": "\(uuid)"]) | ||
| let commandName = RuntimeLinuxHelper._commandName | ||
| let logPath = logRoot.map { $0.appending("\(commandName)-\(uuid).log") } | ||
| let log = ServiceLogger.bootstrap(category: "NetworkVmnetHelper", metadata: ["uuid": "\(uuid)"], debug: debug, logPath: logPath) |
There was a problem hiding this comment.
Should be RuntimeLinuxHelper.
f538082 to
4d3481e
Compare
| ) -> Logger { | ||
| LoggingSystem.bootstrap { label in | ||
| if let logPath { | ||
| if let handler = try? FileLogHandler(label: label, path: logPath) { |
There was a problem hiding this comment.
Will the user be aware of a fallback to OS logs here in case of failure?
There was a problem hiding this comment.
Logging isn't set up and this is in a service context, so I don't see a good way to communicate it.
We could try to make note of the condition here and then log a warning above L46, but that would still go into the OS log; it wouldn't be immediately visible to the user.
There was a problem hiding this comment.
Makes sense. I'm not seeing a good way to go about it either. Not blocking for this PR, but maybe something to think about
d7e3a2b to
b86cda2
Compare
eb40b71 to
7efd570
Compare
7efd570 to
daf43dc
Compare
ajemory
left a comment
There was a problem hiding this comment.
Looks good. Always happy to see well-thought-out logging enhancements 😄
Type of Change
Motivation and Context
--log-rootoption toswift system start, propagating the value asCONTAINER_LOG_ROOTto services for logging to files instead of the OS log facility. This is not a "production" capability as it neither merges nor rotates logs.logcommand there. The PR adds--log-rootto the CI test phase, archives the results, and uploads the archive as an artifact.grep -r 'log\.' Sources.--log-root.Testing