Add cron schedule to mark activities as finished#31737
Conversation
…1555-cron-branched-off-batch-status
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
WalkthroughA new cron job was introduced to periodically mark batch activities as completed in the system. This includes adding the job and its schedule, updating the datastore interface and implementation, integrating the schedule into server startup, and adding tests and mocks for the new datastore method. Changes
Sequence Diagram(s)sequenceDiagram
participant CronScheduler
participant FleetServer
participant Datastore
CronScheduler->>FleetServer: Trigger batch_activity_completion_checker (every 5 min)
FleetServer->>Datastore: MarkActivitiesAsCompleted(ctx)
Datastore-->>FleetServer: error or success
FleetServer-->>CronScheduler: Job complete
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~15–20 minutes Possibly related issues
Note 🔌 MCP (Model Context Protocol) integration is now available in Early Access!Pro users can now connect to remote MCP servers under the Integrations page to get reviews and chat conversations that understand additional development context. ✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
|
There was a problem hiding this comment.
Actionable comments posted: 5
🧹 Nitpick comments (7)
cmd/fleet/serve.go (1)
1048-1053: Nit: use the schedule constant in the error message to avoid driftMinor consistency improvement; several nearby registrations reference the schedule constant in error messages.
- if err := cronSchedules.StartCronSchedule(func() (fleet.CronSchedule, error) { + if err := cronSchedules.StartCronSchedule(func() (fleet.CronSchedule, error) { return newBatchActivityCompletionCheckerSchedule(ctx, instanceID, ds, logger) }); err != nil { - initFatal(err, "failed to register batch activity completion checker schedule") + initFatal(err, fmt.Sprintf("failed to register %s schedule", fleet.CronBatchActivityCompletionChecker)) }server/datastore/mysql/scripts.go (1)
2231-2236: Use the retry-aware helper instead of plainwithTx.Nearly all write paths in this datastore layer wrap the transaction with
withRetryTxxto transparently retry on dead-locks/lock-wait-timeouts.
For consistency – and to avoid a stuck cron on busy installations – prefer the same helper here.- if err := ds.withTx(ctx, func(tx sqlx.ExtContext) error { + if err := ds.withRetryTxx(ctx, func(tx sqlx.ExtContext) error {cmd/fleet/cron.go (3)
1517-1527: Pass logger into the job for in-job observability.The cron function currently can't log timings or outcomes. Pass the schedule logger into the job and through to the handler (see next comment for handler changes).
Apply within this range:
- func(ctx context.Context) error { - return cronBatchActivityCompletionChecker(ctx, ds) - }, + func(ctx context.Context) error { + return cronBatchActivityCompletionChecker(ctx, ds, logger) + },
1531-1540: Add minimal instrumentation and accept logger.Capture elapsed time and log at debug-level to aid troubleshooting without noisy info-level logs. Imports for time and kitlog/level are already present.
-func cronBatchActivityCompletionChecker( - ctx context.Context, - ds fleet.Datastore, -) error { - if err := ds.MarkActivitiesAsCompleted(ctx); err != nil { - return ctxerr.Wrap(ctx, err, "mark batch activities as completed") - } - // TODO -- add an entry in the global feed for each completed activity? - return nil -} +func cronBatchActivityCompletionChecker( + ctx context.Context, + ds fleet.Datastore, + logger kitlog.Logger, +) error { + start := time.Now() + if err := ds.MarkActivitiesAsCompleted(ctx); err != nil { + return ctxerr.Wrap(ctx, err, "mark batch activities as completed") + } + level.Debug(logger).Log("msg", "batch activity completion check run completed", "took", time.Since(start).String()) + // TODO -- add an entry in the global feed for each completed activity? + return nil +}
1538-1539: Track the TODO for global feed entries.If you plan to emit a feed entry per completed activity, please open a tracking issue to define scope and guardrails (e.g., deduplication/noise limits, correlation with existing “run started” activities, and backfill behavior).
Happy to draft the issue or propose an implementation sketch gated by config.
server/datastore/mysql/scripts_test.go (2)
2253-2262: Avoid ordering assumptions that can make the test flakyYou schedule a second batch before selecting host1Upcoming/host2Upcoming/host3Upcoming, then use index 0 to submit results/cancel. This assumes listUpcomingHostScriptExecutions returns the first execution's rows first. If ordering changes, you could unintentionally resolve execID2 and leave execID pending, causing the test to fail.
- Safer approach: capture host upcoming execution IDs immediately after the first BatchExecuteScript and before scheduling the second, then use those IDs when submitting results/cancel.
- Alternatively, disambiguate the target batch when selecting which upcoming to resolve (if the item includes batch execution identifiers).
Would you like me to propose a patch that moves the listUpcoming calls before scheduling the second batch?
2294-2304: Tighten assertions: verify finished invariant or pending countTo better validate the “finished” condition and cached tallies:
- Option A: assert that targeted == ran + errored + incompatible + canceled.
- Option B: if NumPending is set by MarkActivitiesAsCompleted, assert it's 0.
Example:
require.Equal(t, *batchActivity.NumTargeted, *batchActivity.NumRan+*batchActivity.NumErrored+*batchActivity.NumIncompatible+*batchActivity.NumCanceled, )Use Option B only if NumPending is guaranteed to be non-nil for finished activities.
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
cmd/fleet/cron.go(1 hunks)cmd/fleet/serve.go(1 hunks)server/datastore/mysql/scripts.go(1 hunks)server/datastore/mysql/scripts_test.go(2 hunks)server/fleet/cron_schedules.go(1 hunks)server/fleet/datastore.go(1 hunks)server/mock/datastore_mock.go(3 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.go
⚙️ CodeRabbit Configuration File
When reviewing SQL queries that are added or modified, ensure that appropriate filtering criteria are applied—especially when a query is intended to return data for a specific entity (e.g., a single host). Check for missing WHERE clauses or incorrect filtering that could lead to incorrect or non-deterministic results (e.g., returning the first row instead of the correct one). Flag any queries that may return unintended results due to lack of precise scoping.
Files:
server/fleet/cron_schedules.gocmd/fleet/cron.goserver/datastore/mysql/scripts.goserver/fleet/datastore.goserver/datastore/mysql/scripts_test.gocmd/fleet/serve.goserver/mock/datastore_mock.go
🧠 Learnings (1)
📚 Learning: 2025-08-08T07:40:05.274Z
Learnt from: getvictor
PR: fleetdm/fleet#31726
File: server/datastore/mysql/labels_test.go:2031-2031
Timestamp: 2025-08-08T07:40:05.274Z
Learning: In fleetdm/fleet repository tests (server/datastore/mysql/labels_test.go and similar), using testing.T.Context() is valid because the project targets a recent Go version where testing.T.Context() exists. Do not suggest replacing t.Context() with context.Background() in this codebase.
Applied to files:
server/datastore/mysql/scripts_test.go
🔇 Additional comments (6)
server/fleet/cron_schedules.go (1)
34-38: New schedule name constant — LGTMName is clear and consistent with existing schedule naming. No issues.
cmd/fleet/serve.go (1)
1048-1053: Registering the batch activity completion checker — LGTMRegistration timing and dependencies (ctx, instanceID, ds, logger) look correct.
cmd/fleet/cron.go (1)
1514-1516: LGTM: Sensible cron name and interval.Name aligns with existing conventions and a 5-minute cadence is reasonable for eventual consistency on batch activity completion.
server/mock/datastore_mock.go (2)
1180-1181: New mock func type matches interface — LGTMSignature aligns with
MarkActivitiesAsCompleted(ctx context.Context) error. No concerns.
3191-3193: DataStore fields for MarkActivitiesAsCompleted — LGTMAdding the func and the
Invokedflag matches existing mock patterns and will help assertions in tests.server/datastore/mysql/scripts_test.go (1)
49-49: Add test to suite: looks goodRegistering TestMarkActivitiesAsCompleted in the scripts test table is appropriate and keeps coverage centralized.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #31737 +/- ##
=======================================
Coverage 63.76% 63.76%
=======================================
Files 1963 1963
Lines 192009 192015 +6
Branches 6364 6327 -37
=======================================
+ Hits 122440 122445 +5
- Misses 59996 59999 +3
+ Partials 9573 9571 -2
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
|
@sgress454 I'm going to need to reuse that completion tally logic for the job canceling method as well. Do you think you could split it out into its own function that takes a transaction or something like that? |
|
Or that takes a where clause, either way |
for #31555
Details
This PR adds a new cron schedule "batch_activity_completion_checker" that runs every 5 minutes and checks whether any batch activities marked as "started" have completed their runs. In general this is done by determining whether the sum of the "ran", "incompatible", "errored" and "canceled" hosts equals the number of "targeted" hosts for the activity. How that is computed will vary by batch activity type (currently we just have batch scripts).
When an activity is marked as finished, we cache the final tally of host statuses (ran, incompatible, errored, canceled) on the record. This is important so that future queries on activity records don't have to do the expensive query to compute the host counts on activities where those counts will never change.
Checklist for submitter
If some of the following don't apply, delete the relevant line.
SELECT *is avoided, SQL injection is prevented (using placeholders for values in statements)Testing
Added/updated automated tests
Where appropriate, automated tests simulate multiple hosts and test for host isolation (updates to one hosts's records do not affect another)
QA'd all new/changed functionality manually
Started a new batch script run using the update run modal (see Update run script modal #31604) and then triggered the new job using
fleetctl trigger --name batch_activity_completion_checker, and verified that thebatch_activitiesrecord status wasfinishedand the expected fields were populated.Summary by CodeRabbit