Add cron schedule to mark activities as finished#31737

sgress454 · 2025-08-08T15:42:49Z

Details

This PR adds a new cron schedule "batch_activity_completion_checker" that runs every 5 minutes and checks whether any batch activities marked as "started" have completed their runs. In general this is done by determining whether the sum of the "ran", "incompatible", "errored" and "canceled" hosts equals the number of "targeted" hosts for the activity. How that is computed will vary by batch activity type (currently we just have batch scripts).

When an activity is marked as finished, we cache the final tally of host statuses (ran, incompatible, errored, canceled) on the record. This is important so that future queries on activity records don't have to do the expensive query to compute the host counts on activities where those counts will never change.

Checklist for submitter

If some of the following don't apply, delete the relevant line.

Input data is properly validated, SELECT * is avoided, SQL injection is prevented (using placeholders for values in statements)

Testing

Added/updated automated tests
Where appropriate, automated tests simulate multiple hosts and test for host isolation (updates to one hosts's records do not affect another)
QA'd all new/changed functionality manually
Started a new batch script run using the update run modal (see Update run script modal #31604) and then triggered the new job using fleetctl trigger --name batch_activity_completion_checker, and verified that the batch_activities record status was finished and the expected fields were populated.

Summary by CodeRabbit

New Features
- Introduced an automated process that regularly marks completed batch activities, ensuring more accurate and up-to-date activity statuses.
Bug Fixes
- Improved reliability in updating the status of batch activities when all targeted hosts have finished their tasks.
Tests
- Added comprehensive tests to verify correct marking of completed batch activities.
Chores
- Enhanced internal scheduling and datastore interfaces to support the new completion-checking process.

…1555-cron-branched-off-batch-status

sgress454 · 2025-08-08T15:43:20Z

@coderabbitai review

coderabbitai · 2025-08-08T15:43:26Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai · 2025-08-08T15:48:52Z

Walkthrough

A new cron job was introduced to periodically mark batch activities as completed in the system. This includes adding the job and its schedule, updating the datastore interface and implementation, integrating the schedule into server startup, and adding tests and mocks for the new datastore method.

Changes

Cohort / File(s)	Change Summary
Cron Job Scheduling and Execution `cmd/fleet/cron.go`, `cmd/fleet/serve.go`, `server/fleet/cron_schedules.go`	Added a new cron job and schedule (`batch_activity_completion_checker`) that runs every 5 minutes, calling a datastore method to mark batch activities as completed. Integrated the schedule into the server startup. Registered the new schedule name constant.
Datastore Interface and Implementation `server/fleet/datastore.go`, `server/datastore/mysql/scripts.go`	Added `MarkActivitiesAsCompleted` method to the datastore interface and implemented it in the MySQL datastore to update batch activity statuses and counts.
Testing `server/datastore/mysql/scripts_test.go`	Introduced a test for the new `MarkActivitiesAsCompleted` method, verifying correct status and count updates for batch activities.
Mocking `server/mock/datastore_mock.go`	Added mock implementation for the `MarkActivitiesAsCompleted` method in the datastore mock for testing purposes.

Sequence Diagram(s)

sequenceDiagram
    participant CronScheduler
    participant FleetServer
    participant Datastore

    CronScheduler->>FleetServer: Trigger batch_activity_completion_checker (every 5 min)
    FleetServer->>Datastore: MarkActivitiesAsCompleted(ctx)
    Datastore-->>FleetServer: error or success
    FleetServer-->>CronScheduler: Job complete

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~15–20 minutes

Possibly related issues

Create cron schedule to mark batch activities as completed #31555: Implements the exact cron schedule and datastore method for marking batch activities as completed.
Scheduled scripts backend: Add BatchActivitiesSchedule cron schedule #31521: Related as it deals with batch activity cron schedules, but focuses on starting rather than completing activities.

Note

🔌 MCP (Model Context Protocol) integration is now available in Early Access!

Pro users can now connect to remote MCP servers under the Integrations page to get reviews and chat conversations that understand additional development context.

✨ Finishing Touches

📝 Generate Docstrings

🧪 Generate unit tests

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch sgress454/31555-cron-branched-off-batch-status

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

‼️ IMPORTANT
Auto-reply has been disabled for this repository in the CodeRabbit settings. The CodeRabbit bot will not respond to your replies unless it is explicitly tagged.

Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai explain this code block.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read src/utils.ts and explain its main purpose.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai generate docstrings to generate docstrings for this PR.
@coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
@coderabbitai generate unit tests to generate unit tests for this PR.
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

coderabbitai

Actionable comments posted: 5

🧹 Nitpick comments (7)

cmd/fleet/serve.go (1)
1048-1053: Nit: use the schedule constant in the error message to avoid drift

Minor consistency improvement; several nearby registrations reference the schedule constant in error messages.
-            if err := cronSchedules.StartCronSchedule(func() (fleet.CronSchedule, error) {
+            if err := cronSchedules.StartCronSchedule(func() (fleet.CronSchedule, error) {
                 return newBatchActivityCompletionCheckerSchedule(ctx, instanceID, ds, logger)
             }); err != nil {
-                initFatal(err, "failed to register batch activity completion checker schedule")
+                initFatal(err, fmt.Sprintf("failed to register %s schedule", fleet.CronBatchActivityCompletionChecker))
             }
server/datastore/mysql/scripts.go (1)
2231-2236: Use the retry-aware helper instead of plain withTx.

Nearly all write paths in this datastore layer wrap the transaction with withRetryTxx to transparently retry on dead-locks/lock-wait-timeouts.
For consistency – and to avoid a stuck cron on busy installations – prefer the same helper here.
- if err := ds.withTx(ctx, func(tx sqlx.ExtContext) error {
+ if err := ds.withRetryTxx(ctx, func(tx sqlx.ExtContext) error {
cmd/fleet/cron.go (3)
1517-1527: Pass logger into the job for in-job observability.

The cron function currently can't log timings or outcomes. Pass the schedule logger into the job and through to the handler (see next comment for handler changes).

Apply within this range:
-            func(ctx context.Context) error {
-                return cronBatchActivityCompletionChecker(ctx, ds)
-            },
+            func(ctx context.Context) error {
+                return cronBatchActivityCompletionChecker(ctx, ds, logger)
+            },
1531-1540: Add minimal instrumentation and accept logger.

Capture elapsed time and log at debug-level to aid troubleshooting without noisy info-level logs. Imports for time and kitlog/level are already present.
-func cronBatchActivityCompletionChecker(
-    ctx context.Context,
-    ds fleet.Datastore,
-) error {
-    if err := ds.MarkActivitiesAsCompleted(ctx); err != nil {
-        return ctxerr.Wrap(ctx, err, "mark batch activities as completed")
-    }
-    // TODO -- add an entry in the global feed for each completed activity?
-    return nil
-}
+func cronBatchActivityCompletionChecker(
+    ctx context.Context,
+    ds fleet.Datastore,
+    logger kitlog.Logger,
+) error {
+    start := time.Now()
+    if err := ds.MarkActivitiesAsCompleted(ctx); err != nil {
+        return ctxerr.Wrap(ctx, err, "mark batch activities as completed")
+    }
+    level.Debug(logger).Log("msg", "batch activity completion check run completed", "took", time.Since(start).String())
+    // TODO -- add an entry in the global feed for each completed activity?
+    return nil
+}
1538-1539: Track the TODO for global feed entries.

If you plan to emit a feed entry per completed activity, please open a tracking issue to define scope and guardrails (e.g., deduplication/noise limits, correlation with existing “run started” activities, and backfill behavior).

Happy to draft the issue or propose an implementation sketch gated by config.
server/datastore/mysql/scripts_test.go (2)
2253-2262: Avoid ordering assumptions that can make the test flaky

You schedule a second batch before selecting host1Upcoming/host2Upcoming/host3Upcoming, then use index 0 to submit results/cancel. This assumes listUpcomingHostScriptExecutions returns the first execution's rows first. If ordering changes, you could unintentionally resolve execID2 and leave execID pending, causing the test to fail.

Safer approach: capture host upcoming execution IDs immediately after the first BatchExecuteScript and before scheduling the second, then use those IDs when submitting results/cancel.

Alternatively, disambiguate the target batch when selecting which upcoming to resolve (if the item includes batch execution identifiers).

Would you like me to propose a patch that moves the listUpcoming calls before scheduling the second batch?

2294-2304: Tighten assertions: verify finished invariant or pending count

To better validate the “finished” condition and cached tallies:

Option A: assert that targeted == ran + errored + incompatible + canceled.

Option B: if NumPending is set by MarkActivitiesAsCompleted, assert it's 0.

Example:
require.Equal(t,
  *batchActivity.NumTargeted,
  *batchActivity.NumRan+*batchActivity.NumErrored+*batchActivity.NumIncompatible+*batchActivity.NumCanceled,
)
Use Option B only if NumPending is guaranteed to be non-nil for finished activities.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3f53e5a and 9a62564.

📒 Files selected for processing (7)

cmd/fleet/cron.go (1 hunks)
cmd/fleet/serve.go (1 hunks)
server/datastore/mysql/scripts.go (1 hunks)
server/datastore/mysql/scripts_test.go (2 hunks)
server/fleet/cron_schedules.go (1 hunks)
server/fleet/datastore.go (1 hunks)
server/mock/datastore_mock.go (3 hunks)

🧰 Additional context used

📓 Path-based instructions (1)

**/*.go

⚙️ CodeRabbit Configuration File

When reviewing SQL queries that are added or modified, ensure that appropriate filtering criteria are applied—especially when a query is intended to return data for a specific entity (e.g., a single host). Check for missing WHERE clauses or incorrect filtering that could lead to incorrect or non-deterministic results (e.g., returning the first row instead of the correct one). Flag any queries that may return unintended results due to lack of precise scoping.

Files:

server/fleet/cron_schedules.go
cmd/fleet/cron.go
server/datastore/mysql/scripts.go
server/fleet/datastore.go
server/datastore/mysql/scripts_test.go
cmd/fleet/serve.go
server/mock/datastore_mock.go

🧠 Learnings (1)

📚 Learning: 2025-08-08T07:40:05.274Z

Learnt from: getvictor
PR: fleetdm/fleet#31726
File: server/datastore/mysql/labels_test.go:2031-2031
Timestamp: 2025-08-08T07:40:05.274Z
Learning: In fleetdm/fleet repository tests (server/datastore/mysql/labels_test.go and similar), using testing.T.Context() is valid because the project targets a recent Go version where testing.T.Context() exists. Do not suggest replacing t.Context() with context.Background() in this codebase.

Applied to files:

server/datastore/mysql/scripts_test.go

🔇 Additional comments (6)

server/fleet/cron_schedules.go (1)

34-38: New schedule name constant — LGTM

Name is clear and consistent with existing schedule naming. No issues.

cmd/fleet/serve.go (1)

1048-1053: Registering the batch activity completion checker — LGTM

Registration timing and dependencies (ctx, instanceID, ds, logger) look correct.

cmd/fleet/cron.go (1)

1514-1516: LGTM: Sensible cron name and interval.

Name aligns with existing conventions and a 5-minute cadence is reasonable for eventual consistency on batch activity completion.

server/mock/datastore_mock.go (2)

1180-1181: New mock func type matches interface — LGTM

Signature aligns with MarkActivitiesAsCompleted(ctx context.Context) error. No concerns.

3191-3193: DataStore fields for MarkActivitiesAsCompleted — LGTM

Adding the func and the Invoked flag matches existing mock patterns and will help assertions in tests.

server/datastore/mysql/scripts_test.go (1)

49-49: Add test to suite: looks good

Registering TestMarkActivitiesAsCompleted in the scripts test table is appropriate and keeps coverage centralized.

codecov · 2025-08-08T15:55:22Z

Codecov Report

❌ Patch coverage is 56.92308% with 28 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.76%. Comparing base (8e417fe) to head (94f897c).
⚠️ Report is 1 commits behind head on main.

Files with missing lines	Patch %	Lines
cmd/fleet/cron.go	0.00%	20 Missing ⚠️
cmd/fleet/serve.go	0.00%	5 Missing ⚠️
server/datastore/mysql/scripts.go	92.50%	2 Missing and 1 partial ⚠️

Additional details and impacted files

@@           Coverage Diff           @@
##             main   #31737   +/-   ##
=======================================
  Coverage   63.76%   63.76%           
=======================================
  Files        1963     1963           
  Lines      192009   192015    +6     
  Branches     6364     6327   -37     
=======================================
+ Hits       122440   122445    +5     
- Misses      59996    59999    +3     
+ Partials     9573     9571    -2

Flag	Coverage Δ
backend	`65.15% <56.92%> (-0.01%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

dantecatalfamo · 2025-08-08T17:14:47Z

@sgress454 I'm going to need to reuse that completion tally logic for the job canceling method as well. Do you think you could split it out into its own function that takes a transaction or something like that?

dantecatalfamo · 2025-08-08T17:15:09Z

Or that takes a where clause, either way

The base branch was changed.

sgress454 added 30 commits August 6, 2025 11:04

add batch status endpoint

8019e1e

simplify

d7439d2

pending -> scheduled

deb9568

add list endpoint

d9b91f4

rename

a60109b

add count

1e779ca

clean up

1426ef9

finish list endpoint

4dec994

remove log

80c7684

added changelog

a05b82c

add execution id to response

66ab5a8

simplify

f3fe8a2

comments

153cfce

update comment

913fa51

lint

d6c6c63

BatchExecutionSummary -> BatchActivity

d1e6a79

comments

d4ad643

add extra fields

4700017

add started_at to activity type

c2a33f9

Merge branch 'main' into sgress454/31623-batch-status-endpoint

eb1b8a1

Merge branch 'main' into sgress454/31623-batch-status-endpoint

9a0b14d

update

41ce1b2

implement shortcut for getting finished activity status

45f295a

summary

6eb0ce1

update test

a7140c1

comment

83a77b7

use union

76c07d7

move code to improve diff

5958929

Merge branch 'main' into sgress454/31623-batch-status-endpoint

97f5f0c

add cron job to mark activities as completed

de443c0

sgress454 added 2 commits August 8, 2025 10:34

Merge branch 'main' into sgress454/31623-batch-status-endpoint

3f53e5a

Merge branch 'sgress454/31623-batch-status-endpoint' into sgress454/3…

9a62564

…1555-cron-branched-off-batch-status

sgress454 requested a review from a team as a code owner August 8, 2025 15:42

sgress454 temporarily deployed to Docker Hub August 8, 2025 15:42 — with GitHub Actions Inactive

coderabbitai Bot reviewed Aug 8, 2025

View reviewed changes

Comment thread server/datastore/mysql/scripts_test.go Outdated

Comment thread server/datastore/mysql/scripts_test.go Outdated

Comment thread server/datastore/mysql/scripts.go

Comment thread server/fleet/datastore.go

Comment thread server/mock/datastore_mock.go

Update server/datastore/mysql/scripts_test.go

e7f81a1

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

sgress454 had a problem deploying to Docker Hub August 8, 2025 16:13 — with GitHub Actions Error

sgress454 temporarily deployed to Docker Hub August 8, 2025 16:13 — with GitHub Actions Inactive

sgress454 added 2 commits August 8, 2025 11:18

updates

430011d

update

7c9e344

sgress454 temporarily deployed to Docker Hub August 8, 2025 16:18 — with GitHub Actions Inactive

sgress454 assigned dantecatalfamo Aug 8, 2025

dantecatalfamo previously approved these changes Aug 8, 2025

View reviewed changes

dantecatalfamo reviewed Aug 8, 2025

View reviewed changes

Comment thread server/datastore/mysql/scripts.go

Base automatically changed from sgress454/31623-batch-status-endpoint to main August 8, 2025 18:24

Merge branch 'main' into sgress454/31555-cron-branched-off-batch-status

94f897c

sgress454 temporarily deployed to Docker Hub August 8, 2025 18:43 — with GitHub Actions Inactive

dantecatalfamo approved these changes Aug 8, 2025

View reviewed changes

sgress454 merged commit bba0c8a into main Aug 8, 2025
40 checks passed

sgress454 deleted the sgress454/31555-cron-branched-off-batch-status branch August 8, 2025 19:02

Conversation

sgress454 commented Aug 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Details

Checklist for submitter

Testing

Summary by CodeRabbit

Uh oh!

sgress454 commented Aug 8, 2025

Uh oh!

coderabbitai Bot commented Aug 8, 2025

Uh oh!

coderabbitai Bot commented Aug 8, 2025

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related issues

Chat

Support

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

Documentation and Community

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

codecov Bot commented Aug 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

dantecatalfamo commented Aug 8, 2025

Uh oh!

dantecatalfamo commented Aug 8, 2025

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

sgress454 commented Aug 8, 2025 •

edited

Loading

codecov Bot commented Aug 8, 2025 •

edited

Loading