Skip to content

Add cron schedule to mark activities as finished#31737

Merged
sgress454 merged 39 commits into
mainfrom
sgress454/31555-cron-branched-off-batch-status
Aug 8, 2025
Merged

Add cron schedule to mark activities as finished#31737
sgress454 merged 39 commits into
mainfrom
sgress454/31555-cron-branched-off-batch-status

Conversation

@sgress454
Copy link
Copy Markdown
Contributor

@sgress454 sgress454 commented Aug 8, 2025

for #31555

Details

This PR adds a new cron schedule "batch_activity_completion_checker" that runs every 5 minutes and checks whether any batch activities marked as "started" have completed their runs. In general this is done by determining whether the sum of the "ran", "incompatible", "errored" and "canceled" hosts equals the number of "targeted" hosts for the activity. How that is computed will vary by batch activity type (currently we just have batch scripts).

When an activity is marked as finished, we cache the final tally of host statuses (ran, incompatible, errored, canceled) on the record. This is important so that future queries on activity records don't have to do the expensive query to compute the host counts on activities where those counts will never change.

Checklist for submitter

If some of the following don't apply, delete the relevant line.

  • Input data is properly validated, SELECT * is avoided, SQL injection is prevented (using placeholders for values in statements)

Testing

  • Added/updated automated tests

  • Where appropriate, automated tests simulate multiple hosts and test for host isolation (updates to one hosts's records do not affect another)

  • QA'd all new/changed functionality manually
    Started a new batch script run using the update run modal (see Update run script modal #31604) and then triggered the new job using fleetctl trigger --name batch_activity_completion_checker, and verified that the batch_activities record status was finished and the expected fields were populated.

Summary by CodeRabbit

  • New Features
    • Introduced an automated process that regularly marks completed batch activities, ensuring more accurate and up-to-date activity statuses.
  • Bug Fixes
    • Improved reliability in updating the status of batch activities when all targeted hosts have finished their tasks.
  • Tests
    • Added comprehensive tests to verify correct marking of completed batch activities.
  • Chores
    • Enhanced internal scheduling and datastore interfaces to support the new completion-checking process.

@sgress454 sgress454 requested a review from a team as a code owner August 8, 2025 15:42
@sgress454
Copy link
Copy Markdown
Contributor Author

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Aug 8, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Aug 8, 2025

Walkthrough

A new cron job was introduced to periodically mark batch activities as completed in the system. This includes adding the job and its schedule, updating the datastore interface and implementation, integrating the schedule into server startup, and adding tests and mocks for the new datastore method.

Changes

Cohort / File(s) Change Summary
Cron Job Scheduling and Execution
cmd/fleet/cron.go, cmd/fleet/serve.go, server/fleet/cron_schedules.go
Added a new cron job and schedule (batch_activity_completion_checker) that runs every 5 minutes, calling a datastore method to mark batch activities as completed. Integrated the schedule into the server startup. Registered the new schedule name constant.
Datastore Interface and Implementation
server/fleet/datastore.go, server/datastore/mysql/scripts.go
Added MarkActivitiesAsCompleted method to the datastore interface and implemented it in the MySQL datastore to update batch activity statuses and counts.
Testing
server/datastore/mysql/scripts_test.go
Introduced a test for the new MarkActivitiesAsCompleted method, verifying correct status and count updates for batch activities.
Mocking
server/mock/datastore_mock.go
Added mock implementation for the MarkActivitiesAsCompleted method in the datastore mock for testing purposes.

Sequence Diagram(s)

sequenceDiagram
    participant CronScheduler
    participant FleetServer
    participant Datastore

    CronScheduler->>FleetServer: Trigger batch_activity_completion_checker (every 5 min)
    FleetServer->>Datastore: MarkActivitiesAsCompleted(ctx)
    Datastore-->>FleetServer: error or success
    FleetServer-->>CronScheduler: Job complete
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~15–20 minutes

Possibly related issues

Note

🔌 MCP (Model Context Protocol) integration is now available in Early Access!

Pro users can now connect to remote MCP servers under the Integrations page to get reviews and chat conversations that understand additional development context.

✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch sgress454/31555-cron-branched-off-batch-status

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

‼️ IMPORTANT
Auto-reply has been disabled for this repository in the CodeRabbit settings. The CodeRabbit bot will not respond to your replies unless it is explicitly tagged.

  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai generate unit tests to generate unit tests for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🧹 Nitpick comments (7)
cmd/fleet/serve.go (1)

1048-1053: Nit: use the schedule constant in the error message to avoid drift

Minor consistency improvement; several nearby registrations reference the schedule constant in error messages.

-            if err := cronSchedules.StartCronSchedule(func() (fleet.CronSchedule, error) {
+            if err := cronSchedules.StartCronSchedule(func() (fleet.CronSchedule, error) {
                 return newBatchActivityCompletionCheckerSchedule(ctx, instanceID, ds, logger)
             }); err != nil {
-                initFatal(err, "failed to register batch activity completion checker schedule")
+                initFatal(err, fmt.Sprintf("failed to register %s schedule", fleet.CronBatchActivityCompletionChecker))
             }
server/datastore/mysql/scripts.go (1)

2231-2236: Use the retry-aware helper instead of plain withTx.

Nearly all write paths in this datastore layer wrap the transaction with withRetryTxx to transparently retry on dead-locks/lock-wait-timeouts.
For consistency – and to avoid a stuck cron on busy installations – prefer the same helper here.

- if err := ds.withTx(ctx, func(tx sqlx.ExtContext) error {
+ if err := ds.withRetryTxx(ctx, func(tx sqlx.ExtContext) error {
cmd/fleet/cron.go (3)

1517-1527: Pass logger into the job for in-job observability.

The cron function currently can't log timings or outcomes. Pass the schedule logger into the job and through to the handler (see next comment for handler changes).

Apply within this range:

-            func(ctx context.Context) error {
-                return cronBatchActivityCompletionChecker(ctx, ds)
-            },
+            func(ctx context.Context) error {
+                return cronBatchActivityCompletionChecker(ctx, ds, logger)
+            },

1531-1540: Add minimal instrumentation and accept logger.

Capture elapsed time and log at debug-level to aid troubleshooting without noisy info-level logs. Imports for time and kitlog/level are already present.

-func cronBatchActivityCompletionChecker(
-    ctx context.Context,
-    ds fleet.Datastore,
-) error {
-    if err := ds.MarkActivitiesAsCompleted(ctx); err != nil {
-        return ctxerr.Wrap(ctx, err, "mark batch activities as completed")
-    }
-    // TODO -- add an entry in the global feed for each completed activity?
-    return nil
-}
+func cronBatchActivityCompletionChecker(
+    ctx context.Context,
+    ds fleet.Datastore,
+    logger kitlog.Logger,
+) error {
+    start := time.Now()
+    if err := ds.MarkActivitiesAsCompleted(ctx); err != nil {
+        return ctxerr.Wrap(ctx, err, "mark batch activities as completed")
+    }
+    level.Debug(logger).Log("msg", "batch activity completion check run completed", "took", time.Since(start).String())
+    // TODO -- add an entry in the global feed for each completed activity?
+    return nil
+}

1538-1539: Track the TODO for global feed entries.

If you plan to emit a feed entry per completed activity, please open a tracking issue to define scope and guardrails (e.g., deduplication/noise limits, correlation with existing “run started” activities, and backfill behavior).

Happy to draft the issue or propose an implementation sketch gated by config.

server/datastore/mysql/scripts_test.go (2)

2253-2262: Avoid ordering assumptions that can make the test flaky

You schedule a second batch before selecting host1Upcoming/host2Upcoming/host3Upcoming, then use index 0 to submit results/cancel. This assumes listUpcomingHostScriptExecutions returns the first execution's rows first. If ordering changes, you could unintentionally resolve execID2 and leave execID pending, causing the test to fail.

  • Safer approach: capture host upcoming execution IDs immediately after the first BatchExecuteScript and before scheduling the second, then use those IDs when submitting results/cancel.
  • Alternatively, disambiguate the target batch when selecting which upcoming to resolve (if the item includes batch execution identifiers).

Would you like me to propose a patch that moves the listUpcoming calls before scheduling the second batch?


2294-2304: Tighten assertions: verify finished invariant or pending count

To better validate the “finished” condition and cached tallies:

  • Option A: assert that targeted == ran + errored + incompatible + canceled.
  • Option B: if NumPending is set by MarkActivitiesAsCompleted, assert it's 0.

Example:

require.Equal(t,
  *batchActivity.NumTargeted,
  *batchActivity.NumRan+*batchActivity.NumErrored+*batchActivity.NumIncompatible+*batchActivity.NumCanceled,
)

Use Option B only if NumPending is guaranteed to be non-nil for finished activities.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3f53e5a and 9a62564.

📒 Files selected for processing (7)
  • cmd/fleet/cron.go (1 hunks)
  • cmd/fleet/serve.go (1 hunks)
  • server/datastore/mysql/scripts.go (1 hunks)
  • server/datastore/mysql/scripts_test.go (2 hunks)
  • server/fleet/cron_schedules.go (1 hunks)
  • server/fleet/datastore.go (1 hunks)
  • server/mock/datastore_mock.go (3 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.go

⚙️ CodeRabbit Configuration File

When reviewing SQL queries that are added or modified, ensure that appropriate filtering criteria are applied—especially when a query is intended to return data for a specific entity (e.g., a single host). Check for missing WHERE clauses or incorrect filtering that could lead to incorrect or non-deterministic results (e.g., returning the first row instead of the correct one). Flag any queries that may return unintended results due to lack of precise scoping.

Files:

  • server/fleet/cron_schedules.go
  • cmd/fleet/cron.go
  • server/datastore/mysql/scripts.go
  • server/fleet/datastore.go
  • server/datastore/mysql/scripts_test.go
  • cmd/fleet/serve.go
  • server/mock/datastore_mock.go
🧠 Learnings (1)
📚 Learning: 2025-08-08T07:40:05.274Z
Learnt from: getvictor
PR: fleetdm/fleet#31726
File: server/datastore/mysql/labels_test.go:2031-2031
Timestamp: 2025-08-08T07:40:05.274Z
Learning: In fleetdm/fleet repository tests (server/datastore/mysql/labels_test.go and similar), using testing.T.Context() is valid because the project targets a recent Go version where testing.T.Context() exists. Do not suggest replacing t.Context() with context.Background() in this codebase.

Applied to files:

  • server/datastore/mysql/scripts_test.go
🔇 Additional comments (6)
server/fleet/cron_schedules.go (1)

34-38: New schedule name constant — LGTM

Name is clear and consistent with existing schedule naming. No issues.

cmd/fleet/serve.go (1)

1048-1053: Registering the batch activity completion checker — LGTM

Registration timing and dependencies (ctx, instanceID, ds, logger) look correct.

cmd/fleet/cron.go (1)

1514-1516: LGTM: Sensible cron name and interval.

Name aligns with existing conventions and a 5-minute cadence is reasonable for eventual consistency on batch activity completion.

server/mock/datastore_mock.go (2)

1180-1181: New mock func type matches interface — LGTM

Signature aligns with MarkActivitiesAsCompleted(ctx context.Context) error. No concerns.


3191-3193: DataStore fields for MarkActivitiesAsCompleted — LGTM

Adding the func and the Invoked flag matches existing mock patterns and will help assertions in tests.

server/datastore/mysql/scripts_test.go (1)

49-49: Add test to suite: looks good

Registering TestMarkActivitiesAsCompleted in the scripts test table is appropriate and keeps coverage centralized.

Comment thread server/datastore/mysql/scripts_test.go Outdated
Comment thread server/datastore/mysql/scripts_test.go Outdated
Comment thread server/datastore/mysql/scripts.go
Comment thread server/fleet/datastore.go
Comment thread server/mock/datastore_mock.go
@codecov
Copy link
Copy Markdown

codecov Bot commented Aug 8, 2025

Codecov Report

❌ Patch coverage is 56.92308% with 28 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.76%. Comparing base (8e417fe) to head (94f897c).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
cmd/fleet/cron.go 0.00% 20 Missing ⚠️
cmd/fleet/serve.go 0.00% 5 Missing ⚠️
server/datastore/mysql/scripts.go 92.50% 2 Missing and 1 partial ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main   #31737   +/-   ##
=======================================
  Coverage   63.76%   63.76%           
=======================================
  Files        1963     1963           
  Lines      192009   192015    +6     
  Branches     6364     6327   -37     
=======================================
+ Hits       122440   122445    +5     
- Misses      59996    59999    +3     
+ Partials     9573     9571    -2     
Flag Coverage Δ
backend 65.15% <56.92%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
dantecatalfamo
dantecatalfamo previously approved these changes Aug 8, 2025
@dantecatalfamo
Copy link
Copy Markdown
Member

@sgress454 I'm going to need to reuse that completion tally logic for the job canceling method as well. Do you think you could split it out into its own function that takes a transaction or something like that?

@dantecatalfamo
Copy link
Copy Markdown
Member

Or that takes a where clause, either way

Comment thread server/datastore/mysql/scripts.go
Base automatically changed from sgress454/31623-batch-status-endpoint to main August 8, 2025 18:24
@sgress454 sgress454 dismissed dantecatalfamo’s stale review August 8, 2025 18:24

The base branch was changed.

@sgress454 sgress454 merged commit bba0c8a into main Aug 8, 2025
40 checks passed
@sgress454 sgress454 deleted the sgress454/31555-cron-branched-off-batch-status branch August 8, 2025 19:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants