For workloads that need to run only during particular periods of the day.
A ControlledJob is a resource which specifies:
- the definition of a Job to be run (a plain K8s JobSpec)
- the schedule when we want that Job to be run. For example, 'every weekday between 9am and 5pm, in the London timezone'
During the specified schedule, the controlled-job-operator will ensure that a Job object with a matching spec exists, and when the schedule says to stop, the Job is deleted.
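The rule above can be sketched as a small decision function. This is a minimal illustration, not the operator's actual code: the names `Action` and `decide` are hypothetical, and the two boolean inputs stand in for whatever the real reconciler derives from the schedule and the cluster state.

```go
package main

import "fmt"

// Action is what a single reconcile pass would do (hypothetical type).
type Action string

const (
	ActionNone   Action = "none"
	ActionCreate Action = "create"
	ActionDelete Action = "delete"
)

// decide returns the action for one reconcile pass.
// inSchedule: the schedule currently says a Job should exist.
// jobExists:  a Job object exists, in any state (running, failed, succeeded...).
func decide(inSchedule, jobExists bool) Action {
	switch {
	case inSchedule && !jobExists:
		return ActionCreate
	case !inSchedule && jobExists:
		return ActionDelete
	default:
		// Either the Job already exists during the schedule,
		// or nothing exists outside of it: nothing to do.
		return ActionNone
	}
}

func main() {
	cases := []struct{ inSchedule, jobExists bool }{
		{true, false}, {true, true}, {false, true}, {false, false},
	}
	for _, c := range cases {
		fmt.Printf("inSchedule=%t jobExists=%t -> %s\n",
			c.inSchedule, c.jobExists, decide(c.inSchedule, c.jobExists))
	}
}
```

Note that the sketch only looks at whether a Job exists, not at its state, so a failed or completed Job is left alone until the schedule says to stop.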
Features:
- Control over what happens when the JobSpec specification on the ControlledJob changes while a Job is currently running. Either stop the old Job and start a new one with the new spec, or ignore the change until the next scheduled run.
- The ability to override the schedule manually. If a Job is manually created with the correct metadata, it will become managed by the matching ControlledJob. This allows use cases where the starting of a Job depends on external conditions (perhaps the successful completion of a batch job that prepares data for the Job), or where a Job needs to start earlier than usual on a given day but the ongoing monitoring, restarting, and stopping should still be handled according to the schedule.
- Strong guarantees about exclusive running of the Job. If a Job is restarted for any reason, the controlled-job-operator will start it in a suspended state, and only unsuspend it when it's sure any previous Job can no longer be running.
- Pessimistic error handling. The system will not automatically retry failing Jobs, or restart Jobs that have exited cleanly during their scheduled time, to provide the user with the flexibility to choose how those cases are handled; settings on the JobSpec provided by Kubernetes already allow configuration of how to handle restarts and failures of a Job (e.g. retry up to 3 times before giving up). The logic from the ControlledJob side is simple: ensure a Job exists (in any state - starting, running, failed, succeeded) during the scheduled period, and is deleted outside of that period. The user can trigger a restart of a ControlledJob simply by deleting the current Job, which will trigger the controlled-job-operator to create a brand new Job in its place.
- Comprehensive status conditions that can be used to drive alerting and health checks.
- The ability to mutate the new Job specification at creation time. For example, a dynamic image tag lookup, or substituting the current date into an env var on the created Pod. Specify a URL to a service which should behave like a standard K8s mutating webhook for Jobs and it will be called before any Job is created.
apiVersion: batch.gresearch.co.uk/v1
kind: ControlledJob
metadata:
  name: controlledjob-sample
spec:
  # Timezone is any standard tz database timezone name
  # Optionally with an additional static offset (in seconds)
  timezone:
    name: "GMT"
    offset: 3600 # 1h, making the overall timezone 'GMT + 1h'
  # Any number of scheduled events. Each one is either 'start' or 'stop' and
  # schedule can be timeOfDay & daysOfWeek, or a valid CRONTAB entry
  events:
    - action: "start"
      schedule:
        timeOfDay: "09:00"
        daysOfWeek: "MON-FRI"
    - action: "stop"
      cronSchedule: "0 17 * * MON-FRI"
  # Template for the job to create. Any valid JobSpec is accepted
  jobTemplate:
    metadata:
      labels:
        foo: bar
    spec:
      backoffLimit: 3
      template:
        spec:
          containers:
            - name: hello
              image: busybox
              args:
                - /bin/sh
                - -c
                - date; echo Hello from my ControlledJob
          restartPolicy: OnFailure
This operator is built on the standard controller-runtime library using Kubebuilder and so should be familiar to anyone used to developing K8s controllers.
The main logic lives under pkg/reconciliation which is a good place to start reading.
We welcome bug fixes, issue reports, and documentation improvements; however, feature requests or additions are generally not in scope. Please open an issue to discuss any potential feature work and read our contributing guide for more details on how to contribute.
Please read our code of conduct before participating in or contributing to this project.
Please see our security policy for details on reporting security vulnerabilities.
ControlledJob is licensed under the Apache Software License 2.0 (Apache-2.0).
SPDX-License-Identifier: Apache-2.0