Stories by Brian Grant on Medium

Managed fields in Kubernetes

Brian Grant — Mon, 08 Jun 2026 16:21:15 GMT

“Locked” fields. Image generated by Gemini. (Not perfect, but good enough for the illustration.)

It isn’t shown by default by kubectl get -o yaml or kubectl edit any longer, but you may have seen something like the following if you have been using Kubernetes for a while:

  managedFields:
  - apiVersion: apps/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          f:config.k8s.io/owning-inventory: {}
      f:spec:
        f:minReadySeconds: {}
        f:replicas: {}
        f:selector: {}
        f:strategy:
          f:rollingUpdate: {}
          f:type: {}
        f:template:
          f:metadata:
            f:labels:
              f:app: {}
          f:spec:
            f:containers:
              k:{"name":"worker"}:
                .: {}
...
    manager: confighub-bridge-worker
    operation: Apply
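
To make the structure above easier to read, here is a minimal Python sketch that flattens a fieldsV1 tree into dotted paths. It assumes the tree has already been parsed into nested dicts; the function name flatten_fields is made up for illustration, and it only handles the f: (field) and k: (keyed list item) prefixes and the "." marker shown above.

```python
import json


def flatten_fields(fields, prefix=""):
    """Return dotted paths for the leaves of a FieldsV1-style dict (sketch)."""
    paths = []
    for key, child in fields.items():
        if key == ".":
            # "." marks the enclosing map or list item itself as owned.
            continue
        if key.startswith("f:"):
            path = f"{prefix}.{key[2:]}"
        elif key.startswith("k:"):
            # Keyed list item, e.g. k:{"name":"worker"} -> [name="worker"]
            selector = json.loads(key[2:])
            inner = ",".join(f'{k}="{v}"' for k, v in selector.items())
            path = f"{prefix}[{inner}]"
        else:
            # Other prefixes are not handled specially in this sketch.
            path = f"{prefix}.{key}"
        sub = flatten_fields(child, path)
        # An empty child (or one holding only ".") is treated as a leaf.
        paths.extend(sub or [path])
    return paths
```

Called on the parsed fieldsV1 value of one managedFields entry, it returns one path string per owned leaf field.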

Managed fields were added to Kubernetes as part of the server-side apply design, so to understand them we should start with kubectl apply.

As I explained in my first blog post (search for “topic 8”), kubectl apply reconciles the desired state from files with resources in the cluster. It does this without a separate state file of the kind that Terraform maintains. On a per-resource basis, the way this worked is that kubectl recorded the “last applied configuration” in an annotation like this:

metadata:
  annotations:
    config.k8s.io/owning-inventory: 540bd711-7ae4-42c4-8f80-a36c9a6e52ba-confighub-worker
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"apps/v1","kind":"Deployment","metadata":{"name":"worker","namespace":"confighub-worker"},"spec":{"minReadySeconds":10,"replicas":1,"selector":{"matchLabels":{"app":"worker"}},"strategy":{"rollingUpdate":null,"type":"Recreate"},"template":{"metadata":{"labels":{"app":"worker"}},"spec":{"containers":[{"env":[{"name":"CONFIGHUB_URL","value":"https://hub.confighub.com"},{"name":"NAMESPACE","valueFrom":{"fieldRef":{"fieldPath":"metadata.namespace"}}}],"envFrom":[{"secretRef":{"name":"confighub-worker-secret"}}],"image":"ghcr.io/confighubai/confighub-worker:d90ab40ddbab39b8d446a2f7c6b332cd52f644c9","imagePullPolicy":"Always","name":"worker","securityContext":{"allowPrivilegeEscalation":false,"capabilities":{"drop":["ALL"]},"privileged":false,"readOnlyRootFilesystem":true},"volumeMounts":[{"mountPath":"/tmp","name":"tmp"}]}],"securityContext":{"fsGroup":2000,"runAsGroup":3000,"runAsNonRoot":true,"runAsUser":1000,"seccompProfile":{"type":"RuntimeDefault"}},"serviceAccountName":"confighub-worker","terminationGracePeriodSeconds":60,"volumes":[{"emptyDir":{},"name":"tmp"}]}}}}

This enabled kubectl to diff the new desired state from its local input with the previous desired state from the annotation on the resource in the cluster and to apply that as a patch to the cluster — this is also called a “3-way diff” or “3-way merge”.
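
The idea can be sketched in a few lines of Python. This is a simplified illustration, not kubectl's implementation: it assumes plain nested dicts, treats lists as opaque values (kubectl's strategic merge handles merge keys), and uses None to mark a deletion. The name three_way_patch is hypothetical.

```python
def three_way_patch(last_applied, desired, live):
    """Compute a merge-patch-like dict from three states (sketch).

    None in the result marks a field to delete.
    """
    patch = {}
    # Fields set by the previous apply but dropped from the new desired
    # state are deleted.
    for key in last_applied:
        if key not in desired and key in live:
            patch[key] = None
    # Fields in the new desired state are set where they differ from live.
    for key, want in desired.items():
        have = live.get(key)
        if isinstance(want, dict) and isinstance(have, dict):
            prev = last_applied.get(key)
            sub = three_way_patch(prev if isinstance(prev, dict) else {}, want, have)
            if sub:
                patch[key] = sub
        elif want != have:
            patch[key] = want
    return patch
```

Fields that appear only in the live state, such as replicas set by an autoscaler, never show up in the patch, which is the point of the mechanism.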

The original purpose of performing a diff and patch rather than just overwriting the resource in the cluster was to preserve desired-state values set by automated mechanisms, especially asynchronous controllers (topic 5 in the history blog post), to avoid flapping back and forth. The canonical example is horizontal (replicas) and vertical (cpu and memory) autoscalers. But I didn’t want to constrain the mechanism to just those two examples.

The diff-and-patch mechanism was implemented in kubectl, which, of course, was written in Go. This made it difficult to embed in other languages, like Python, Java, and JavaScript, and exec’ing kubectl was not an option for web UIs. Furthermore, the implementation had enough dependencies that even other Go programs, such as Terraform, didn’t integrate the apply implementation. And CRD support required fetching the OpenAPI spec, which was typically large, from the apiserver. There was also no dry-run support.

To address these issues, we decided to develop a server-side implementation. A comprehensive use-case analysis was done as part of the design.

The original proposal included recording the field values, similar to the last-applied-configuration annotation, but that was problematic for some cases (e.g., secrets, large values), and we realized it was unnecessary due to the next development, which made the apply operation not a diff and patch, but a predicated patch.

To prevent overwriting or removing values set by automated systems, the “field manager” concept was developed, as a generalization of “apply” vs. “other operation”, where “other operation” includes:

kubectl commands
asynchronous controllers, including those updating status fields
mutating admission control
the Kubernetes default value pass

The field manager effectively “owns” the field and can freely make changes. Other clients (with other field-manager values) attempting to change the field would fail due to a conflict.
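
A toy model of that ownership rule, in Python, may help. It is an assumption-laden sketch rather than the apiserver's algorithm: fields are flat path strings, ownership is a dict from path to a set of manager names, and the function name server_side_apply and the force parameter are illustrative.

```python
def server_side_apply(owners, live, manager, config, force=False):
    """Apply a flat {path: value} config as `manager` (toy model).

    owners maps path -> set of manager names; live maps path -> value.
    Returns the list of conflicting paths.
    """
    conflicts = [
        path
        for path, value in config.items()
        if live.get(path) != value and owners.get(path, set()) - {manager}
    ]
    if conflicts and not force:
        return conflicts  # the request is rejected; nothing changes
    for path, value in config.items():
        if live.get(path) != value:
            owners[path] = {manager}  # changed value: sole owner
        else:
            owners.setdefault(path, set()).add(manager)  # same value: shared
        live[path] = value
    # Fields this manager owned but no longer applies are released, and
    # removed from the object if no other manager owns them.
    for path in [p for p, m in owners.items() if manager in m and p not in config]:
        owners[path].discard(manager)
        if not owners[path]:
            del owners[path]
            live.pop(path, None)
    return conflicts
```

In this model, an applier that tries to change a field owned by an autoscaler gets the path back as a conflict unless it passes force=True, in which case it takes over ownership.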

That introduced a new failure mode for Kubernetes clients, but only those using the new feature. Update requests could fail entirely or partially. For true conflicts, such as a GitOps controller fighting with another controller, this was considered a feature. For one-off operational changes using kubectl, it was a potential source of problems, as kubectl has its own field-manager value.

As a consequence, some GitOps controllers, like ArgoCD, grab ownership of the fields they apply, defeating the conflict mechanism. Flux selectively takes ownership. Different apply implementations implement different policies, which can be a source of surprising behavior. Helm v4 apparently defaults to not taking ownership (taking ownership despite conflicts is known as forcing conflicts).

While the field manager can be useful for debugging, similar to the User-Agent HTTP header field, its value is an arbitrary string, which makes it harder to use to decide what to do in the case of a pre-existing field manager. That’s why Flux hardcodes a few well-known values, such as “kubectl”. A higher-level category, such as the categories I enumerated above, would be more useful.

I didn’t find a tool that did this, so I created one (github), k8s-mf. It can read managed fields from a file (use kubectl get -o yaml --show-managed-fields) or from the cluster. It can also change managed fields.

% k8s-mf --help
k8s-mf inspects and repairs Kubernetes server-side-apply (SSA) managed fields.

Managed fields record which field manager owns each field of a resource. Leftover
or competing managers are the usual cause of apply surprises — fields silently
retained, deletions blocked, or apply conflicts — especially after a kubectl
"break glass" edit or a transition between tools (kubectl apply, ArgoCD, Flux, Sveltos).

The kubeconfig is loaded with kubectl precedence: --kubeconfig flag, then the
KUBECONFIG environment variable, then $HOME/.kube/config.

Usage:
  k8s-mf [command]

Available Commands:
  categories    Show which fields each category of field manager owns
  cleanup       Show the result of refresh/import cleanup (ExtraCleanupObjects)
  completion    Generate the autocompletion script for the specified shell
  conflicts     Predict which fields an apply would conflict over, without changing anything
  dry-run-apply Server-side dry-run an apply as a given field manager
  help          Help about any command
  takeover      Remove other appliers' field managers so one applier owns the resource
  values        Show the values of fields owned by appliers

Flags:
      --context string      Kubernetes context to use
  -h, --help                help for k8s-mf
      --kubeconfig string   Path to the kubeconfig file (overrides KUBECONFIG and the default)
  -n, --namespace string    Namespace of the resource (ignored for cluster-scoped resources) (default "default")

Using the earlier example, this is what the categories look like (some fields elided for length):

% k8s-mf categories -f /tmp/worker-mf.yaml
Resource: Deployment/confighub-worker/worker

APPLIER — 2 manager(s): confighub-bridge-worker (ConfigHub), kubectl-rollout (kubectl)
  44 field(s):
    .metadata.annotations.config.k8s.io/owning-inventory
    .spec.minReadySeconds
    .spec.replicas
    .spec.selector
    .spec.strategy.rollingUpdate
    .spec.strategy.type
    .spec.template.metadata.annotations.kubectl.kubernetes.io/restartedAt
    .spec.template.metadata.labels.app
    .spec.template.spec.containers[name="worker"].envFrom
    .spec.template.spec.containers[name="worker"].env[name="CONFIGHUB_URL"].name
...

ASYNCCONTROLLER — 1 manager(s): kube-controller-manager (Kubernetes)
  18 field(s):
    .metadata.annotations.deployment.kubernetes.io/revision
    .status.availableReplicas
    .status.conditions[type="Available"].lastTransitionTime
...

DEFAULT FIELDS (present on the object but owned by no manager — API-server defaults):
  .metadata.annotations.kubectl.kubernetes.io/last-applied-configuration
  .spec.progressDeadlineSeconds
  .spec.revisionHistoryLimit
...

The categories are determined by a fairly long (dozens of entries) hardcoded list:

var exactManagers = map[string]managerInfo{
 // --- Appliers (whole-resource owners) ---
 "argocd-controller":             {CategoryApplier, "ArgoCD"}, // ArgoCD default SSA manager (ArgoCDSSAManager)
 "argocd-application-controller": {CategoryApplier, "ArgoCD"}, // ArgoCD application controller (seen on some resources)
 "helm":                          {CategoryApplier, "Helm"},
 "helm-controller":               {CategoryApplier, "Flux"}, // Flux HelmRelease
 "kustomize-controller":          {CategoryApplier, "Flux"}, // Flux Kustomization
 "flux":                          {CategoryApplier, "Flux"},
...
 "endpoint-controller":         {CategoryAsyncController, "Kubernetes"},
 "endpointslice-controller":    {CategoryAsyncController, "Kubernetes"},
 "service-controller":          {CategoryAsyncController, "Kubernetes"},
 "deployment-controller":       {CategoryAsyncController, "Kubernetes"},
...
 "kube-controller-manager": {CategoryAsyncController, "Kubernetes"},
}
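
For readers who don't write Go, the lookup can be pictured with a short Python sketch. It copies only a few entries shown above; the "Unknown" fallback and the function name group_by_category are assumptions for illustration, and the real tool may handle unlisted managers differently.

```python
# A few entries copied from the excerpt above: manager -> (category, tool).
EXACT_MANAGERS = {
    "argocd-controller": ("Applier", "ArgoCD"),
    "helm": ("Applier", "Helm"),
    "kustomize-controller": ("Applier", "Flux"),
    "kube-controller-manager": ("AsyncController", "Kubernetes"),
}


def group_by_category(managed_fields, default=("Unknown", "")):
    """Group managedFields entries by the category of their manager (sketch)."""
    groups = {}
    for entry in managed_fields:
        # The fallback category is hypothetical, not taken from k8s-mf.
        category, _tool = EXACT_MANAGERS.get(entry["manager"], default)
        groups.setdefault(category, []).append(entry["manager"])
    return groups
```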

You can also opt to see the values of a selected category, defaulting to Applier, which includes any kubectl operation — “Applier” is an approximation of general user intent:

% k8s-mf values -f /tmp/worker-mf.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    config.k8s.io/owning-inventory: 540bd711-7ae4-42c4-8f80-a36c9a6e52ba-confighub-worker
  name: worker
  namespace: confighub-worker
spec:
  minReadySeconds: 10
  replicas: 1
...

There is also a conflicts command to report field manager conflicts, a dry-run-apply command to invoke server-side apply with a specified field manager so you can see what it would do, and a takeover command to remove a specified field manager. The commands that communicate with a cluster have flags for that similar to kubectl (--context, --kubeconfig, --namespace).

Give it a try and let me know what you think.

ConfigHub has a mechanism for tracking the provenance of every field that was inspired by managed fields, but it records more information, and more semantically meaningful information, which is used to decide how to merge configuration changes. An example of where this is used is to merge changes from the upgrade of an installer package without clobbering post-install changes made in ConfigHub. I’m tweaking how this mechanism works and this post is long already, so I’ll write another post about that.

Hopefully that gives you a sense of why managed fields were added, what their purpose is, and how they facilitate the merging of configuration from multiple sources. The ability to merge configuration intelligently is a configuration management power tool.

Have you seen managed fields in YAML returned by the apiserver and wondered what they were for? Have you been surprised by a Kubernetes server-side apply operation not changing all of the fields you intended to change? If so, how did you debug it? Have you had trouble figuring out how to transfer ownership to a horizontal or vertical autoscaler? Did you wish that apply just overwrote all of the fields instead?

Feel free to reply here, or send me a message on LinkedIn, X/Twitter, or Bluesky, where I plan to crosspost this.

You are also welcome to try out ConfigHub, which is now in preview.

If you found this interesting, you may be interested in other posts in my Kubernetes series.

Managed fields in Kubernetes was originally published in ITNEXT on Medium, where people are continuing the conversation by highlighting and responding to this story.

Kustomize, Kpt, Porch, and ConfigHub

Brian Grant — Mon, 01 Jun 2026 15:56:20 GMT

Logos for these tools, except for Porch. I just created an image for it.

In my previous post, I wrote about my new kustomize-based installer, which also integrates with kpt. Kustomize, Kpt, and Porch are three configuration-as-data tools for Kubernetes that I created before ConfigHub. I already wrote a post on the origins of Kustomize, so I’m not going to rehash that, and hopefully you are already at least somewhat familiar with Kustomize, but I will discuss the challenges and gaps with Kustomize that led to the creation of the other tools and, ultimately, to ConfigHub.

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
metadata:
  name: hello
resources:
- deployment.yaml
- service.yaml
- ingress.yaml
- configmap.yaml
commonLabels:
  app: hello
images:
- name: hello
  newTag: v42

I wrote the rationale for kpt, which explains more about the concept of configuration as data than the rationale for kpt itself. However, it does touch on the motivation that kpt was intended to fill gaps in kustomize, much as kustomize was intended to fill gaps in kubectl.

One such gap was that we wanted a simple mechanism to package sample configurations for Kubernetes (e.g., guestbook), GKE (e.g., microservices demo), and Anthos (e.g., Bank of Anthos) without taking on Helm as a dependency and without adding packaging to kustomize, due to scope concerns. Kustomize supports remote bases and rendering remote configuration, but it doesn’t facilitate pulling a copy of the original configuration.

The core mechanic of kpt’s package functionality is based on the observation that many git-based package tools host multiple packages in a single git repository, but git itself doesn’t provide mechanisms for interacting with subdirectories in a repository. kpt is able to do that, making any existing folder of Kubernetes YAML implicitly gettable as a kpt package.

% kpt pkg get https://github.com/kubernetes/examples.git/web/guestbook/all-in-one@master
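
The argument to that command packs three pieces of information into one string: the repository, the subdirectory, and the git ref. A small Python sketch of how such a string can be split follows. It assumes an HTTPS URL with an explicit ".git" marker, as in the example above; the function name split_pkg_url is hypothetical and this is not kpt's actual parser.

```python
def split_pkg_url(url):
    """Split '<repo>.git/<dir>@<ref>' into (repo, directory, ref) (sketch)."""
    # Assumes an HTTPS URL, so the only "@" is the one before the ref.
    location, _, ref = url.partition("@")
    repo, sep, directory = location.partition(".git")
    if not sep:
        raise ValueError("expected a '.git' marker separating repo and directory")
    return repo + ".git", directory.strip("/") or ".", ref or None
```

With the repository and ref in hand, a tool can fetch that ref and copy only the named subdirectory.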

In addition to packaging, the kpt live command group added more robust pruning than what kubectl provides, and the ability to wait for the applied resources to become ready, using kstatus. Since then, kstatus was adopted by Helm 4, and Flux integrated a fork of kstatus. By now, GitOps tools have been widely adopted, but back in 2019 or so that was not the case, and the lack of a robust applier with pruning and status checking was considered a hindrance to adoption of kustomize and all other alternatives to Helm templates. Moreover, we hoped to eventually integrate the functionality into kubectl to make kubectl itself more usable.

% kpt live apply hello

A third motivation was that the policy controller team wanted to be able to run OPA Gatekeeper over rendered configuration to validate it and enforce policies in CI. This was the initial driving motivation for the creation of KRM functions. That effort merged with the kpt effort, bringing functions to kpt.

apiVersion: kpt.dev/v1
kind: Kptfile
metadata:
  name: hello
pipeline:
  validators:
  - image: ghcr.io/kptdev/krm-functions-catalog/kubeconform:latest
  - image: ghcr.io/kptdev/krm-functions-catalog/gatekeeper:latest

Originally I imagined kpt would complement kustomize, but there were several challenges to that. For example, support for remote bases in kustomize and common kustomization patterns made kustomization trees frequently not self-contained. Also, the addition of functions to kpt created overlap with kustomize transformer plugins. Additionally, since kustomize was integrated into kubectl, it was problematic to integrate the apply functionality into kustomize.

There was an effort to try to converge kpt and kustomize. A result was that validators, KRM functions, localize, and cfg configuration viewing commands were added to kustomize.

What really made kpt distinct, though, was its departure in how it handled variants, which also relates to one of the primary areas of criticism of kustomize: the challenges with patches.

Example:

resources:
- ../../base
patches:
- path: patch.yaml
- path: healthcheck_patch.yaml
- path: memorylimit_patch.yaml

Kustomize generates variants of a base configuration by performing out-of-place transformations (i.e., rendering its output without modifying the base resource files), most notably patching. The patches, which can be strategic merge patches (aka overlays) or JSON patches, are generally written and maintained by hand, which itself is one challenge. The patches are targeted by resource types and names, and specific field paths (using merge keys in the case of an overlay) or selectors. This can make patches brittle and refactoring difficult, similar to git merge conflicts. The final configuration can also be difficult to reason about when there are multiple layers of patches.

# ingress_patch.yaml
- op: add
  path: /spec/rules/0/http/paths/-
  value:
    path: '/example'
    pathType: Prefix
    backend:
      service:
        name: hello
        port:
          number: 5555
...
# kustomization.yaml
- path: ingress_patch.yaml
  target:
    group: networking.k8s.io
    version: v1
    kind: Ingress
    name: hello
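
To see why such a patch is sensitive to the shape of the base, here is a minimal Python sketch of the JSON Patch "add" operation used above. It covers only a subset of RFC 6902 (no "~0"/"~1" escapes, no error handling), and the function name json_patch_add is made up for illustration.

```python
def json_patch_add(doc, path, value):
    """Apply one JSON Patch 'add' operation in place (RFC 6902 subset)."""
    *parents, last = path.lstrip("/").split("/")
    node = doc
    for token in parents:
        # List elements are addressed by position, maps by key.
        node = node[int(token)] if isinstance(node, list) else node[token]
    if isinstance(node, list):
        if last == "-":
            node.append(value)  # "-" means append to the end of the list
        else:
            node.insert(int(last), value)
    else:
        node[last] = value
    return doc
```

Because rules/0 is addressed by position, reordering or inserting rules in the base silently changes which rule the patch modifies.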

The replacements transformer is similar to a patch, but decouples the value source, which makes it more of a repeatable transformation engine than a one-off patch. It was introduced in order to obviate the need for the parameter-like vars feature. It shares most of the challenges with patches, however. It’s also too limited for some kinds of substitutions, which could be addressed by other transformers/functions, where necessary.

replacements:
- source:
    kind: Service
    labelSelector: "environment=production,tier=frontend"
    fieldPath: metadata.name
  targets:
  - select:
      kind: Ingress
      name: hello
    fieldPaths:
    - spec.rules.0.http.paths.0.backend.service.name
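
The mechanics of a replacement can be sketched in Python as "read one field, write it to others". This simplified illustration selects the source by kind only (label selectors are ignored) and targets by kind and name; apply_replacement and _walk are hypothetical names, not Kustomize APIs.

```python
def _walk(obj, tokens):
    """Follow a list of path tokens through nested dicts and lists."""
    for token in tokens:
        obj = obj[int(token)] if isinstance(obj, list) else obj[token]
    return obj


def apply_replacement(resources, source, targets):
    """Copy one source field into the selected target fields (sketch)."""
    # Assumption: the first resource of the source kind is the source.
    src = next(r for r in resources if r["kind"] == source["kind"])
    value = _walk(src, source["fieldPath"].split("."))
    for target in targets:
        sel = target["select"]
        for res in resources:
            if res["kind"] == sel["kind"] and res["metadata"]["name"] == sel["name"]:
                for field_path in target["fieldPaths"]:
                    *parents, last = field_path.split(".")
                    parent = _walk(res, parents)
                    parent[int(last) if isinstance(parent, list) else last] = value
    return resources
```

The target path still uses positional indexes, which is why replacements share the brittleness of patches.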

Both patches and templates are simple when there are only a few values that need to be customized in different variants, but they both become complicated when many values need to be changed. Patches can be simpler for the case where different variants need to change different values. Templates can be simpler when different variants need to change the same values. Kustomize’s built-in transformers were intended to address the most common such cases, such as changing images, but there aren’t specific transformers available for most Kubernetes resource fields. More powerful generators, such as DSLs and general-purpose programming languages, can express powerful logic, when necessary, but they don’t significantly improve the experience of exposing lots of input parameters read from a file, come with a learning curve (especially with DSLs), and don’t solve fundamental issues with configuration as code, such as the need for flexibility driving complexity, lack of interoperability, configuration drift, blast radius, and sprawl.

Kpt is a package tool that represents configuration as always fully rendered / WET YAML and facilitates forking of base packages by supporting merging of updates, automatically deriving patches. This solved a couple of the challenges: there’s no need to write and maintain patches, and the configuration is always clear. Kpt supports multiple merge strategies, but conflicts are, of course, still possible, especially when whoever is maintaining a package is not also aware of the impact of changes on downstream variants of it.

Kpt emphasizes use of functions for transformations more than Kustomize emphasizes transformer plugins, for which there’s barely any documentation. Moreover, it enables functions to be invoked imperatively, while recording the result declaratively, since the configuration data is persisted in git. Kustomize’s imperative editing experience is limited to changing the kustomization.yaml file with the edit command. The kubectl patch --local command modifies resources in files locally, which can be handy. I also use kubectl create --dry-run=client -o yaml to create resources. I added a new command to the installer to do this more cleanly and extensibly.

A challenge with Kpt was that managing many WET variants required more orchestration to keep all of the variants up to date with upstream changes via kpt pkg update and to invoke functions on multiple variants via kpt fn render and kpt fn eval. Additionally, the number of git and kpt pkg operations required even to update a single variant made the experience fairly tedious.

# Create a package
% cd $upstream_dir
% kpt pkg init .
% git add . && git commit -m "init base package" && git push

# Create a downstream variant
% cd $downstream_dir
% kpt pkg get "$upstream_dir"
% git add . && git commit -m "clone base into variant" && git push

# Change upstream
% cd $upstream_dir
% $EDITOR deployment.yaml
% git add . && git commit -m "edited the base" && git push

# Update downstream
% cd $downstream_dir
% kpt pkg update .
% git add . && git commit -m "update downstream" && git push

This gave birth to Porch, the package orchestrator, which was spawned from a one-month prototyping effort (demo of the prototype). Porch provided an API layer over Kpt and git. That made it possible to read and write Kubernetes configuration and to invoke functions via Kubernetes APIs. The Go function catalog was built into Porch, rather than invoked as individual containers, in order to achieve interactive performance.

The Backstage GUI (repo) used the KRM function interface to change the configuration and save it back to git automatically. This enabled ClickOps backed by GitOps.

I thought Porch was promising, but my responsibilities shifted and I had to focus elsewhere. Porch initially became part of the Nephio project, but it looks like it has been reunited with the Kpt project recently.

That brings me to ConfigHub. I started where I left off with Porch, but a clean slate allowed several different choices, such as:

Kpt and Porch store the configuration as “code”, in git, and that’s core to how they work. ConfigHub stores configuration as data, in a database. This facilitates centralized, shared access to the configuration via an API, makes it queryable, mitigates sprawl, and has several other benefits. We’ll write another post on why we don’t store configuration in git.
ConfigHub supports richer relationships between variants and dependencies. The relationships between variants make it easier to reason about upstream and downstream relationships and to promote changes. The relationships between dependencies simplify propagation of values between resources. The links that represent dependencies can also include associated computation for transforming data from upstream configurations for insertion into downstream configurations.
ConfigHub stores more metadata about changes, which is used to control merge behavior at a fine-grained level, in addition to providing more value provenance information in more detail than what git records on its own.
ConfigHub decouples automated function invocations (Triggers) from configuration data, with different trigger events to distinguish variant creation and admission control. Kpt’s function pipelines were overloaded for multiple purposes without distinguishing them.
ConfigHub supports non-KRM configuration formats. For example, it enables reading and writing application configuration (TOML, INI, JSON, etc.) in its native representation. This was not solved in Kpt, which has an “everything is a Kubernetes resource” model.
ConfigHub changed the function execution model, so that functions are hosted by an executor, including a built-in executor, rather than individual containers or executables. Even without additional optimization, this makes function “packaging” less granular and function execution faster.
Neither ConfigHub nor the installer supports problematic features of kpt, such as setters and nested packages.

The ConfigHub function executor and function catalog are open source, if you’re curious how they work and/or what functions are available. The functions can be executed locally via the installer as a Kustomize transformer plugin and also imperatively using cub function local.

Hopefully that gives you some insight into how these tools are different from one another and why each was created. Which one is better depends on your use case, constraints, level of familiarity with the tools, and other factors. I’ll dive deeper into ConfigHub in future posts.

If you are dissatisfied with the Kubernetes configuration customization tool(s) you are currently using, I suggest giving ConfigHub a try. It’s currently in preview. We’re looking for feedback, actively working to improve ConfigHub, and still have the flexibility to make changes.

Feel free to reply here, or send me a message on LinkedIn, X/Twitter, or Bluesky, where I plan to crosspost this.

If you found this interesting, you may be interested in other posts in my Kubernetes series.

Kustomize, Kpt, Porch, and ConfigHub was originally published in ITNEXT on Medium, where people are continuing the conversation by highlighting and responding to this story.

A new Kubernetes package installer

Brian Grant — Tue, 26 May 2026 16:39:06 GMT

Image generated by Gemini

Helm has long been the most-used package manager for Kubernetes. It filled a gap for Kubernetes at a critical time back in 2016 when Google got involved in the project. We needed to make it easy to install off-the-shelf applications on Kubernetes to enable an ecosystem similar to what was developing around Docker Hub and what Mesosphere was developing around DCOS.

Operating systems had packages, and so did configuration management tools: Puppet Forge, Ansible Galaxy, and Chef Supermarket. So packages, and the community chart repo, for Kubernetes made sense.

Speaking of Ansible, it used Jinja YAML templates. So did Google Deployment Manager. Lots of CI systems at the time also used YAML. YAML and templates were believed to be more familiar and accessible to operators than, say, Python code, which Google Deployment Manager also supported. Kubernetes (and Docker and Etcd) was written in Go, so Go templates made sense from that perspective, as there wasn’t a Jinja implementation in Go.

But since then, many alternative tools have been developed, focused mainly on the configuration generation functionality of Helm, as an alternative to YAML templates. It’s not hard to find posts mentioning the complexity of Helm templates. I’ve written about it. In particular, templates for off-the-shelf packages can be extremely large and complex due to the desire to enable pretty much every field of every resource in each package to be customized.

There have been fewer tools targeting Helm’s package management functionality, though there have been some. Glasskube is a recent one, which does address some key package problem areas, notably dependency management. But it wraps Helm and adds its own patching mechanism, so it doesn’t really address the configuration complexity problem.

I recently wrote about how I converted a Helm chart to an AI agent skill. I liked a number of aspects of that experience, but it was challenging to understand what the agent would do.

So I created my own package installer based on configuration as data, with a goal of making the installation process simpler, more understandable, more automated, and more aligned with ConfigHub. It’s open source and can be used without ConfigHub.

With help from AI, I was able to produce the bulk of the initial implementation in one week. It helped to be treading a relatively well paved path.

I already wrote a post about what a package is in configuration as code tools, so I won’t rehash that here.

But what is a package installer? It’s a tool that installs application packages on a target platform. The reason I didn’t call it a package manager is that, while I do want to provide dependency management similar to what package managers for other systems do, I want to manage the lifecycle of the installed configuration differently (discussed more below).

I took inspiration from install wizards, like InstallShield, and from the experience of installing apps on my laptop. The experience is very simple, with at most a few high-level decisions about which components or features to install, and where to install the application, and with default values for all choices. Changes to detailed settings are made after installation rather than before or during the installation process. Post-installation changes can also be made to Debian package conffiles, specifically for the application configuration part of the package.

Enabling post-installation changes to be made to Kubernetes configuration such that they won’t be clobbered upon upgrade requires managing the rendered configuration differently. Obviously, if the whole configuration is regenerated from scratch, naively, any changes to the previously generated configuration would be dropped. Managing configuration as data enables post-installation customization by storing the rendered, customized configuration, and by merging changes from rendered manifests.
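
One way to picture that merge is as a three-way comparison between the previously rendered manifests, the newly rendered manifests, and the stored configuration with its post-installation edits. The Python sketch below works on flat path-to-value maps and is only an illustration under those assumptions; merge_upgrade is a hypothetical name and this is not how the installer or ConfigHub implements merging.

```python
def merge_upgrade(old_render, new_render, current):
    """Three-way merge of flat {path: value} maps (sketch).

    Returns (merged, conflicts). None stands for "field absent".
    """
    merged, conflicts = {}, []
    for path in sorted(set(old_render) | set(new_render) | set(current)):
        old, new, cur = old_render.get(path), new_render.get(path), current.get(path)
        if cur == old:
            value = new  # untouched since install: take the upgrade
        elif new == old or new == cur:
            value = cur  # changed only locally, or both sides agree
        else:
            conflicts.append(path)  # changed differently on both sides
            value = cur
        if value is not None:
            merged[path] = value
    return merged, conflicts
```

A replica count edited after installation survives an upgrade that only changes the image, while a field changed on both sides is reported as a conflict instead of being silently overwritten.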

The same approach enables decoupling variant management and operational changes from installation. Packages don’t need to parameterize replicas, cpu and memory resources, environment variables, labels, annotations, and other standard Kubernetes fields because those can just be changed later, with functions, AI agents, hand editing, or merging from the live state.

In the installer, choices of which “components” to install, where to install the application (i.e., the Kubernetes Namespace), and any other information that needs to be provided at install time are gathered by a “wizard”, which can be used interactively or non-interactively (including driven by an AI agent), and are recorded in KRM files in an out/spec directory.

apiVersion: installer.confighub.com/v1alpha1
kind: Selection
metadata:
  name: argocd-selection
spec:
  package: argocd
  packageVersion: 3.4.2
  base: cluster-install
---
apiVersion: installer.confighub.com/v1alpha1
kind: Inputs
metadata:
  name: argocd-inputs
spec:
  package: argocd
  packageVersion: 3.4.2
  namespace: argocd
  values: {}

I also took inspiration from Autoconf, curl | bash installers, Puppet Facter, Ansible Facts, and Nvidia AICR. These tools automatically gather information, such as the architecture of the target system, and use it to configure the installation process. That’s a kind of automation that has been lost in the pursuit of hermetic and idempotent configuration generation, which itself is a problem that was created by repeatedly regenerating configuration using configuration as code formats.

The installer currently supports a local “fact” collection step, which can collect details about the workload being deployed and/or facts about the target cluster the application will be deployed to. It’s local (for now, at least) so that it can use the context and credentials of the user. The facts are similarly recorded.

apiVersion: installer.confighub.com/v1alpha1
kind: Facts
metadata:
  name: confighub-worker-facts
spec:
  package: confighub-worker
  packageVersion: 0.1.0
  values:
    bridgeWorkerID: 6590e916-caee-4a3e-8096-2faeda20d04e
    configHubURL: https://hub.confighub.com
    image: ghcr.io/confighubai/confighub-worker:v0.1.46

Once choices, inputs, and facts are recorded, then the configuration is rendered using the recorded information, which makes it repeatable and idempotent unless inputs are changed deliberately.

I decided to use Kustomize to perform the rendering. The installer generates a kustomization.yaml file and uses the Components feature of Kustomize to combine the selected bases and components, which was a good fit for that part of the process. That also makes it possible to use existing kustomizations in the installer.

Also, while Kustomize is best known for patching configuration, it also has the ability to transform configuration programmatically. The installer uses the kustomize image transformer to facilitate overriding images specifically, because that’s frequently needed for image mirroring, rollbacks, rolling out CVE fixes, and so on.
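To make the image override concrete, here is a minimal Python sketch of the kind of rewrite an image transformer performs: match an image by name and swap its name, tag, or digest. It illustrates the idea only and is not Kustomize's implementation; the function names and the mirror registry in the usage note are hypothetical.

```python
def split_image(ref: str) -> tuple[str, str, str]:
    """Split an image reference into (name, tag, digest); missing parts are ''."""
    name, _, digest = ref.partition("@")
    # A colon only introduces a tag if it comes after the last slash;
    # otherwise it belongs to a registry port such as localhost:5000.
    last_slash = name.rfind("/")
    colon = name.rfind(":")
    tag = ""
    if colon > last_slash:
        name, tag = name[:colon], name[colon + 1:]
    return name, tag, digest


def override_image(ref, match_name, new_name=None, new_tag=None, new_digest=None):
    """Return ref rewritten if its name equals match_name, else unchanged."""
    name, tag, digest = split_image(ref)
    if name != match_name:
        return ref
    name = new_name or name
    if new_digest:
        # Pinning by digest replaces any existing tag.
        return f"{name}@{new_digest}"
    if new_tag:
        # In this sketch a new tag replaces any existing tag or digest.
        return f"{name}:{new_tag}"
    # Only the name changed: keep the original tag and digest.
    out = name
    if tag:
        out += f":{tag}"
    if digest:
        out += f"@{digest}"
    return out
```

For mirroring, only the name changes and the tag is preserved: calling override_image on ghcr.io/confighubai/confighub-worker:v0.1.46 with new_name set to a (made-up) mirror.example.com/confighub-worker keeps the v0.1.46 tag.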

However, kustomize has a modest number of other built-in transformers, and I’m not aware of any popular transformer plugins or KRM functions. I wanted to enable more complex and robust configuration changes. So the installer transformer command serves as a Kustomize transformer plugin (as a native executable rather than as a container) that can invoke any built-in ConfigHub function, including functions written in Starlark, CEL, and yq. It is invoked automatically by the rendering process for transformers and validators specified in the Package spec.

  transformers:
    - toolchain: Kubernetes/YAML
      whereResource: ""
      description: Set the namespace on every namespaced resource and on RBAC subjects.
      invocations:
        - name: set-namespace
          args: ["{{ .Namespace }}"]
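Going by the description above, a function like set-namespace has to distinguish namespaced resources from cluster-scoped ones and also reach into RBAC bindings. The Python sketch below shows that logic on already-parsed resources. It is not ConfigHub's implementation: the hard-coded list of cluster-scoped kinds is a deliberately incomplete assumption, and a real tool would consult the API schema or discovery.

```python
# Hypothetical, deliberately incomplete list of cluster-scoped kinds.
CLUSTER_SCOPED = {"Namespace", "ClusterRole", "ClusterRoleBinding",
                  "CustomResourceDefinition", "PersistentVolume", "StorageClass"}
BINDINGS = {"RoleBinding", "ClusterRoleBinding"}


def set_namespace(resources: list[dict], namespace: str) -> list[dict]:
    """Set the namespace in place on namespaced resources and RBAC subjects."""
    for res in resources:
        kind = res.get("kind", "")
        if kind not in CLUSTER_SCOPED:
            res.setdefault("metadata", {})["namespace"] = namespace
        if kind in BINDINGS:
            for subject in res.get("subjects", []):
                # Only ServiceAccount subjects are namespaced; User and Group are not.
                if subject.get("kind") == "ServiceAccount":
                    subject["namespace"] = namespace
    return resources
```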

I also wanted to be able to validate the rendered configuration. At the moment, that’s done in the installer after kustomize rendering completes.

  validators:
    - toolchain: Kubernetes/YAML
      description: Default validators applied at the end of every render.
      invocations:
        - name: vet-schemas # runs kubeconform
        - name: vet-merge-keys # checks for duplicate merge keys, e.g. envs
        - name: vet-format # yaml linter
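As an illustration of the duplicate merge key check mentioned in the vet-merge-keys comment, here is a minimal Python sketch that looks for repeated env names within each container of a workload. The function name is hypothetical, and it covers only containers[].env; it makes no claim about what the actual validator checks beyond that comment.

```python
from collections import Counter


def duplicate_env_names(workload: dict) -> dict[str, list[str]]:
    """Map container name -> env var names that appear more than once."""
    pod_spec = workload.get("spec", {}).get("template", {}).get("spec", {})
    problems = {}
    for container in pod_spec.get("containers", []):
        # "name" is the merge key for env entries, so it should be unique.
        counts = Counter(env.get("name", "") for env in container.get("env", []))
        dups = sorted(name for name, n in counts.items() if n > 1)
        if dups:
            problems[container.get("name", "")] = dups
    return problems
```

Duplicates like these are worth catching before apply because merge-based patching keys list entries by name, so two entries with the same name are ambiguous.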

All of these steps can be invoked via a single installer setup command. Installation with default bases, components, and inputs, if any, is concise:

% installer setup --work-dir argo-install --namespace argocd \
    --pull ${installer_dir}/packages/argocd --non-interactive

A record of what the installer did (the functions invoked, with expanded arguments, and the resource files generated) is also written. For now, this is mostly for debugging and auditing.

apiVersion: installer.confighub.com/v1alpha1
kind: FunctionChain
metadata:
  name: argocd-function-chain
spec:
  package: argocd
  packageVersion: 3.4.2
  groups:
    - toolchain: Kubernetes/YAML
      invocations:
        - name: set-namespace
          args:
            - argocd
      description: Set the namespace on every namespaced resource and on RBAC subjects.
---
apiVersion: installer.confighub.com/v1alpha1
kind: ManifestIndex
metadata:
  name: argocd-manifest-index
spec:
  package: argocd
  packageVersion: 3.4.2
  files:
    - filename: clusterrole-argocd-application-controller.yaml
      slug: clusterrole-argocd-application-controller
      apiVersion: rbac.authorization.k8s.io/v1
      kind: ClusterRole
      name: argocd-application-controller
...
    - filename: statefulset-argocd-argocd-application-controller.yaml
      slug: statefulset-argocd-argocd-application-controller
      apiVersion: apps/v1
      kind: StatefulSet
      name: argocd-application-controller
      namespace: argocd

Similar to Kustomize, installer has an edit command for modifying the Package spec, so that the YAML doesn’t have to be edited by hand in an editor, though it can be.

Secrets, if any, are output to a separate directory, out/secrets, while the other manifests are output to out/manifests. There’s more work to do on secret support in the future. The rendered manifests could then be uploaded to ConfigHub using installer upload, or applied directly with kubectl or your favorite GitOps tool.

For those of you who use kpt or want to try kpt, I added a kpt guide. The role of kpt in the guide is to manage post-installation changes. Like ConfigHub, kpt can merge fully rendered, WET Kubernetes YAML, so it can be used to preserve changes after re-rendering with kustomize for upgrades and changed installation choices. That requires committing (and pushing and tagging) the rendered manifests and copying them and propagating updates with kpt pkg commands.

Why not just use kustomize or kpt? Neither tool was designed to be a full-blown installer. For example, they don’t have interactive wizards, component selection, fact collection, and so on.

Additionally, package management was explicitly out of scope for kustomize. kpt is git-centric (ConfigHub does not store configuration in git), doesn’t support OCI-based packages (issue), and doesn’t do dependency management. I also wanted to compose packages rather than statically nest them. I’ll write another post about installer’s package functionality, and another about kpt and its relation to kustomize.

So, the installer is DRY-ish (DAMP? — need a good backronym for that), because installers of off-the-shelf packages involve choices, computation, and, sometimes, non-hermetic processes. Luckily, with configuration as data (and ConfigHub), configuration can be authored using whatever mechanism is most effective, such as an install wizard or AI agent or GUI or importing it from a cluster, and can be modified directly in its fully rendered form after that, over time.

Decoupling the stages of customization (installation, post-installation, variant creation, operational changes, etc.), keeping the configuration fully rendered (WET) at every step, enabling tools to operate on the configuration, and maintaining relationships between variants can simplify configuration management compared to a monolithic template or generator that needs to incorporate all needs in one generation pass. Future posts will explore this approach in more depth.

I didn’t have space in this post to cover all of the installer’s features (e.g., application configuration), but hopefully this gave you a sense of why I created it and how it’s different.

Have you tried to use kustomize or kpt for package installation as an alternative to Helm? If you’ve tried Glasskube, what did you think? Have you tried another package manager? What would you like to see in a new Kubernetes package installer / manager? Give the installer a try and let me know what you think.

You are also welcome to try out ConfigHub, which is now in preview.

Feel free to reply here, or send me a message on LinkedIn, X/Twitter, or Bluesky, where I plan to crosspost this.

If you found this interesting, you may be interested in other posts in my Kubernetes series.

A new Kubernetes package installer was originally published in ITNEXT on Medium, where people are continuing the conversation by highlighting and responding to this story.

Converting Helm charts to Agent Skills

Brian Grant — Mon, 13 Apr 2026 16:22:08 GMT

I wanted to run vLLM on Kubernetes. There were several options: a Helm chart, an Operator, a Claude/agent skill, and plain YAML.

While there was a little documentation for the Operator, the documentation focused on the Helm chart, and most features (e.g., prefix-aware routing) only had examples using the Helm chart. The Claude/agent skill and plain YAML were quite simple, and didn’t explicitly support most features, which were presumably covered elsewhere in the documentation. I liked the idea of using an agent skill for installation, though.

As I’ve mentioned before, Helm charts that act as installers for off-the-shelf applications can be quite complicated because they frequently enable significant customization of the configuration. Rather than trying to figure out how to use the Helm chart myself, I used Claude Code.

First I asked Claude what the Operator could do that the Helm chart couldn’t do, and the answer was nothing really. On the other hand, the Helm chart supported many more features: Ray, autoscaling, Prometheus, and more. So, I did need to start with the Helm chart.

The Helm chart had a values.schema.json file, but it did not seem complete. I asked Claude for an analysis:

The helm chart's values.schema.json file is significantly incomplete. It covers only a fraction of the supported values.

What the schema covers

Only 3 top-level properties: servingEngineSpec, routerSpec, and cacheserverSpec — and even within those, coverage is partial. The schema handles the basics of modelSpec[] items reasonably well (name, repository, tag, resources, GPU, PVC, probes, KEDA).

What's missing from the schema

6 entire top-level sections: loraAdapters, loraController, extraObjects, grafanaDashboards, kube-prometheus-stack, prometheus-adapter

~10 servingEngineSpec-level properties, including: enableEngine, configs, strategy, readinessProbe, sidecar, serviceMonitor, imagePullPolicy, env

~15+ modelSpec[] item properties, including: chatTemplate, enableLoRA, enableTool, toolCallParser, shmSize, extraVolumes, extraVolumeMounts, affinity, raySpec, envFromSecret, storageClass, pvcAccessMode

~20+ lmcacheConfig sub-properties (all NIXL, PD, controller, and p2p fields)

~7+ vllmConfig sub-properties (tensorParallelSize, maxNumSeqs, maxLoras, gpuMemoryUtilization, v0, v1, prometheusMultiprocDir)

~15+ routerSpec properties (enableRouter, repository, tag, resources, serviceType, ingress, route, affinity, readinessProbe, serviceMonitor, etc.)

There are also a few values referenced in templates that aren't documented in values.yaml either (e.g., sharedPvcStorage, routerSpec.hf_token, pvcStorage as an object with emptyDir).
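A comparison like this can also be approximated mechanically. The Python sketch below walks a parsed values document and reports key paths that have no matching entry under the schema's properties. It is a rough illustration, not how Claude produced the analysis above: it ignores $ref, additionalProperties, and patternProperties, and the function name is hypothetical.

```python
def uncovered_paths(values, schema, prefix=""):
    """Return sorted dotted paths present in values but absent from schema."""
    missing = []
    if isinstance(values, dict):
        props = schema.get("properties", {}) if isinstance(schema, dict) else {}
        for key, val in values.items():
            path = f"{prefix}.{key}" if prefix else key
            if key not in props:
                # The whole subtree is undocumented; report it once.
                missing.append(path)
            else:
                missing.extend(uncovered_paths(val, props[key], path))
    elif isinstance(values, list):
        items = schema.get("items", {}) if isinstance(schema, dict) else {}
        for item in values:
            missing.extend(uncovered_paths(item, items, prefix + "[]"))
    return sorted(set(missing))
```

This only finds values that appear in values.yaml; values referenced solely in templates would need the templates themselves to be scanned.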

As is fairly common, the values.yaml file used comments for documentation. For example:

# -- LoRA Adapter Configuration
loraAdapters: []

  # -- LoRA adapter instances to deploy
  # Each instance should contain:
  # - name: (string) The name of the LoRA adapter instance
  # - baseModel: (string) The name of the base model this adapter is for
  # - vllmApiKey: (optional) API key configuration for vLLM authentication
  #   - secretRef: Reference to a secret containing the API key
  #     - secretName: (string) Name of the secret
  #     - secretKey: (string) Key in the secret containing the API key
  #   - value: (string) Direct API key value
  # - adapterSource: (object) Configuration for the adapter source
  #   - type: (string) Type of adapter source (local, s3, http, huggingface)
  #   - adapterName: (string) Name of the adapter to apply
  #   - adapterPath: (optional, string) Path to the LoRA adapter weights
  #   - repository: (optional, string) Repository to get the LoRA adapter from
  #   - pattern: (optional, string) Pattern to use for the adapter name
  #   - maxAdapters: (optional, int) Maximum number of adapters to load
  #   - credentials: (optional, object) Reference to secret with storage credentials
  #     - secretName: (string) Name of the secret
  #     - secretKey: (string) Key in the secret containing the credentials
  # - loraAdapterDeploymentConfig: (object) Configuration for adapter deployment
  #   - algorithm: (string) Placement algorithm to use (default, ordered, equalized)
  #   - replicas: (optional, int) Number of replicas that should load this adapter
  # - labels: (optional, map) Additional labels for the LoRA adapter
  #
  # Example:
  # loraAdapters:
  #   - name: "llama3-nemoguard-adapter"
  #     baseModel: "llama3-8b-instr"
  #     vllmApiKey:
  #       secretRef:
  #         secretName: "vllm-api-key"
  #         secretKey: "VLLM_API_KEY"
  #     adapterSource:
  #       type: "local"
  #       adapterName: "llama-3.1-nemoguard-8b-topic-control"
  #       adapterPath: "/data/lora-adapters/llama-3.1-nemoguard-8b-topic-control"
  #     loraAdapterDeploymentConfig:
  #       algorithm: "default"
  #       replicas: 1
  #     labels:
  #       environment: "production"
  #       model: "llama3-nemoguard"

The loraAdapters configuration above is copied almost verbatim into a LoraAdapter custom resource by the chart’s template:

{{- if .Values.loraAdapters }}
{{- range .Values.loraAdapters }}
---
apiVersion: production-stack.vllm.ai/v1alpha1
kind: LoraAdapter
metadata:
  name: {{ .name }}
  namespace: {{ $.Release.Namespace }}
  labels:
    app.kubernetes.io/component: lora-adapter
    helm-release-name: {{ $.Release.Name }}
spec:
  baseModel: {{ .baseModel | quote }}
  {{- if and (not .vllmApiKey) ($.Values.servingEngineSpec.vllmApiKey) }}
  vllmApiKey:
    secretName: {{ $.Release.Name }}-secrets
    secretKey: vllmApiKey
  {{- else if .vllmApiKey }}
  vllmApiKey:
    secretName: {{ .vllmApiKey.secretName | quote }}
    secretKey: {{ .vllmApiKey.secretKey | quote }}
  {{- end }}
  adapterSource:
    type: {{ .adapterSource.type | quote }}
    adapterName: {{ .adapterSource.adapterName | quote }}
    {{- if .adapterSource.adapterPath }}
    adapterPath: {{ .adapterSource.adapterPath | quote }}
    {{- end }}
    {{- if .adapterSource.repository }}
    repository: {{ .adapterSource.repository | quote }}
    {{- end }}
    {{- if .adapterSource.pattern }}
    pattern: {{ .adapterSource.pattern | quote }}
    {{- end }}
    {{- if .adapterSource.maxAdapters }}
    maxAdapters: {{ .adapterSource.maxAdapters }}
    {{- end }}
    {{- if .adapterSource.credentials }}
    {{- if (kindIs "string" .adapterSource.credentials) }}
    credentialsSecretRef:
      name: {{ $.Release.Name }}-secrets
      key: lora_adapter_credentials_{{ .adapterSource.adapterName }}
    {{- else }}
    credentialsSecretRef:
      name: {{ .adapterSource.credentials.secretName | quote }}
      key: {{ .adapterSource.credentials.secretKey | quote }}
    {{- end }}
    {{- end }}
  loraAdapterDeploymentConfig:
    algorithm: {{ .loraAdapterDeploymentConfig.algorithm | default "default" | quote }}
    {{- if .loraAdapterDeploymentConfig.replicas }}
    replicas: {{ .loraAdapterDeploymentConfig.replicas }}
    {{- end }}
{{- end }}
{{- end }}

Given the largely 1:1 value-to-field correspondence, and the use of examples both in the documentation and in the values.yaml file, I wondered what the point was of using Helm templates at all. I wanted to cut out that whole layer of unnecessary complexity. (This issue isn’t Helm-specific. It also applies to domain-specific languages for configuration generation and general-purpose-language-based generators.)

So, I asked Claude to create a Claude/agent skill to install vLLM:

I want to create an agent skill that is as comprehensive as the helm chart, but that does not use helm, kustomize, or the Operator. I just want to use plain Kubernetes YAML files and select and modify them as necessary. The skill should be able to make use of the vLLM features supported by the helm chart.

I didn’t want to directly install vLLM, however. I wanted to manage vLLM through ConfigHub:

Instead of deploying the resources directly to a Kubernetes cluster, I want to create units in ConfigHub using its CLI, cub.

That didn’t quite work in one shot. We need to develop a skill to teach AI agents how to use ConfigHub. But using our documentation and --help mostly worked fine. We previously experimented with MCP, but didn’t find it to work better than our CLI.

  SKILL.md (429 lines) — The skill definition with:                                                                                                                                                         
  - Trigger: activates when the user wants to deploy vLLM on Kubernetes via ConfigHub                                                                                                                       
  - 6-step deployment procedure: gather requirements, create space, prepare/create units, set namespaces, verify, report summary                                                                            
  - Comprehensive configuration reference for all vLLM features                                                                                                                                             
  - Instructions to use cub space create and cub unit create for each resource                                                                                                                              
                                                                              
  19 template YAML files covering all components from the helm chart:                                                                                                                                       
                                                                                                                                                                                                            
  ┌────────────────────┬───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐        
  │      Category      │                                                                                 Templates                                                                                 │        
  ├────────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
  │ Engine (per model) │ engine-deployment.yaml, engine-service.yaml, engine-pvc.yaml, engine-pdb.yaml, engine-configmap.yaml                                                                      │
  ├────────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
  │ Router             │ router-deployment.yaml, router-service.yaml, router-serviceaccount.yaml, router-role.yaml, router-rolebinding.yaml, router-hpa.yaml, router-pdb.yaml, router-ingress.yaml │        
  ├────────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤        
  │ Cache Server       │ cache-server-deployment.yaml, cache-server-service.yaml                                                                                                                   │        
  ├────────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤        
  │ Autoscaling        │ engine-keda-scaledobject.yaml                                                                                                                                             │
  ├────────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
  │ Monitoring         │ engine-servicemonitor.yaml, router-servicemonitor.yaml                                                                                                                    │
  ├────────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤        
  │ Secrets            │ secrets.yaml                                                                                                                                                              │
  └────────────────────┴───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘        
                                                            
  Key design decisions:
  - Each template is a single Kubernetes resource (1:1 mapping to ConfigHub units)
  - Uses confighubplaceholder for values requiring user input (model URL, etc.) and MODELNAME as a per-model substitution token
  - Templates use reasonable defaults matching the helm chart (ports, probes, labels)
  - The SKILL.md documents how to add every optional feature (LMCache, tensor parallelism, LoRA, NIXL, chat templates, init containers, etc.) as incremental YAML modifications rather than Helm conditionals

The skill required just a few iterations. I put it here:

https://github.com/bgrant0607/vllm-skill

I didn’t spend too long on this, so I’m sure it could be improved. I also haven’t tested it much yet. It’s interesting that Claude replaced example strings in commands directly, but used low-level tools like yq to perform substitutions in configuration. I’ll need to experiment with that more.

However, there are already some clear benefits to using Claude skills (or other agent skills) over using Helm to install the application.

First, it’s interactive. Claude asks about whether I want to use specific optional features:

Do you need a request router? (default: yes, if multiple models or production use)
Do you need persistent storage for model weights? (default: no)
Do you need a Hugging Face token for gated models?
Do you need autoscaling (KEDA for engines, HPA for router)?
Do you need monitoring (ServiceMonitor)?
Do you need an Ingress for external access?
Do you need an LMCache server?
Routing strategy: roundrobin (default), session, prefixaware, or kvaware?

Second, it’s high-level. I don’t need to specify every Helm value individually.

Third, the “template” configuration and the final configuration are both human readable and can be edited by tools, including ConfigHub functions. For instance, this is the PVC configuration:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: vllm-MODELNAME-storage-claim
  namespace: confighubplaceholder
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi
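Because the template is plain YAML with two well-known tokens, filling it in needs no template language. Here is a minimal Python sketch using plain string substitution, plus a check that nothing was left unresolved. The skill itself may do this differently (for example with yq); the function names and the sample model name and namespace are hypothetical.

```python
PVC_TEMPLATE = """\
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: vllm-MODELNAME-storage-claim
  namespace: confighubplaceholder
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi
"""


def fill_template(text: str, model_name: str, namespace: str) -> str:
    """Substitute the per-model token and the namespace placeholder."""
    out = text.replace("MODELNAME", model_name)
    out = out.replace("namespace: confighubplaceholder", f"namespace: {namespace}")
    return out


def unresolved_placeholders(text: str) -> list[str]:
    """List lines that still contain a token needing user input."""
    return [line.strip() for line in text.splitlines()
            if "confighubplaceholder" in line or "MODELNAME" in line]
```

A check like unresolved_placeholders is the kind of validation that can run on the final configuration before it is applied, since a leftover placeholder is easy to detect in fully rendered YAML.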

Fourth, the final configuration can be immediately validated by a number of linting, validation, and policy tools. This can be done using Triggers in ConfigHub.

Fifth, I can make subsequent changes using any method I prefer, which might be using the skill, or could just be ConfigHub functions for common Kubernetes configuration changes, like changing resource requests (there’s a set-container-resources function). When I know what Kubernetes resource field I need to change, I don’t need to figure out what the corresponding Helm value is. Configuration can be created using any method you like, and then edits can be made using any method you like. They aren’t coupled. You can use the best tool for the task and situation.

More generally, AI agents are pretty good at generating and editing Kubernetes workload configuration at this point, especially plain YAML manifests and standard resource types like Deployment and Service. The usual alternative is to make the configuration “DRY” by factoring out the parts that need to be customized for different uses or environments using Helm, kustomize, cdk8s, or another configuration generation tool, which makes the configuration more complicated. The combination of AI and ConfigHub instead makes it viable to generate and manage many copies (variants) of a configuration.

When configuration is represented and managed as data and can be read and written via API, configuration can be managed the way that other data in databases is managed, thereby avoiding the fundamental challenges with “as code” representations.

Have you used agent skills to customize Kubernetes configuration? Has your experience been positive or negative? Was it reliable enough? How difficult were the skills to use? How difficult were they to create? Do you have tips for how to make such skills more effective? Do you think skills or something else AI-based will replace Helm for installing off-the-shelf Kubernetes applications?

Feel free to reply here, or send me a message on LinkedIn, X/Twitter, or Bluesky, where I plan to crosspost this.

If you found this interesting, you may be interested in other posts in my Kubernetes series.

You are also welcome to try out ConfigHub, which is now in preview.

Converting Helm charts to Agent Skills was originally published in ITNEXT on Medium, where people are continuing the conversation by highlighting and responding to this story.

The Challenge of ConfigMap rollouts in Kubernetes

Brian Grant — Mon, 30 Mar 2026 17:10:59 GMT

http://issues.k8s.io/22368

I was working on ConfigMap updates in ConfigHub, and I bumped into a thorn in Kubernetes that I filed an issue for and tried to solve just over 10 years ago, and that is still unresolved in the project: ConfigMap rolling updates. What’s the problem, how did it happen, and how can you roll out ConfigMap changes?

(Side note: Notice that that issue number was well above 20,000 in less than two years after we open-sourced Kubernetes. GitHub sent email notifications for every comment on every issue and PR, and there were easily at least 10x as many of those. It was crazy.)

History

Kubernetes workload controllers, such as Deployment, StatefulSet, and DaemonSet, can automatically perform rolling updates across the Pods (groups of containers) owned by the workload controller when the podSpec is modified. This was a key feature of Deployment relative to ReplicationController (the precursor to ReplicaSet), for which rolling update was implemented in kubectl (and in the CLIs and scripts before that). (For more about the history of Kubernetes workload controllers, search for “Topic 4” in my 5-year anniversary history post.)

As I mentioned in my 10-year anniversary Road to 1.0 post, Deployment did not make it into Kubernetes 1.0. Deployment was inspired by an OpenShift feature. Work on it was well underway in 1.1, but the automated changes made by both Deployment and HorizontalPodAutoscaler, which was also under development, destabilized the ReplicationController, which up to that point had just been modified by hand or by simple loops in scripts and the CLI. I worked on the ReplicationController/ReplicaSet controller and on the Deployment controller in order to get it ready to ship as beta in Kubernetes 1.2, back in February 2016.

Release 1.1 had taken 4-5 months, and 1.2 was already taking more than 3 months. We were formalizing and automating the release process at the same time. For instance, Kubernetes 1.2 was the first release to produce a release-specific ChangeLog. There was not yet a formal concept of a release team, but I played the role of the release lead, as well as working on ReplicaSet and Deployment and wearing many other hats on the project, in CNCF, and on the GKE team. It was a busy time.

ConfigMap had also just been added as a non-secret Secret, essentially, to inject application configuration into containers. ConfigMaps could be referenced by Pods, and therefore by Deployments, to be injected as environment variables or volumes. Updating the podSpec of a Deployment would roll out new pods referencing the ConfigMaps specified in the podSpec template, which would update any environment variables or files projecting ConfigMap values, but updating the ConfigMap would…do nothing. At least at the workload controller level. With both Deployment and ConfigMap shipping at the same time, there wasn’t enough time to implement ConfigMap-specific rollout features.

Kubelet could update the ConfigMap volume in place, which could be detected by processes inside containers via inotify if watching. Most applications do not do that and rely on other approaches, such as reload on HUP, but if they did, they’d all update simultaneously unless there were some orchestration to control it. We had wanted to defer in-place rolling updates until we could update more properties, including resource limits, through Docker. Kubelet support also would be needed to restart containers to update environment variables set by ConfigMap values.

I did not want to add a configMapTemplates field to Deployment for rolling out ConfigMaps. ConfigMaps were already just data blobs in etcd, and didn’t need to be replicated, though they would need to be versioned. Furthermore, adding them to Deployment would bloat that API and cause similar kinds of challenges for tools implementing ConfigMap support as embedding the PodSpec in multiple different resource types did. That type of embedding is what drove adoption of duck typing. I held out hope for referencing a separate PodTemplate API, which did exist at the time, and wanted to treat ConfigMap similarly. But that would introduce a new pattern, and it was hard to build consensus around an approach.

Cascading deletion and garbage collection were also just being implemented around that time, and shipped in release 1.3. The solution proposed was to copy the ConfigMap(s) along with the ReplicaSet and add an ownerReference to the ConfigMap referring to the ReplicaSet. There was a proposal this past December that was similar to that proposal from 2016. That approach could work, and I endorse it. Certainly there would have to be ownerReferences on generated ConfigMaps to ensure they would be cleaned up when the Deployment was deleted.
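The copy-and-own idea can be sketched with plain dicts in Python. This only illustrates the shape of the approach and is not code from any of the proposals; in particular, naming the copy after the ReplicaSet's pod-template-hash label is an assumption made for the example.

```python
import copy


def copy_configmap_for(configmap: dict, replicaset: dict) -> dict:
    """Return a per-revision copy of configmap owned by replicaset."""
    rs_meta = replicaset["metadata"]
    # Assumption: reuse the ReplicaSet's pod-template-hash label as the suffix.
    suffix = rs_meta["labels"]["pod-template-hash"]
    clone = copy.deepcopy(configmap)
    meta = clone["metadata"]
    meta["name"] = f'{configmap["metadata"]["name"]}-{suffix}'
    # Drop server-populated fields that must not be carried into a new object.
    for field in ("uid", "resourceVersion", "creationTimestamp"):
        meta.pop(field, None)
    # With this reference, the garbage collector deletes the copy
    # once the owning ReplicaSet is gone.
    meta["ownerReferences"] = [{
        "apiVersion": "apps/v1",
        "kind": "ReplicaSet",
        "name": rs_meta["name"],
        "uid": rs_meta["uid"],
        "controller": True,
        "blockOwnerDeletion": True,
    }]
    return clone
```

Tying the copy's lifetime to the ReplicaSet means old ConfigMap revisions survive exactly as long as the revisions that might still be rolled back to.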

There were subsequent proposals in 2016, 2017, 2018, and 2019, and perhaps even after that, but none were accepted, unfortunately, partly due to various technical details, but largely due to non-technical factors, such as other priorities. For example, now the focus appears to be on AI workloads. There are also the challenges of stability and backward compatibility, especially now that the project is so widely used and the Deployment controller is such a core, critical component. Making and testing the changes would be a non-negligible amount of work, and maintainers would then be responsible for the code going forward.

The proposals had three main aspects:

1. Triggering the pod rollout on a ConfigMap data change
2. Copying ConfigMaps to version them similarly to ReplicaSets
3. Garbage collecting or pruning old ConfigMap copies

Some of the proposals focused on a subset of these issues, such as triggering or garbage collection. And of course there were numerous other details, such as automatic vs. explicit triggering, making the new behavior opt-in for backward compatibility, dealing with shared ConfigMaps, whether to use the general garbage collector or to do the pruning in the Deployment controller along with ReplicaSet management, impact on garbage-collector load in the case of the former, support in StatefulSet and DaemonSet as well as Deployment, whether to add fields to the podSpec (which was shared between the workload controllers and Pods), whether to add fields to ConfigMap for the benefit of workload controllers, and so on.

Current solutions

Kustomize

As a workaround, we added the ability to Kustomize to generate ConfigMaps with unique names. Because Kustomize already needed to know which resource fields contained references to other resources in order to add name prefixes or suffixes, Kustomize could also update references to generated ConfigMaps. For example:

apiVersion: v1
kind: ConfigMap
metadata:
  name: example-configmap-1-g4hk9g2ff8
...

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
...
      volumes:
      - configMap:
          name: example-configmap-1-g4hk9g2ff8
        name: config

Because the reference in the configMap.name field of the Deployment’s podSpec would be updated every time the data in the ConfigMap changed, that would trigger a rolling update by the Deployment controller. So kustomize did address (1) and (2), but not (3): keeping around old ConfigMaps that were still referenced, while deleting the ones that were not. I don’t know how popular this approach is (there are about 61k search hits for configMapGenerator on GitHub), and in any case only kustomize users would use it.
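The mechanism is easy to see in a small Python sketch: derive the ConfigMap name from a hash of its data, and the reference in the podSpec changes whenever the data does. The hashing here is simplified and is not Kustomize's exact algorithm; the function names are hypothetical.

```python
import hashlib
import json


def hashed_name(base: str, data: dict[str, str]) -> str:
    """Append a short content hash to base (simplified, not Kustomize's algorithm)."""
    canonical = json.dumps(data, sort_keys=True).encode()
    return f"{base}-{hashlib.sha256(canonical).hexdigest()[:10]}"


def deployment_volume(configmap_name: str) -> dict:
    """The volume entry in the podSpec that references the ConfigMap."""
    return {"name": "config", "configMap": {"name": configmap_name}}
```

Because the two names differ whenever the data differs, the volume entries differ too, and that podSpec change is what the Deployment controller rolls out. The old ConfigMap is untouched, which is also why something has to clean it up later.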

In general, pruning (i.e., deleting resources removed from the desired state) for kustomize is implemented by other tools, since kustomize doesn’t apply resources to the cluster itself. Pruning in kubectl apply identified applied resources using labels and was considered error prone, and an incorrect specification on the command line could cause resources to be deleted unintentionally.

Today, GitOps tools like ArgoCD and Flux handle pruning. For pruning to work properly with these tools, so that generated ConfigMaps would not be deleted while they were still in use during rolling updates or rollbacks, old revisions of the ConfigMaps would need to remain in the set of resources being applied until they were no longer needed by prior ReplicaSet revisions. I’m not aware of tools that do this kind of working-set management (except ConfigHub, which is discussed below).

Helm

Helm can generate the contents of mutable ConfigMaps with stable names:

# Source - https://stackoverflow.com/a/65291822
# Posted by PjoterS
# Retrieved 2026-03-29, License - CC BY-SA 4.0

apiVersion: v1
kind: ConfigMap
metadata:
  name: my-config
data:
  {{- (.Files.Glob "configmap/*").AsConfig | nindent 2 }}

It can’t generate immutable ConfigMaps easily because Helm’s pruning mechanism would prune old revisions immediately when the ConfigMap’s name changed.

Helm has an “official” workaround for the rollout problem, which is to add a hash of a referenced ConfigMap to the podSpec template. This sets a pod annotation, but sometimes an environment variable value of a container is set instead:

kind: Deployment
spec:
  template:
    metadata:
      annotations:
        checksum/config: {{ include (print $.Template.BasePath "/configmap.yaml") . | sha256sum }}

Like the kustomize resource name change, this also causes the podSpec template to be modified when the ConfigMap contents change. The key difference is that there’s only a single ConfigMap revision that is modified in place. This is mechanically simpler, but any pod that is created for any reason once the modified ConfigMap is applied will pick up the new ConfigMap data, and any processes using the inotify approach will pick up the new data immediately, which defeats the purpose of performing a rolling update.
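
The hash-in-annotation technique can be sketched in a few lines of Python. This is an illustration rather than what Helm does internally: Helm hashes the rendered template file, whereas the hypothetical with_config_checksum helper below hashes whatever configuration text it is given.

```python
import copy
import hashlib


def with_config_checksum(deployment: dict, config_text: str) -> dict:
    """Return a copy of deployment whose pod template carries a hash of config_text."""
    updated = copy.deepcopy(deployment)
    template_meta = updated["spec"]["template"].setdefault("metadata", {})
    annotations = template_meta.setdefault("annotations", {})
    # A different config_text yields a different annotation value, so the
    # pod template changes even though the ConfigMap name stays the same.
    annotations["checksum/config"] = hashlib.sha256(
        config_text.encode("utf-8")
    ).hexdigest()
    return updated


deployment = {"kind": "Deployment", "spec": {"template": {"metadata": {}, "spec": {}}}}
before = with_config_checksum(deployment, "database.port=5432\n")
after = with_config_checksum(deployment, "database.port=5433\n")
template_changed = before["spec"]["template"] != after["spec"]["template"]
```

The Deployment controller only sees that the pod template differs, which is enough for it to start a rolling update.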

Reloader

A number of tools have been developed by the community to address this gap, as well. A popular one, referenced in issue 22368, is Reloader. Reloader watches for changes and automatically triggers rollouts in workload controllers that are annotated:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  annotations:
    reloader.stakater.com/auto: "true"
spec:
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: app
          image: your-image
          envFrom:
            - configMapRef:
                name: my-config
            - secretRef:
                name: my-secret

Like the Helm method, it also triggers a rolling update by modifying annotations in the podSpec template:

  AnnotationTemplate: `{"spec":{"template":{"metadata":{"annotations":{"%s":"%s"}}}}}`,

While I wouldn’t say that approach is simple, it is simpler than copying ConfigMaps on write, updating references, and garbage collecting them when no longer needed. The ArgoCD project recommends this approach also.

ConfigHub

For ConfigHub, I implemented both approaches: the generation of unique, immutable ConfigMaps and simply updating the podSpec template with a hash of the ConfigMap.

In either case, ConfigHub generates the ConfigMap. Application configuration is stored in separate Units in its native format. Currently supported formats include YAML, JSON, Properties, INI, TOML, Env, and Text.

For example, Properties configuration might look like this:

database.port=5432
database.ssl.enabled=true

Storing the configuration in its native format makes it easier to understand, easier to validate, and easier to keep consistent across environments (e.g., with local development), and enables many standard ConfigHub functions to operate on it, since it is structured key/value data. It also works with ConfigHub’s variant-management mechanisms, for example enabling changes to be automatically merged across variants.

Unique, immutable ConfigMaps

To indicate to ConfigHub what name to use as the prefix of the name of the data key in the generated ConfigMap, a special property needs to be added to the configuration data:

configHub.configName=my-app-config

This property is omitted from the data when generating the ConfigMap. Using an Env file as an example:

apiVersion: v1
kind: ConfigMap
metadata:
  name: my-config-03b362047a
...
immutable: true
data:
  my-app-config.env: |
    DATABASE_PORT=5433
    DATABASE_SSL_ENABLED=false

The reference in the linked Deployment is updated via ConfigHub’s needs/provides value propagation mechanism. This is an example of a volume, but ConfigHub also supports envFrom:

      volumes:
      - name: config-volume
        configMap:
          name: my-config-03b362047a

ConfigHub knows where such reference fields are located in built-in Kubernetes APIs by using the list from kustomize. For non-built-in CustomResourceDefinitions, it’s possible to configure the locations in a couple of different ways, but that’s a topic for another post.

By default, ConfigHub will automatically maintain the 10 most recently generated ConfigMaps, because that’s the default revisionHistoryLimit in Kubernetes. That number is configurable via a RevisionHistoryLimit option (see the documentation), which we could derive automatically from the Deployment in the future.
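
As a rough illustration of this kind of retention policy, the Python sketch below splits generated ConfigMap names into those to keep and those to prune. It is not ConfigHub's implementation: the split_revisions helper is hypothetical, and also keeping any name in an in_use set is an assumption of this sketch.

```python
def split_revisions(
    names_oldest_first: list[str],
    in_use: set[str],
    history_limit: int = 10,
) -> tuple[list[str], list[str]]:
    """Return (keep, prune) for generated ConfigMap revisions."""
    # Guard against history_limit == 0, because list[-0:] would return everything.
    recent = names_oldest_first[-history_limit:] if history_limit > 0 else []
    keep = [n for n in names_oldest_first if n in recent or n in in_use]
    prune = [n for n in names_oldest_first if n not in keep]
    return keep, prune


# Twelve hypothetical revisions, oldest first; the oldest is still referenced.
names = [f"my-config-{i:02d}" for i in range(12)]
keep, prune = split_revisions(names, in_use={"my-config-00"})
```

With twelve revisions and the default limit of 10, only my-config-01 is prunable: my-config-00 is outside the window but still referenced.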

Mutable ConfigMaps

ConfigHub also supports mutable ConfigMaps that have stable names.

apiVersion: v1
kind: ConfigMap
metadata:
  name: my-config

The Deployment references are similarly set automatically.

Additionally, ConfigHub will generate a hash of the ConfigMap and propagate it to an annotation in the podSpec template:

  template:
    metadata:
      labels:
        app: mydep
      annotations:
        confighub.com/Hash: a055c86133

That will trigger re-creation of the Deployment’s pods similarly to the Helm and Reloader solutions.

Wrap up

That hopefully helped you understand the two main alternatives for rolling out ConfigMap changes today:

Generate unique, immutable ConfigMaps and update references to them. Delete old revisions of the ConfigMaps when they are no longer needed. This ensures correct, reliable rolling update behavior, even in cases where pods are re-created for other reasons during the rollout and where processes watch the file using inotify.
Update the podSpec templates of workload controllers, such as Deployments, when the contents of referenced ConfigMaps (and/or Secrets) change, such as by generating and setting a hash value in an annotation. This is simpler and usually works well enough that the approach has been widely adopted.

I wouldn’t hold my breath for this being addressed in upstream Kubernetes at this point, but the feature is feasible. The change would have to be thoroughly tested, but I think it could be isolated to the Deployment controller.

Have you ever stumbled on this quirk of Kubernetes when making a change to a ConfigMap and wondering why your application didn’t observe the change? Did you comment on or +1 a comment on issue 22368? Do you generate ConfigMaps from native application configuration files with kustomize or with helm rather than maintaining them as ConfigMaps manually? Do you use Reloader, or do you use another tool?

Feel free to reply here, or send me a message on LinkedIn, X/Twitter, or Bluesky, where I plan to crosspost this.

You could also try out ConfigHub, which is now in preview.

If you found this interesting, you may be interested in other posts in my Kubernetes series.

The Challenge of ConfigMap rollouts in Kubernetes was originally published in ITNEXT on Medium, where people are continuing the conversation by highlighting and responding to this story.

Vetting Kubernetes configuration with Kyverno prior to deployment

Brian Grant — Wed, 18 Mar 2026 15:42:57 GMT

Kyverno logo

In honor of Kyverno’s graduation in CNCF, I thought I’d write a post about using Kyverno to vet changes to Kubernetes configuration prior to deployment.

First, what is Kyverno? Kyverno is a popular Kubernetes “policy as code” tool. It is integrated with Lens, Devtron, and a number of other tools. I previously mentioned Kyverno in one of my earliest posts, on state-based policy constraints, and in one of my most popular posts, on Kubernetes configuration linting tools.

Kyverno policies can be expressed as Kubernetes custom resources and, therefore, as configuration documents. That means they can be changed and applied like any other Kubernetes resources, such as with GitOps tools like ArgoCD or Flux, and are easy to discover through the Kubernetes API. They don’t have to be pushed to or pulled from another storage location.

These policies can include fairly arbitrary expressions written in CEL to test values of Kubernetes resources, in effect imposing constraints on Kubernetes resources accepted as valid. This allows you to ban configurations that Kubernetes may allow, but that you don’t want to allow in your cluster(s) or at least in some namespaces. This differs from RBAC authorization by constraining WHAT changes are made rather than by constraining WHO made the changes.

For example, here’s a policy that prohibits the use of the :latest tag in Deployments by checking the value of the appropriate field of the resource (slightly simplified compared to the sample):

apiVersion: policies.kyverno.io/v1
kind: ValidatingPolicy
metadata:
  name: disallow-latest-tag
spec:
  validationActions: [Deny]
  matchConstraints:
    resourceRules:
      - apiGroups: ['apps']
        apiVersions: [v1]
        operations: [CREATE, UPDATE]
        resources: [deployments]
  validations:
    - expression: >
        object.spec.template.spec.containers.all(c,
          !c.image.endsWith(':latest')
        )
      message: "Using 'latest' tag is not allowed."

Kyverno has a sizable policy library, and of course you can write your own, as well.
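
For readers less familiar with CEL, the validation expression in the policy above can be mirrored in plain Python. This is only an equivalent of the expression's logic, not how Kyverno evaluates it, and the passes_latest_tag_check name is made up for this sketch.

```python
def passes_latest_tag_check(deployment: dict) -> bool:
    """Return True if no container image in the pod template ends with ':latest'."""
    containers = deployment["spec"]["template"]["spec"]["containers"]
    return all(not c["image"].endswith(":latest") for c in containers)


def make_deployment(image: str) -> dict:
    """Build a minimal Deployment-shaped dict for the check."""
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "spec": {
            "template": {
                "spec": {"containers": [{"name": "test-container", "image": image}]}
            }
        },
    }


rejected = not passes_latest_tag_check(make_deployment("nginx:latest"))
accepted = passes_latest_tag_check(make_deployment("nginx:1.27"))
```

Like the simplified policy, this check does not flag an image with no tag at all, even though such an image also resolves to latest.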

Normally, these policies are enforced by Kubernetes dynamic admission control. However, one then doesn’t find out until applying the configuration whether it passes the policy checks or not. If using ArgoCD or Flux, this failure may show up as a deployment that never successfully fully synchronizes with the cluster. It would be better to find out about such problems earlier and in a more obvious way.

The typical approach to checking the policies in advance would be to render the Helm charts, kustomizations, cdk8s constructs, etc. to plain Kubernetes YAML, and then use the kyverno CLI to apply the policies to the output, generally in CI.

% kyverno apply disallow-latest.yaml --resource=./rendered-resources/

Applying 1 policy rule(s) to 1 resource(s)...
policy disallow-latest-tag -> resource default/Deployment/latest-deployment failed:
1 -  Using 'latest' tag is not allowed.


pass: 0, fail: 1, warn: 0, error: 0, skip: 0

Because the policy specifications are likely owned by a central team, such as a platform team or security team, rather than by the application team, and they are applied to multiple applications, they are typically in a different git repository. Fortunately, the kyverno CLI supports that (private repo support):

% kyverno apply https://github.com/kyverno/policies/tree/main/best-practices-vpol \
    --resource=./rendered-resources/

This works, but it has drawbacks. There are multiple steps between making a configuration change and finding out that there was a problem with it. Results in CI are buried in log files unless they are pushed somewhere. And for teams that own their own CI pipelines/workflows, the policies are opt-in: they must be added to every repo’s CI pipelines/workflows and kept up to date.

I integrated Kyverno with ConfigHub to demonstrate how Kyverno and similar policy tools can be leveraged to prevent misconfiguration by validating the configuration after every change.

First, a little background on ConfigHub. In ConfigHub, configuration is always fully rendered, or WET — configuration is represented as data. In the case of the Kubernetes Deployment that failed the policy check above, it would look as follows, with no variables, conditionals, loops, or other computation:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: latest-deployment
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: test
  template:
    metadata:
      labels:
        app: test
    spec:
      containers:
      - name: test-container
        image: nginx:latest

Configuration is stored in units in ConfigHub rather than as files in git. You could upload the configuration above into a unit with:

% cub unit create latest-deployment latest-deployment.yaml

(There’s a basic tutorial to get started with ConfigHub. If you’re using Helm or kustomize with ArgoCD or Flux, then we can import and render the configuration automatically, but that’s a topic for another post.)

ConfigHub has a capability analogous to Kubernetes dynamic admission webhooks called Triggers. Triggers invoke functions automatically on each configuration change. A function is code that ConfigHub can invoke to operate on configuration data. Some standard functions are built into ConfigHub. You can also write your own functions and build them into workers.

A worker is analogous to a controller in Kubernetes or a runner in CI (though it has specific functionality built in rather than executing arbitrary containers). You run it, typically in your Kubernetes cluster, and it connects back to ConfigHub and performs operations on your behalf.

As an example, I created a custom worker that calls Kyverno’s admission webhook to evaluate policies on configurations in ConfigHub.
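
Calling an admission webhook means sending it an AdmissionReview request that wraps the resource. The Python sketch below builds such a request body using fields from the admission.k8s.io/v1 API. It is not the worker's actual code: the build_admission_review helper and its resource parameter are assumptions, and the sketch stops short of sending the request.

```python
import uuid


def build_admission_review(obj: dict, resource: str, operation: str = "CREATE") -> dict:
    """Wrap a Kubernetes object in an admission.k8s.io/v1 AdmissionReview request."""
    # "apps/v1" splits into group "apps" and version "v1";
    # a core-group apiVersion such as "v1" yields an empty group.
    group, _, version = obj["apiVersion"].rpartition("/")
    metadata = obj.get("metadata", {})
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "request": {
            "uid": str(uuid.uuid4()),
            "kind": {"group": group, "version": version, "kind": obj["kind"]},
            "resource": {"group": group, "version": version, "resource": resource},
            "name": metadata.get("name", ""),
            "namespace": metadata.get("namespace", ""),
            "operation": operation,
            "object": obj,
            # Nothing is being persisted, so mark the request as a dry run.
            "dryRun": True,
        },
    }


deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "latest-deployment", "namespace": "default"},
}
review = build_admission_review(deployment, resource="deployments")
```

The webhook's response carries an allowed flag and a status message, which map naturally onto the pass/fail result of a validation function.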

To try it, clone our sdk:

% git clone https://github.com/confighub/sdk

The code is in examples/kyverno-server, which also contains a demo script that creates a kind cluster, installs Kyverno and some policies, starts the custom worker, and invokes the function imperatively, as follows:

% cub function do vet-kyverno-server --unit latest-deployment \
    --worker "$KYVERNO_WORKER"

Validation functions return a pass/fail result and details about issues found in the case of failure.

false 0 policy "disallow-latest-tag" rule "validation": Using 'latest' tag is not allowed.
Attributes:
   default/latest-deployment apps/v1/Deployment

To create a trigger using this function, you can do:

% cub trigger create kyverno Mutation Kubernetes/YAML vet-kyverno-server \
    --worker "$KYVERNO_WORKER"

And then when changes are made to the configuration data, the function would automatically be invoked, and if it failed then an apply gate would be attached that would block applying the configuration to the cluster until the issue was addressed.

This approach enables shifting Kubernetes admission control left, to the point where the configuration is modified in storage, prior to deploying changes to the cluster. Any admission controller you have registered with your cluster to be invoked on Kubernetes API calls can also be invoked ahead of time, by your worker, through ConfigHub. Because the worker calls back to ConfigHub, ConfigHub does not need access to your cluster. However, if you’d prefer that the worker didn’t call your admission webhooks, check out the other Kyverno worker example, which executes the Kyverno CLI with specified policies and resources similar to the examples above.

These Kyverno worker functions are just a couple of examples of the kind of validation and policy checks that can be enforced centrally using ConfigHub. There’s also a kube-score example, which shows how to integrate a configuration scoring tool. For schema validation, kubeconform is integrated as the vet-schemas function. We plan integrations with Kubernetes validating admission policies, Open Policy Agent, and other tools in the future. Additionally, you can use CEL expressions (vet-cel) or Starlark (vet-starlark) to write policy checks without building and running your own worker for that purpose.

Whether you are making a change to a configuration by hand, with yq or another tool, with a ConfigHub function, or with an AI agent, ConfigHub ensures that the change is not obfuscated by layers of configuration generation code and that it can be vetted automatically and immediately, thereby reducing the opportunity for misconfiguration.

How have you tried to prevent common mistakes when people in your organization change Kubernetes configurations? Do you render your configurations and evaluate Kyverno or other policies in PRs before the changes are merged? Is there anything you would like to improve with that process? Do changes ever bypass the policy checks?

Feel free to reply here, or send me a message on LinkedIn, X/Twitter, or Bluesky, where I plan to crosspost this.

You could also try out ConfigHub, which is now in preview.

If you found this interesting, you may be interested in other posts in my Kubernetes series.

Vetting Kubernetes configuration with Kyverno prior to deployment was originally published in ITNEXT on Medium, where people are continuing the conversation by highlighting and responding to this story.

Kubernetes Strategic Merge Patch

Brian Grant — Mon, 09 Mar 2026 20:37:34 GMT

Image generated by Gemini

If you’ve used Kubernetes kubectl apply, server-side apply, or kustomize, then you may have encountered the “strategic merge patch” feature. “Strategic merge patch” is a mouthful. What does it mean? In what sense is it “strategic”? Why does it exist?

I explained this several years ago and copied it into my first blog post, on the Technical History of Kubernetes. It’s also explained in the Kubernetes documentation, in the original design document, on Stack Overflow, and in others’ blog posts. However, I was working on a new implementation for ConfigHub that could be used for non-Kubernetes configuration as well, and it seemed like a good topic to cover again, because the reasons why strategic merge patch exists may not be obvious.

Strategic merge patch was developed originally by Sam Ghods for use in kubectl apply, but it was necessary as a consequence of several other design decisions.

First, relatively early in the Kubernetes project, a decision had been made that map keys should not have structural significance in the API. The main rationale was the difficulty in distinguishing map keys from fields in the API without a schema. Additionally, even with a JSON or OpenAPI schema, documentation of map keys must be embedded in the description of the whole map.

The original example was the specification of a port’s name using a field:

ports:
  - name: www
    hostPort: 80
    containerPort: 80
    protocol: tcp

Rather than this specification of its name using a map key:

ports:
  www:
    hostPort: 80
    containerPort: 80
    protocol: tcp

Kubernetes used the former approach for its API, whereas configuration tools, such as the precursor to Docker Compose (Fig), chose the latter approach. The issue came up multiple times because common configuration tasks were made more difficult by this decision. Later, however, cases were added where order matters, such as initContainers, though that could have been solved using another field instead.

Second, I wanted the serialized Kubernetes API representation to be used as the basis for configuration formats, rather than developing and maintaining a parallel but slightly different representation for configuration.

Third, I wanted the configuration primitives to be tolerant of fields being set dynamically by automation. Borg’s configuration tool had ad hoc special-case logic to deal with such cases on a one-by-one basis, and the Borg API was modified to handle the most common cases, such as horizontal autoscaling. This was not an extensible approach and new automation components could be blocked until changes could be made to critical core libraries, since otherwise changes made by those components would be detected as configuration drift and would be undone by configuration tools.

That principle also enabled us to explicitly populate default values in the API, so that client tools would not need to guess how the API interpreted unspecified fields or read them from a schema. Also, a hardcoded schema would not work for fields set by admission control, which we felt was important to support organization-specific policies.

Fourth, while admission control and defaulting are synchronous, controllers, such as the horizontal pod autoscaler, operate asynchronously. That meant that if the state were completely overwritten without incorporating dynamically set fields, there would be a window of time where those API fields would be incorrect, and other controllers, such as the Deployment and ReplicaSet controllers, could act on those incorrect values.

The requirement to handle API fields being set asynchronously effectively necessitated merging of the desired state specified by configuration with the state in the API, rather than just replacing it completely. It turned out that merging was a key capability for configuration tools more generally, because declarative configuration creates copies of documents that need to be kept in sync.

Most configuration tools, such as Helm, generate configuration using a unidirectional process. When either the generation code (or templates) or input values change, the whole configuration must be regenerated. Such a generated configuration contains specifications for whole resources, but it includes only statically specified values, not dynamically set ones.

Other client tools, such as CLIs, can generally update selected fields of resources rather than just replacing whole resources. They typically use patch APIs to do that.

There are two standard JSON patch specifications, JSON Patch and JSON Merge Patch. JSON Patch specifies a sequence of imperative field changes: add, replace, remove, move, copy, test. JSON Merge Patch accepts a document (in this case a resource) using the same structure as the original. For configuration use cases, JSON Merge Patch appeared more convenient than JSON Patch.

Also, on the surface, a merge patch looked like it could be produced without diffing. However, this was not the case. Sometimes fields need to be removed. One example in Kubernetes is a change from one volume source, like emptyDir, to another, such as persistentVolumeClaim. Since they support different options, they are structured in the API as separate fields — a union/oneOf essentially. Identifying such a case automatically requires a diff with the previously specified desired state and the field to be removed must be assigned null in JSON Merge Patch.

To simplify this procedure for clients, I had proposed API support for automatically merging a resource specified by the client with fields stored in the server, by storing multiple sets of values. This proposal was not implemented, but API support for field tracking was eventually added years later in the form of server-side apply. Instead, to implement kubectl apply, we stored the previously specified desired state in an annotation. In fact, that was one of the original motivations for adding annotations to the API.
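
The role of the previously specified desired state can be illustrated with a small Python sketch that diffs it against the new desired state to produce a JSON Merge Patch. This is a simplification for illustration, not kubectl's algorithm: the diff_to_merge_patch helper is hypothetical, it only handles nested objects, and the example treats a single volume as an object rather than as an element of the volumes array.

```python
def diff_to_merge_patch(previous: dict, desired: dict) -> dict:
    """Compute a JSON Merge Patch that turns previous into desired."""
    patch = {}
    # Fields that were specified before but are absent now must be removed,
    # which JSON Merge Patch expresses by assigning null (None in Python).
    for key in previous:
        if key not in desired:
            patch[key] = None
    for key, value in desired.items():
        old = previous.get(key)
        if isinstance(value, dict) and isinstance(old, dict):
            nested = diff_to_merge_patch(old, value)
            if nested:
                patch[key] = nested
        elif key not in previous or old != value:
            patch[key] = value
    return patch


# Hypothetical last-applied and new desired state for one volume.
last_applied = {"name": "tmp", "emptyDir": {}}
new_desired = {"name": "tmp", "persistentVolumeClaim": {"claimName": "data"}}
patch = diff_to_merge_patch(last_applied, new_desired)
```

Without last_applied there would be no way to know that emptyDir had been specified by the user and now needs to be nulled out, as opposed to having been set by some other client.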

That solved the problem of diffing, but did not solve the problem of patching. The problem with JSON Merge Patch is its behavior with respect to arrays. From the RFC: “it is not possible to patch part of a target that is not an object, such as to replace just some of the values in an array.” That is a non-starter for the Kubernetes API, which extensively uses arrays, such as for the lists of containers in a pod, lists of ports as shown above, lists of environment variables, and so on.
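
The array limitation is easy to see in a direct Python transcription of the merge algorithm described in the JSON Merge Patch RFC. The merge_patch function follows the RFC's pseudocode; the target and patch data are made up for this sketch.

```python
def merge_patch(target, patch):
    """Apply a JSON Merge Patch to target and return the result."""
    # Anything that is not an object, including an array, replaces the target.
    if not isinstance(patch, dict):
        return patch
    result = dict(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)  # null means "remove this field"
        else:
            result[key] = merge_patch(result.get(key), value)
    return result


# Live state, including a hostPort that was set by a separate API call.
live = {
    "volume": {"name": "tmp", "emptyDir": {}},
    "ports": [{"name": "www", "containerPort": 80, "hostPort": 80}],
}
change = {
    "volume": {"emptyDir": None, "persistentVolumeClaim": {"claimName": "data"}},
    "ports": [{"name": "www", "containerPort": 8080}],
}
patched = merge_patch(live, change)
```

The volume object is merged field by field, but the ports array is replaced wholesale, so the hostPort that was set separately is lost.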

So we needed to create our own patching mechanism in order to merge fields of array elements. But we also needed to handle the prohibition of the use of map keys. We can use the example of ports again. Let’s say a new port is inserted and another’s value is changed, and we want to merge with values in the API.

The original port:

ports:
  - name: www
    containerPort: 80
    protocol: tcp

And let’s say a hostPort field was added to bind the www port to port 80 via a separate API call:

ports:
  - name: www
    containerPort: 80
    hostPort: 80
    protocol: tcp

And then a user changed the configuration to:

ports:
  - name: observability
    containerPort: 8090
    protocol: tcp
  - name: www
    containerPort: 8080
    protocol: tcp

If just merging elements of the array positionally, index by index, then that would result in this in the API:

ports:
  - name: observability
    containerPort: 8090
    hostPort: 80
    protocol: tcp
  - name: www
    containerPort: 8080
    protocol: tcp

Rather than what was intended:

ports:
  - name: observability
    containerPort: 8090
    protocol: tcp
  - name: www
    containerPort: 8080
    hostPort: 80
    protocol: tcp

This example may seem contrived, but the API has many such arrays, and it seemed likely that something like this would happen.

Essentially such arrays should be treated similarly to maps even though they aren’t structured as maps. I call them associative arrays.

That brings us back to strategic merge patch. “Strategic” refers to the ability to specify a patchStrategy for arrays: replace or merge. Additionally, in the case of merge, a patchMergeKey can be specified to identify the field of each element to use as the equivalent of a map key so that elements can be matched irrespective of their array positions.

To address the ports example above, the patchStrategy would be merge and the patchMergeKey would be name, to specify that the name field’s value should be used as the key for matching elements. That would enable the intended result to be achieved. However, in actuality, ports do not use the name field for merging, because the name field is optional. Instead, the containerPort field itself is used, along with the protocol, since TCP is not the only supported protocol. This can be seen in the Kubernetes API definition:

 // +optional
 // +patchMergeKey=containerPort
 // +patchStrategy=merge
 // +listType=map
 // +listMapKey=containerPort
 // +listMapKey=protocol
 Ports []ContainerPort `json:"ports,omitempty" patchStrategy:"merge" patchMergeKey:"containerPort" protobuf:"bytes,6,rep,name=ports"`

But the mechanism does achieve the intended result for other cases, such as containers.
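
The difference between positional merging and merging by key can be reproduced with the ports example from earlier. The Python sketch below is a simplification, not the strategic merge patch implementation: both helper names are hypothetical, name is used as the merge key as in the hypothetical above rather than containerPort, and elements present only in the live state are dropped for brevity.

```python
def merge_positional(live: list[dict], desired: list[dict]) -> list[dict]:
    """Merge desired elements onto live elements index by index."""
    merged = []
    for index, item in enumerate(desired):
        base = live[index] if index < len(live) else {}
        merged.append({**base, **item})
    return merged


def merge_by_key(live: list[dict], desired: list[dict], merge_key: str) -> list[dict]:
    """Merge desired elements onto the live elements that share the same key value."""
    live_by_key = {item[merge_key]: item for item in live}
    return [{**live_by_key.get(item[merge_key], {}), **item} for item in desired]


# Live state, with hostPort set by a separate API call.
live = [{"name": "www", "containerPort": 80, "hostPort": 80, "protocol": "tcp"}]
# The user's updated configuration.
desired = [
    {"name": "observability", "containerPort": 8090, "protocol": "tcp"},
    {"name": "www", "containerPort": 8080, "protocol": "tcp"},
]

positional = merge_positional(live, desired)
keyed = merge_by_key(live, desired, merge_key="name")
```

In positional, hostPort ends up on the observability port; in keyed, it stays with www, which is the intended result.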

Similar behavior can be specified in a CustomResourceDefinition by setting x-kubernetes-list-type to map and listing the key fields in x-kubernetes-list-map-keys.

We later reused strategic merge patch in Kustomize, to enable patching of what we called “overlays”, which are partial resource specifications. For example (from the documentation), to patch the spec.replicas field:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-nginx
spec:
  replicas: 3

Kustomize gets information about array merge strategies and merge keys from OpenAPI documents.

Kustomize’s ability to patch configuration made it useful not just for patching hand-written Kubernetes YAML files, to create variants of the same configuration, but also to patch Kubernetes resources generated by other tools, such as Helm, enabling overrides.

ConfigHub supports managing variants by automatically merging changes from upstream configurations into downstream configurations, or, really, from any configuration into any other configuration, such as downstream to upstream or variant to variant.

As with Kustomize, ConfigHub can also use this mechanism to override values in generated configurations, also known as rendered manifests. Additionally, ConfigHub can merge changes to the live state back into ConfigHub, such as after a “break glass” operational change. This enables ConfigHub to maintain an accurate, up-to-date record of the state and eliminates artificial configuration drift.

Conceptually, the mechanism works similarly to Kustomize, but without the need to write patches by hand. It does that by computing a diff and encoding it in a form that can be used as a patch. To produce the patch correctly, the diff mechanism needs to be aware of array patch strategies and merge keys. Instead of requiring knowledge of the merge keys during both the diff and the patch processes, I chose to encode the merge keys in the patch.

More specifically, I encoded merge keys in resource field paths, as they would be in the case of maps rather than arrays. That was a convenient approach because a number of ConfigHub functions operate similarly in some ways to Kustomize transformers, and operate on specific resource fields. It can be more convenient for users of such functions to locate fields using merge keys, such as environment variable names, rather than array indices. It’s effectively a simple kind of search within the configuration. I’ll write another post in the future that goes into more detail on how these “visitor” functions work.
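
To give a feel for what locating fields by merge key rather than by array index looks like, here is a small Python sketch. The dotted path syntax, the MERGE_KEYS table, and the resolve helper are all hypothetical and are not ConfigHub's actual path format.

```python
# Hypothetical table mapping array field names to their merge key fields.
MERGE_KEYS = {"containers": "name", "env": "name"}


def resolve(doc: dict, path: str):
    """Walk a dotted path, treating keyed arrays as if they were maps."""
    node = doc
    parent_field = None
    for segment in path.split("."):
        if isinstance(node, list):
            # Select the element whose merge key value equals this segment.
            key_field = MERGE_KEYS[parent_field]
            node = next(item for item in node if item[key_field] == segment)
        else:
            node = node[segment]
            parent_field = segment
    return node


pod_spec = {
    "containers": [
        {
            "name": "worker",
            "env": [
                {"name": "CONFIGHUB_URL", "value": "https://hub.confighub.com"},
                {"name": "LOG_LEVEL", "value": "info"},
            ],
        }
    ]
}
url = resolve(pod_spec, "containers.worker.env.CONFIGHUB_URL.value")
```

The same lookup by index would be containers.0.env.0.value, which breaks as soon as an element is inserted or reordered. A real path syntax would also need to escape merge key values that contain dots.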

Anyway, that was a fairly deep dive on the history of strategic merge patch. It was implemented to support the merging of configuration into the live state in Kubernetes, but merging, in general, is a key configuration capability that it facilitates, and one that we support in a general-purpose way in ConfigHub.

Have you wondered why “strategic merge patch” is called that? Have you wondered why it’s needed? Do you wish it weren’t needed? Or have you wondered why other tools don’t support it? Have you experienced surprising or unfortunate merge behavior? Do you wish that arrays of command-line flags could be merged? Do CRDs that you use specify merge keys appropriately?

Feel free to reply here, or send me a message on LinkedIn, X/Twitter, or Bluesky, where I plan to crosspost this.

You could also try out ConfigHub, which is now in preview.

If you found this interesting, you may be interested in other posts in my Kubernetes series.

Kubernetes Strategic Merge Patch was originally published in ITNEXT on Medium, where people are continuing the conversation by highlighting and responding to this story.

Abstraction is the wrong way to simplify configuration

Brian Grant — Tue, 02 Dec 2025 18:05:33 GMT

abstract art image “Fire truck” generated by Gemini

I’ve been asked how abstraction works in ConfigHub.

In the context of configuration, “abstraction” usually refers to encapsulating a set of configuration generation artifacts, such as templates, and exposing a defined set of input parameters. Sometimes the artifacts are bundled into a package, such as a Helm chart or Terraform module. The write-only generation process is unidirectional and produces the final configuration as output. Configuration changes can only be made through changes to input parameter values, through changes to the generation artifacts, or by using another mechanism, such as kustomize, to modify the output.

Often the goal of abstraction is to simplify configurations for consumers of the abstraction. Over and over again over the past 15+ years I have seen that approach fail to simplify the configuration as much as was desired and/or expected. It works, but it is not as simple for the consumer as one would hope.

One reason is that many users of the abstraction care about some details of the output configuration. Certainly operators may care in emergency situations. A consequence is that users need to reverse engineer how to make the configuration generation process result in the desired output, which may not be obvious from the abstraction. I find I almost always need to read Helm templates to understand how to set input values, for example, and then I need to inspect the output to verify it did what I intended. Adam Jacob has called this the “200% knowledge problem”: you need to understand what you want the output to be (e.g., the values of fields of Kubernetes resources) and the mechanism that generates it.

Another pervasive problem is the tendency for abstractions’ interfaces to grow until they are comparable in size to, or even larger than, their output configurations. This is particularly prevalent in off-the-shelf configurations that need to handle many deployment scenarios.

This is a fairly standard flexibility vs. simplicity tradeoff: the more flexible a configuration needs to be, the more complicated its interface becomes. The alternatives to enable more field values to be set (forking the templates, adding more and more parameters over time, parameterizing all fields up front, creating larger and larger patches, or abandoning the abstraction entirely) all result in increased aggregate complexity.

Additionally, efforts to modularize configuration and make it DRY not only make the configuration harder to understand, but also increase blast radius and the complexity of making changes.

Abstraction through encapsulation and parameterization is a case of software engineers trying to use a software-engineering technique to solve what is fundamentally a UX problem.

Users do not adopt Infrastructure/Configuration as Code (IaC) because they think it is simpler to use than other tools. The user experience of writing and changing configuration is more complicated and more manual. Users adopt IaC for the capabilities it enables.

As currently implemented, encapsulating abstractions make simplifying the user experience more difficult. One pragmatic reason is that each such abstraction is a bespoke interface and the cost of providing user assistance is harder to amortize. That’s true even for AI agents, which are not trained on your custom configuration templates and modules.

On the other hand, when we treat configuration as data and build tools targeting well defined schemas, each tool provides views over and actions on the underlying configuration data. We can then apply standard UX techniques to standard APIs to simplify user experience. Through techniques like progressive disclosure, exceptional cases don’t need to pollute the common-case experience.

With this approach, using additional tools also doesn’t necessarily complicate everyone’s ability to understand the configuration. Configuration as data enables stronger separation of concerns. I don’t need to understand every tool that could modify a particular field in a configuration in order to use just one tool. And I don’t need to understand any of the tools in order to understand a field’s value. I can just look at the value, which could be extracted easily by a tool, as in my previous get-image example.

The looser coupling between the configuration data and the implementations of tools that read and write the configuration benefits tool authors similarly, as they don’t need to understand every tool or any tool necessarily in order to write a new tool to interact with particular configuration details. The tools can be changed and even replaced over time without affecting existing configuration data.

If an abstraction was intended as much to restrict what the user could do as to simplify, that purpose can be achieved with constraints on the output, which many already enforce via dynamic admission control in Kubernetes. That approach is also more auditable than implicit and even explicit checks buried inside configuration generation code. So this is not a compelling reason to stick to the monolithic encapsulation approach.
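As a minimal sketch of constraining the output rather than the generator, the check below inspects a rendered Deployment and reports violations. The function name `check_constraints`, the `max_replicas` limit, and the two rules are assumptions chosen for illustration, not the policy of any real admission controller; in a cluster, checks like these would typically live in an admission webhook or a policy engine.

```python
def check_constraints(deployment, max_replicas=10):
    """Return a list of violations found in a rendered Deployment dict."""
    violations = []
    spec = deployment.get("spec", {})
    # Hypothetical rule 1: cap the replica count.
    if spec.get("replicas", 1) > max_replicas:
        violations.append(f"spec.replicas exceeds {max_replicas}")
    # Hypothetical rule 2: forbid privileged containers.
    pod_spec = spec.get("template", {}).get("spec", {})
    for container in pod_spec.get("containers", []):
        if container.get("securityContext", {}).get("privileged", False):
            violations.append(f"container {container['name']} is privileged")
    return violations


deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "spec": {
        "replicas": 50,
        "template": {"spec": {"containers": [
            {"name": "worker", "image": "example/worker:1.0",
             "securityContext": {"privileged": True}},
        ]}},
    },
}

for violation in check_constraints(deployment):
    print(violation)
```

Because the rules read only the output, they apply no matter which tool or person produced the configuration.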

We’re at the early stages of exploring what is feasible to build on top of configuration as data, but it’s already clear that it opens up a whole new world of possible tools we could build. One of the strengths of Kubernetes is the large ecosystem of tools around it. I look forward to the day when the response to someone needing to make a complex configuration change for a particular purpose is “there’s a tool for that.”

Are your configuration abstractions extremely wide and complicated? Do you find that the abstractions erode over time? Do you find yourself reading templates in order to figure out how to use them properly? Do existing configuration tools get in the way of building internal developer platform functionality? If it were easier to build tools to modify and analyze configuration, what tools would you want to build?

Feel free to reply here, or send me a message on LinkedIn, X/Twitter, or Bluesky, where I plan to crosspost this.

You could also try out ConfigHub, which is now in preview.

If you found this interesting, you may be interested in other posts in my Infrastructure as Code and Declarative Configuration series or in my Kubernetes series.

Abstraction is the wrong way to simplify configuration was originally published in ITNEXT on Medium, where people are continuing the conversation by highlighting and responding to this story.

Configuration needs an API

Brian Grant — Mon, 24 Nov 2025 16:49:27 GMT

If you’ve used Kubernetes configuration tools like Helm, cdk8s, or pretty much any other, you probably expect to need to write and edit the configuration by hand. Or, more likely now, you try to use AI, which has improved significantly since last year but still can’t perform repeated modifications predictably and reliably.

But what if you thought about your configuration as having an API and a schema?

You would build and use tools. You can do that with ConfigHub, which manages configuration as data.

ConfigHub supports a mechanism called functions, which you can write with our SDK, that can be invoked on selected configurations.

Some simple examples:

Need to update an image tag? Invoke set-image-reference
Scale horizontally? set-replicas
Scale vertically? set-container-resources
Change an environment variable? set-env-var
Add a label? set-label
Set the security context to best-practice values? set-pod-defaults --security-context

I use mutations as an example because they present a greater contrast with “as code” approaches, where editing of templates, patches, and configuration generation code is not amenable to such tooling.

https://medium.com/media/a8f556ffe352dd01a997ae61095c4337/href

Functions and other tools built around the API can be single purpose or multi-purpose, but they do not need to be monolithic in the way that configuration generators do in “as code” approaches. The configuration data is the source of record, so tools can be composed together to construct the final configuration to deploy. This interoperability means that you do not need to reinvent the wheel when you need to modify some additional properties. Just perform additional modifications.
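The composition described above can be sketched as a pipeline of small, independent steps applied to the same data. The step names (`set_label`, `set_replicas`) and the `run_pipeline` helper are hypothetical and only illustrate the pattern; they are not ConfigHub's API.

```python
import copy


def set_label(key, value):
    def step(resource):
        resource.setdefault("metadata", {}).setdefault("labels", {})[key] = value
    return step


def set_replicas(count):
    def step(resource):
        resource.setdefault("spec", {})["replicas"] = count
    return step


def run_pipeline(resource, steps):
    """Apply each step in order to a copy of the resource and return the result."""
    result = copy.deepcopy(resource)
    for step in steps:
        step(result)
    return result


base = {"apiVersion": "apps/v1", "kind": "Deployment", "metadata": {"name": "worker"}}
final = run_pipeline(base, [set_label("team", "payments"), set_replicas(3)])
print(final)
```

Adding another modification means appending another step; none of the existing steps need to know about it.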

Functions can also be as specific or as generic as you like. A function like set-image-reference targets specific, built-in fields of specific resource types, which simplifies configuration changes, similar to a CLI, and can reduce the opportunity for errors. Other functions, such as set-string-path and yq-i, can modify arbitrary fields, but you need to provide the paths to target.
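A generic, path-based setter can be sketched in a few lines. The dotted path syntax used here, with numeric segments indexing into lists, is an assumption made for this example; it is not the path syntax of set-string-path or yq.

```python
def set_path(data, path, value):
    """Set the value at a dotted path; numeric segments index into lists."""
    segments = path.split(".")
    node = data
    for segment in segments[:-1]:
        node = node[int(segment)] if isinstance(node, list) else node[segment]
    last = segments[-1]
    if isinstance(node, list):
        node[int(last)] = value
    else:
        node[last] = value


deployment = {"spec": {"template": {"spec": {"containers": [
    {"name": "worker", "image": "example/worker:v1"},
]}}}}

set_path(deployment, "spec.template.spec.containers.0.image", "example/worker:v2")
print(deployment["spec"]["template"]["spec"]["containers"][0]["image"])
```

The tradeoff is visible in the call: the generic function works on any field, but the caller has to know the path.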

Of course, validation, queries, and value extraction are also enabled by treating configuration as structured data with schemas. Want to know what image is running in each environment? get-image will extract that information for you.
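Value extraction is similarly small when the configuration is data. The sketch below is a hypothetical analogue of get-image, not its actual implementation, and the environment names and images are made up.

```python
def get_images(resource):
    """Return {container name: image} for a Deployment-shaped dict."""
    pod_spec = resource.get("spec", {}).get("template", {}).get("spec", {})
    return {c["name"]: c["image"] for c in pod_spec.get("containers", [])}


def deployment_with_image(image):
    return {"spec": {"template": {"spec": {"containers": [
        {"name": "worker", "image": image},
    ]}}}}


environments = {
    "dev": deployment_with_image("example/worker:v3"),
    "staging": deployment_with_image("example/worker:v2"),
    "prod": deployment_with_image("example/worker:v1"),
}

for name, deployment in environments.items():
    print(name, get_images(deployment))
```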

Other APIs are more challenging to maintain in a serialized form, but the Kubernetes API was designed for this. Kubernetes resources, even many custom resources, and cloud resources have schemas that are stable and backward compatible.

Application configuration files may not have formal schemas, but such schemas can be defined. For example, I pointed an AI agent at Grafana’s LDAP configuration documentation and asked it to generate a schema, and it produced a 250-line schema in one shot. A schema specification is not strictly required, but is useful for validation.
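As a rough illustration of how even a small schema helps with validation, the sketch below checks a configuration against a table of field types. The field names are hypothetical and only loosely resemble an LDAP server section; this is not Grafana's actual schema or the schema the agent generated.

```python
# Hypothetical schema: field name -> expected Python type.
LDAP_SERVER_SCHEMA = {"host": str, "port": int, "use_ssl": bool, "bind_dn": str}
REQUIRED_FIELDS = {"host", "port"}


def validate(config, schema, required):
    """Return a list of error strings; an empty list means the config is valid."""
    errors = []
    for field in sorted(required - set(config)):
        errors.append(f"missing required field: {field}")
    for field, value in config.items():
        if field not in schema:
            errors.append(f"unknown field: {field}")
        elif type(value) is not schema[field]:
            expected = schema[field].__name__
            errors.append(f"{field}: expected {expected}, got {type(value).__name__}")
    return errors


config = {"host": "ldap.example.com", "port": "389", "start_tls": True}
for error in validate(config, LDAP_SERVER_SCHEMA, REQUIRED_FIELDS):
    print(error)
```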

The point is that configuration properties are well defined and can be targeted by tools, not just written out by hand over and over again.

As we can see from the rich ecosystem of tools built around Kubernetes, when we build around a common API, more and more useful tools are added to the toolbox over time — forward progress is made. On the other hand, if we keep building new parallel but incompatible ecosystems of configuration generation languages, whether templates, DSLs, or general-purpose languages, we’re reinventing the wheel over and over again. Not only that, but we’re limiting the kinds of tools that can be built around them to just a few kinds of tools due to the complicated syntax and unidirectional process:

Syntax-aware IDEs.
Scaffolding tools — templating templates! These are fire-and-forget configuration generation tools, generally.
Tools that treat the configuration generation code as a black box, such as deployment tools and version-control systems.
Tools that manipulate input variable values.
Tools that operate on the rendered output, such as validation tools and patching tools.
AI-based tools.

I have a background in compilers, but requiring compiler techniques or AI to build most kinds of tools drastically limits what can be built, who can build them, and when it makes sense to build them.

These kinds of tools are also largely anchored in human-driven processes. For example, validation tools find problems but cannot fix them, whereas automatic remediation is feasible with API-based tools.

I look forward to the tools that will be developed once we’re not wrapping configuration data with obstructive layers of complexity.

Are there tools that work directly with the Kubernetes API that you wish you could apply to configuration files as well? Would you or members of your organization prefer to use a GUI rather than editing configuration files in git? Do you feel the toil of pushing changes through git to make small configuration changes? Have you ever needed to make similar changes to dozens of configuration files? What tools would you build to modify, inspect, analyze, or validate configuration if it were easier to do?

Configuration needs an API was originally published in ITNEXT on Medium, where people are continuing the conversation by highlighting and responding to this story.

What is Bidirectional GitOps?

Brian Grant — Mon, 17 Nov 2025 17:18:57 GMT

And why (and when) would anyone want that?

With GitOps as currently defined and implemented, a GitOps Operator will continuously apply the specified “desired” state from storage (git, OCI repositories, etc.).

That’s useful in the case that a single apply would fail whereas an indeterminate number of applies would succeed. Such non-persistent failures could occur due to a number of reasons: unexpressed dependencies, lack of strong consistency, broken network connections, process restarts, etc. A continuous retry loop, essentially.
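The retry behavior can be sketched as a loop around an apply operation. The names `apply_until_success` and `TransientError` are assumptions for this example, and the attempt limit exists only so the sketch terminates; a GitOps operator would normally keep reconciling indefinitely.

```python
import time


class TransientError(Exception):
    """Stand-in for a non-persistent failure, such as a dependency that is not ready yet."""


def apply_until_success(apply, max_attempts=5, delay_seconds=0.0):
    """Call apply() until it succeeds; return the number of attempts used."""
    for attempt in range(1, max_attempts + 1):
        try:
            apply()
            return attempt
        except TransientError:
            if attempt == max_attempts:
                raise
            time.sleep(delay_seconds)


attempts = {"count": 0}


def flaky_apply():
    # Fails twice, then succeeds, simulating an unexpressed dependency.
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TransientError("dependency not ready")


print(apply_until_success(flaky_apply))
```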

And when the configuration is updated in storage, the live state should be updated accordingly.

The main reason for the continuous reconciliation loop, however, is continuous drift detection and remediation. That is, if the live state is found to not match the configuration in storage, the live state will be updated to match the state from storage.
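Drift detection and remediation can be sketched as a comparison of the stored configuration against the live state, followed by overwriting the fields that differ. This is a simplified illustration with hypothetical function names and made-up data; real tools compare full API objects and handle lists, defaults, and field ownership.

```python
import copy


def find_drift(desired, live, path=""):
    """Return {field path: (desired, live)} for fields in desired that differ in live."""
    drift = {}
    for key, want in desired.items():
        field = f"{path}.{key}" if path else key
        have = live.get(key)
        if isinstance(want, dict) and isinstance(have, dict):
            drift.update(find_drift(want, have, field))
        elif want != have:
            drift[field] = (want, have)
    return drift


def remediate(desired, live):
    """Overwrite drifted fields in live, leaving fields not in desired untouched."""
    for key, want in desired.items():
        if isinstance(want, dict) and isinstance(live.get(key), dict):
            remediate(want, live[key])
        else:
            live[key] = copy.deepcopy(want)


desired = {"spec": {"replicas": 2, "template": {"image": "app:v1"}}}
live = {
    "spec": {"replicas": 5, "template": {"image": "app:v1"}},
    "status": {"readyReplicas": 5},
}

print(find_drift(desired, live))
remediate(desired, live)
print(find_drift(desired, live))
```

Note that remediation in this direction discards the live change, which is exactly what clobbers a deliberate operational fix.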

Configuration drift happens for a variety of reasons. A common reason is that someone deliberately changed the live state. Why would someone do that when the configuration is specified in storage?

It could be that the person making the change wasn’t aware the configuration was managed in storage, though with Kubernetes configuration this is relatively unlikely, since managing configuration with version control has been a standard practice from the beginning.

It could be that someone was attempting to circumvent policy controls. Kubernetes RBAC and dynamic admission control can be used to prevent that.

It could be an accident, such as a command invoked on the wrong cluster or namespace.

A more common occurrence is that someone made a change out of necessity.

For instance, let’s say something breaks due to an application update, an unexpected operational issue such as a recurring OOM occurs, or an application needs to be updated urgently to address a security concern.

A number of people have told me that they believe it’s not only simpler and faster but even safer to “break glass” and just change the cluster directly.

If they use a GitOps tool, such as FluxCD or ArgoCD, they may need to suspend or disable it first. If they don’t, their change could be clobbered quickly, regardless of the problem they were trying to address.

I experienced something similar to that many years ago. I was on call. To address a production outage I needed to deploy a fix quickly. The normal process took a long time, too long. So I needed to do it another way. I deployed a new release and it fixed the problem. Seconds later, the previous release was redeployed by a reconciliation loop, taking the service down again. Disabling the reconciliation mechanism would take as long as using it to deploy, so I blocked its network access. Which would eventually be undone by another reconciliation loop.

Anyway, in the case that the operational change is successful and it needs to be made permanent, then it needs to be backported to the configuration source and the reconciliation mechanism needs to be re-enabled or unblocked. Typically, the configuration is in some kind of DRY (Don’t Repeat Yourself) format — templates, patches, etc. — so backporting the change is not trivial, at least not straightforward enough for a traditional deterministic tool to do. The values that need to be changed may be in a template, input variable values, value overrides, patches, configuration generation code, or the like. In the OOM case, we’d need to change container resources, as in this Helm chart excerpt:

        resources:
          {{- toYaml .Values.paymentService.resources | nindent 10 }}

In that case, the appropriate values file would need to be updated. If the values that needed to be changed weren’t parameterized, then the template would need to be changed, which would potentially affect all environments.

Both the configuration tool and the GitOps tool feel like they are getting in the way in this scenario. It should be straightforward to make these kinds of operational changes. You shouldn’t have to fight the automation.

ConfigHub maintains a one-to-one correspondence between the configuration in ConfigHub and the live state. If you do need to update the live state directly, the changes can be pulled back into ConfigHub automatically.

cub unit refresh --space apptique-prod paymentservice

https://medium.com/media/fcab4299a19dbde0d3826cb937a69bf0/href

When it is not desirable to keep the changes, they can be undone, just like any other changes.

cub unit update --patch --restore=-1 --space apptique-prod paymentservice
cub unit apply --space apptique-prod paymentservice

Synchronizing the live state back into ConfigHub provides more visibility into what the current live state is and what changes have been made. I like to see what the actual state is before making changes to the configuration, not just when I am about to apply the changes.

Now imagine continuously synchronizing the live state back into storage. That would be essentially “reverse GitOps”. However, we know when an apply operation is being performed, so we can synchronize in that direction as needed as well — Bidirectional GitOps.

That kind of bidirectional synchronization provides more control over operational changes than the current unidirectional approach, making deliberate operational changes less challenging.
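The two directions can be sketched with a stored unit that keeps a revision history alongside a stand-in for the live state. The `Unit` class, its methods, and the dictionary used as a "cluster" are all hypothetical; this illustrates the pattern and is not how ConfigHub is implemented.

```python
import copy


class Unit:
    """Stored configuration with a revision history, paired 1:1 with a live resource."""

    def __init__(self, config):
        self.revisions = [copy.deepcopy(config)]

    @property
    def head(self):
        return self.revisions[-1]

    def refresh(self, live):
        """Live -> storage: record the live state as a new revision if it changed."""
        if live == self.head:
            return False
        self.revisions.append(copy.deepcopy(live))
        return True

    def restore_previous(self):
        """Add a new revision whose content matches the one before head."""
        if len(self.revisions) < 2:
            raise ValueError("no earlier revision to restore")
        self.revisions.append(copy.deepcopy(self.revisions[-2]))

    def apply(self, cluster, name):
        """Storage -> live: push the head revision to the live state."""
        cluster[name] = copy.deepcopy(self.head)


cluster = {"paymentservice": {"memory": "128Mi"}}
unit = Unit(cluster["paymentservice"])

# Someone changes the live state directly to address an OOM.
cluster["paymentservice"]["memory"] = "256Mi"
print(unit.refresh(cluster["paymentservice"]), unit.head)

# Later, the change is judged unnecessary and undone like any other change.
unit.restore_previous()
unit.apply(cluster, "paymentservice")
print(cluster["paymentservice"], len(unit.revisions))
```

Because the direct change becomes an ordinary revision, keeping it or reverting it uses the same mechanism as any other configuration change.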

That said, with the 1:1 correspondence and configuration represented as data, this kind of change is also safer (due to minimal blast radius) and much more straightforward to make through ConfigHub, reducing the need to update the live state directly. For common cases, such as changing resources, there are already pre-built functions for making such changes.

cub function invoke --space apptique-prod --unit paymentservice \
    set-container-resources server floor 100m 200Mi 2
cub unit apply --space apptique-prod paymentservice

But you have the option to choose the approach that’s best in the situation when some urgent operational change is necessary.

What do you think? Have you needed to make operational changes and found that GitOps made the process more complicated and/or take longer? Does bidirectional GitOps sound promising, or confusing? Do you think digital twin is a better term to describe this pattern?

What is Bidirectional GitOps? was originally published in ITNEXT on Medium, where people are continuing the conversation by highlighting and responding to this story.