@nonemax nonemax commented May 13, 2021

Checklist:

  • Have you added an explanation of what your changes do and why you'd like them to be included?
  • Have you updated or added documentation for the change, as applicable?
  • Have you tested your changes on all related environments with successful results, as applicable?

Type of Changes:

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

What is the current behavior? (link to any open issues here)

What is the new behavior (if this is a feature change)?
This PR adds the ability to set up PMM for PostgreSQL cluster monitoring.
To use it, you need to:

  1. Configure a PMM server.
  2. Apply a Secret with the PMM user password (an example is in deploy/pmm-secret.yaml).
  3. Add a pmm object to the pgCluster spec (an example is in deploy/examples/fromcrd.json, line 50) and enable it.

Other information:

jmckulk and others added 30 commits December 8, 2020 17:25
As part of the compaction changes some labels and label
checks were changed. This PR reverts these changes.

The `pgo-scheduler` code was updated to check for a
`crunchy-pgbackrest-repo` label instead of the `pgo-backrest-repo`
label. The deployment templates were not updated to use the
updated label so the scheduler would fail to create a backup
job when scheduling a backup.

The pgo.sqlrunner template was updated to have the `sqlrunner`
label instead of `pgo-sqlrunner`.
This was originally written for the pgAdmin 4 integration, but
can serve multiple purposes for some of the advanced updating
logic.
A rolling update of a PostgreSQL cluster involves applying any updates
that may require downtime to each replica within a PostgreSQL cluster,
followed by the promotion of a replica deemed suitable to be a primary,
followed by the update being applied to the former primary.

This commit introduces an interface to perform this exact behavior, by
allowing for any updates to the Deployments of PostgreSQL instances to
have any updates applied in a rolling fashion.

Issue: [ch9881]
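
The ordering described in this commit message (replicas first, then a switchover, then the former primary) can be sketched as below. This is only an illustration in Python, not the Operator's own code: the function name `rolling_update`, the callbacks `apply_update` and `promote`, and the choice of the first replica as the promotion candidate are all assumptions made for the example.

```python
from typing import Callable, List


def rolling_update(
    primary: str,
    replicas: List[str],
    apply_update: Callable[[str], None],
    promote: Callable[[str], None],
) -> List[str]:
    """Update replicas first, switch over, then update the old primary.

    Returns the ordered list of actions taken, for illustration.
    All names here are hypothetical.
    """
    actions: List[str] = []

    # 1. Update every replica while the primary keeps serving traffic.
    for replica in replicas:
        apply_update(replica)
        actions.append(f"update {replica}")

    # 2. Promote a replica. Picking the first one is a placeholder for
    #    whatever "deemed suitable" means in the real system.
    if replicas:
        new_primary = replicas[0]
        promote(new_primary)
        actions.append(f"promote {new_primary}")

    # 3. Update the former primary last.
    apply_update(primary)
    actions.append(f"update {primary}")
    return actions
```

With no replicas the sketch simply updates the single instance, which is the case where downtime cannot be avoided.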
As this is a change that can cause downtime, it is prudent to try
to limit the downtime by applying a rolling update methodology.
Modifying the annotations on the template portion of a Deployment
Spec causes each Pod that is under management of a Deployment to
be restarted. For managing a database server, this can be less than
ideal.

As such, it is prudent to employ a rolling update strategy for
annotation updates on database instances to minimize downtime on
the primary instance.
The `pgo update cluster --tablespace` functionality now leverages
the rolling update algorithm to minimize the appearance of
downtime to any connected clients.
The `--rolling` flag allows one to specify that a restart of a
PostgreSQL cluster should occur in a rolling fashion, i.e. all the
replicas are restarted, then a switchover occurs, then the newly
demoted primary is restarted.

This subsequently creates a task custom resource to perform the
rolling update, as said updates can take some time to process.

Issue: [ch9881]
The bootstrap Pod, a remnant of a cluster restore, gets caught
up in the `pgo df` search, but unfortunately this is not a valid
Pod. This excludes this Pod from being considered.

Issue: [ch2029]
Issue: CrunchyData#2029
When the bootstrap Job completes successfully after a restore,
it contains information that ends up being consumed by other
parts of the Operator system, such as Patroni. As the logs
from the Job do not provide much, if any, helpful information
after a restore succeeds, it's best to have the Operator
eliminate the job.

As such, this changes the behavior so that the bootstrap Job
is removed.

As this has led to some buggy behavior, this is being
considered as a bug fix, as regular operational work would
dictate that the Job is removed anyway.

Issue: [ch9919]
By adding this limitation, Pods such as Evicted Pods would not
be considered as a part of `pgo test` output, as this could
present some odd scenarios, such as the presence of two primaries.

Issue: [ch9931]
Issue: CrunchyData#2095
This moves several home-constructed methods to using a similarly
constructed one that is maintained upstream. Provides more
consistency across the code that can serve future implementation.
The functionality of the crunchy-backrest-restore container is
now included in the new crunchy-pgbackrest. As such, the existing
reference to the obsolete container can now be removed.
For a variety of reasons, including the need to exec into Pods
to get PVC status with `pgo df`, only running Pods should be
considered for this command and, in particular, no evicted Pods.

Issue: [ch9959]
Issue: CrunchyData#2129
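
The "only running Pods" rule above can be illustrated with a small filter. This is a hedged sketch in Python rather than the Operator's code: the function name `running_pods` is hypothetical, and the Pods are plain dictionaries shaped like the JSON form of a Kubernetes Pod, where `status.phase` is `Running` for a live Pod and an evicted Pod is reported with phase `Failed` and reason `Evicted`.

```python
from typing import Dict, List


def running_pods(pods: List[Dict]) -> List[Dict]:
    """Keep only Pods whose status.phase is "Running".

    Evicted Pods (phase "Failed", reason "Evicted") and completed Pods
    (phase "Succeeded") are dropped, since they cannot be exec'd into.
    """
    return [pod for pod in pods if pod.get("status", {}).get("phase") == "Running"]
```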
pgBouncer is an optional deployment; as such, we should proceed
if pgBouncer is not found.
Updates to a PostgreSQL cluster that warrant a rolling update
are now aggregated to only trigger a single rolling update per
action taken on a PostgreSQL cluster. This allows for the changes
to be rolled out more rapidly, as well as limit the number of
downtime events that need to take place.
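
One way to picture the aggregation described above: collect every pending change, apply them all to a copy of the Deployment template, and only then decide whether a single rolling update is needed. The sketch below is an assumption-laden illustration in Python; `aggregate_changes` and the dictionary-based template are hypothetical and do not reflect the Operator's actual types.

```python
import copy
from typing import Callable, Dict, Iterable, Tuple

# Hypothetical stand-in for a Deployment template.
Template = Dict[str, Dict[str, str]]


def aggregate_changes(
    template: Template,
    mutations: Iterable[Callable[[Template], None]],
) -> Tuple[Template, bool]:
    """Apply all mutations to a copy and report whether one rolling update is needed."""
    updated = copy.deepcopy(template)
    for mutate in mutations:
        mutate(updated)
    # However many mutations ran, at most one rolling update is triggered.
    needs_rolling_update = updated != template
    return updated, needs_rolling_update
```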
This adds a CRD attribute to pgcluster called `exporter`, which
will ultimately allow for the toggling on/off of the metrics
sidecar within a PostgreSQL cluster.

Includes an upgrade path for eliminating confusing labels for
the enablement of the exporter.
This updates the "exporter.json" template, which is used for deploying
the "crunchy-postgres-exporter" sidecar for metrics collection in a
PostgreSQL cluster, to not have a preceding "," in it. This in turn
allows for the file to be mapped into a Kubernetes Container object,
for convenience of manipulation within a program.
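
To see why the leading comma matters, consider that a fragment beginning with "," is not valid JSON on its own, so it cannot be decoded into an object. The Python sketch below is illustrative only: the two fragments are made-up placeholders (the real "exporter.json" content is not shown here), and `parse_container` is a hypothetical helper.

```python
import json
from typing import Optional

# Hypothetical fragments standing in for the template before and after the change.
WITH_LEADING_COMMA = ', {"name": "exporter", "image": "example/exporter:tag"}'
STANDALONE = '{"name": "exporter", "image": "example/exporter:tag"}'


def parse_container(fragment: str) -> Optional[dict]:
    """Return the decoded object, or None if the fragment is not valid JSON by itself."""
    try:
        return json.loads(fragment)
    except json.JSONDecodeError:
        return None
```

Only the standalone form decodes, which is what makes it possible to map the file into a typed object and manipulate it in a program.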
Match labels are immutable objects, and given some potentially mutable
labels exist within the match labels for the PostgreSQL Deployment
objects, it is necessary to modify this set of labels to use the
minimum needed for properly deploying a cluster. This reduces the
current set of match labels for a PostgreSQL instance to the following:

- vendor
- pg-cluster -- the name of the PostgreSQL cluster (group of all
instances)
- deployment-name -- the name of the PostgreSQL instance
- pgo-pg-database
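
The reduction to the four keys listed above can be sketched as a simple filter over an instance's full label set. This is an illustration in Python, not the Operator's code: the helper `match_labels` is hypothetical, and only the four key names come from the commit message.

```python
from typing import Dict

# The minimal set of match-label keys named in the commit message.
MATCH_LABEL_KEYS = ("vendor", "pg-cluster", "deployment-name", "pgo-pg-database")


def match_labels(all_labels: Dict[str, str]) -> Dict[str, str]:
    """Keep only the labels that are safe to use as immutable match labels."""
    return {key: all_labels[key] for key in MATCH_LABEL_KEYS if key in all_labels}
```

Any other label stays on the object's metadata, where it can still be changed without touching the selector.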
This commit introduces the ability to enable/disable the metrics
collection sidecar (`crunchy-postgres-exporter`) during the lifetime of
a PostgreSQL cluster. This can be toggled in multiple ways, including:

- The `exporter` attribute in pgclusters.crunchydata.com
- `pgo update cluster --enable-metrics`, which adds the sidecar
- `pgo update cluster --disable-metrics`, which removes the sidecar

As adding/removing a sidecar results in modifying a Deployment template,
this action will trigger a rolling update of the PostgreSQL cluster in
an effort to minimize any downtime.

This also has the net effect of moving the "ccp_monitoring" user that is
created to being fully managed by the Postgres Operator. The
`CollectSecretName` attribute is now removed from the pgcluster CRD, as
is the "PgMonitorPassword" attribute from the `pgo-deployer` and other
installers.

Issue: [ch7270]
Issue: CrunchyData#1413
This introduces the `--exporter-rotate-password` flag to
`pgo update cluster` so that the metrics collection password
can be rotated.
Since the primary PVC for the cluster is now retained during an
in-place PostgreSQL cluster restore in support of pgBackRest delta
restores, when preparing a cluster for a restore we can no longer rely
on the deletion of all PVC's as an indicator that the 'config' and
'leader' ConfigMaps created by Patroni can be removed.  Therefore,
the Operator now specifically waits for all Deployments to be
successfully removed prior to deleting these resources.

Issue: [ch9926]
This is a convenience for development, allowing all of the Golang
binaries to be built from a single target.
While these should rarely, if ever, happen, the world of distributed
computing is unpredictable and we should ensure our code can fail
gracefully in these scenarios.

There were cases where this was failing due to too many quotes
being used, so this should avoid said issues.

Issue: [ch9981]
Issue: CrunchyData#2108
This moves Grafana to 6.7.5 and Prometheus to 2.23.0. Note that
this continues to use the upstream version.
This adds examples to the monitoring architecture and tutorial
around how to enable metrics collection on an existing PostgreSQL
cluster.
This adds an example for using the Kustomize configuration
management tool for how to manage a custom resource attributed
to pgclusters.crunchydata.com.
The mechanism for disabling the verification prompt for `pgo upgrade`
was always available, but the flag itself was not exposed.

Issue: [ch9988]
Issue: CrunchyData#2135
While the Secret volume mount is set to be readonly for the
pgBackRest Secret information, the defaultMode on the volume itself
was set to be more permissive. While it appears that the vast
majority of Kubernetes distributions give precedence to the value
of the volume mount, some flavors do use the values set on the volume.

As such, it's prudent to remove the more permissive settings, which
this patch does.

Issue: [ch9989]
Issue: CrunchyData#2140
This introduces the ability to delete pgBackRest backups using
the `pgo delete backup` command. The pgBackRest backup name
must be specified using the `--target` flag, which can be
determined through a call to `pgo show backup`.

This also includes obligatory language on when to use explicit
backup deleting to ensure the user does not take actions on
their pgBackRest repository that they do not intend to.

Issue: CrunchyData#2111
Jonathan S. Katz and others added 27 commits April 26, 2021 17:11
This redirects to upstream projects to find out more about the
available metrics.
This makes it easier for PGO Pods to work with other systems.

Issue: CrunchyData#2407
The "secretName" for the "ssh-config" volume in the
"cluster-bootstrap-job.json" template has been updated to reference
the proper Secret as needed for restoring across namespaces.
This provides some modernity to help with reporting issues or
requesting features for PGO.
When performing an in-place restore (e.g. using 'pgo restore'), any
existing "bootstrap" Secrets are now deleted.  This facilitates retry
attempts, e.g. after a restore attempt fails, by ensuring all resources
can be properly recreated in order to re-attempt the restore.

Additionally, fixes the spelling of variable "BootstrapConfigPrefix",
and moves it to a location where it can be utilized by both the
"cluster" and "pgbackrest" packages.
These are unavailable on older versions of PostgreSQL.

Issue: [ch11334]
Ensure it is easy to do test builds.
If $GOPATH is unset, this will default to using the standard
module GOPATH.
This ensures the file perms on LICENSE.txt do not change when
script is run.
The apiserver will reconcile its own TLS certificate -- it has been
doing so for a while. This unifies the method to ensure that only
the apiserver will generate its certificate, unless the user explicitly
provides one.

Issue: [ch11380]
As we are no longer calling openssl from the Ansible scripts, we
do not need to install it explicitly. However, this is included
in most, if not all, of the base images we use.
This includes some of the process debugging utils, vi, less,
and ensuring tzdata is present[1]

[1] https://access.redhat.com/solutions/5616681

Issue: [ch11367]
This label is not referenced anywhere.
The label in question (pgremove) was used to indicate that
the PVC was managed by PGO, but there are other labels that
handle that.
This allows custom labels to be extended to the following
objects that are managed by PGO:

- Pods
- Deployments
- Jobs
- PVCs
- ConfigMaps
- Secrets
- Services

Issue: [ch11329]
This allows for custom labels to be edited on all of the managed
objects within a cluster. This works both from editing "userlabels"
within the pgclusters custom resource, as well as via API calls.

The changes are applied to Postgres instances using a rolling
update.

Issue: [ch11329]
This updates pgMonitor to v4.5-RC3, and makes additional changes
from da556b9.

Issue: [ch11334]
This includes some fixes and changes since 4.7.0-beta.2
Due to some reshuffling, these were not appearing in the
"pgo show cluster" command, but now they are.
Different casing was used in different parts of the documentation for
backrestResources and backrestLimits. This fixes it to be consistent
with what's actually used.
Add `disable_fsgroup: "true"` for the OpenShift 3.11
installer.
* K8SPG-25 Add PMM to operator

* K8SPG-25 Change imagePullPolicy to Always

* K8SPG-25 Add PMM secret example

* K8SPG-25 Add resources to PMM, add PMM to pgreplica

* K8SPG-25 Set default pmm.enabled to false

* K8SPG-25 Move PMM to separate template

* K8SPG-25 Add update logic to replica, improve PMM_AGENT_SETUP_NODE_NAME

* K8SPG-25 Add liveness probe and lifecycle preStop

* K8SPG-25 improve AddPMMSidecar

* K8SPG-25 Add separate pmm options to replica

* K8SPG-25 Set imagePullPolicy for deployment to IfNotPresent

* K8SPG-25 Remove pgReplica PMM options

* K8SPG-25 Remove 'Always' pull policy from deployment
@andrewlecuyer
Collaborator

This feature was submitted against the PGO v4 release line, which is no longer accepting new feature requests.

If you are still interested in this feature, we recommend that you first submit an issue to discuss your use case further, followed by resubmitting this feature against the current master branch.
