velero pod restarts on backup of cluster resources #68

@shawn-hurley

Description

Tracking issue filed here: velero-io/velero#3055

We used the Velero AWS plugin with MinIO as the backup storage location and simply tried to back up the cluster resources:
velero create backup mybackup-1 -n spp-velero
The backup stays in the "InProgress" phase and never reaches "Completed".
While monitoring the Velero pod, we see that it is restarted during the backup, which might be why the backup never completes.
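
One way to check why the pod restarts is to ask Kubernetes for the restart count and the last termination reason of the Velero container. The following is only a sketch under assumptions that are not from this report: the label selector `deploy=velero` may differ in your installation (pass another selector as the second argument), and the `KUBECTL` variable is a hypothetical switch between `oc` and `kubectl`.

```shell
#!/usr/bin/env bash
# Sketch: print name, restart count and last termination reason of the Velero pod(s).
# Assumption: the pods are labelled deploy=velero; override via the second argument.
# Assumption: KUBECTL selects the client binary (defaults to oc, as used in this report).
velero_restart_info() {
  local ns="$1" selector="${2:-deploy=velero}"
  "${KUBECTL:-oc}" get pods -n "$ns" -l "$selector" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[0].restartCount}{"\t"}{.status.containerStatuses[0].lastState.terminated.reason}{"\n"}{end}'
}
```

Calling `velero_restart_info spp-velero` prints one tab-separated line per pod. The reason column is empty if the container has no recorded previous termination.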
What did you expect to happen:
The Velero backup of cluster resources should finish in the Completed state.

The output of the following commands will help us better understand what's going on:
(Pasting long output into a GitHub gist or other pastebin is fine.)

kubectl logs deployment/velero -n velero
oc logs -p deployment/velero -n spp-velero
time="2020-11-05T18:44:10Z" level=info msg="Setting up backup log" backup=spp-velero/mybackup-1 controller=backup logSource="pkg/controller/backup_controller.go:512"
time="2020-11-05T18:44:10Z" level=info msg="Setting up backup temp file" backup=spp-velero/mybackup-1 logSource="pkg/controller/backup_controller.go:534"
time="2020-11-05T18:44:10Z" level=info msg="Setting up plugin manager" backup=spp-velero/mybackup-1 logSource="pkg/controller/backup_controller.go:541"
time="2020-11-05T18:44:10Z" level=info msg="Getting backup item actions" backup=spp-velero/mybackup-1 logSource="pkg/controller/backup_controller.go:545"
time="2020-11-05T18:44:10Z" level=info msg="Setting up backup store to check for backup existence" backup=spp-velero/mybackup-1 logSource="pkg/controller/backup_controller.go:551"
time="2020-11-05T18:44:10Z" level=info msg="Writing backup version file" backup=spp-velero/mybackup-1 logSource="pkg/backup/backup.go:236"
time="2020-11-05T18:44:10Z" level=info msg="Including namespaces: *" backup=spp-velero/mybackup-1 logSource="pkg/backup/backup.go:242"
time="2020-11-05T18:44:10Z" level=info msg="Excluding namespaces: " backup=spp-velero/mybackup-1 logSource="pkg/backup/backup.go:243"
time="2020-11-05T18:44:10Z" level=info msg="Including resources: *" backup=spp-velero/mybackup-1 logSource="pkg/backup/backup.go:246"
time="2020-11-05T18:44:10Z" level=info msg="Excluding resources: " backup=spp-velero/mybackup-1 logSource="pkg/backup/backup.go:247"
time="2020-11-05T18:44:10Z" level=info msg="Backing up all pod volumes using restic: false" backup=spp-velero/mybackup-1 logSource="pkg/backup/backup.go:248"
time="2020-11-05T18:44:23Z" level=info msg="Getting items for group" backup=spp-velero/mybackup-1 group=v1 logSource="pkg/backup/item_collector.go:76"
time="2020-11-05T18:44:23Z" level=info msg="Getting items for resource" backup=spp-velero/mybackup-1 group=v1 logSource="pkg/backup/item_collector.go:165" resource=pods
time="2020-11-05T18:44:23Z" level=info msg="Listing items" backup=spp-velero/mybackup-1 group=v1 logSource="pkg/backup/item_collector.go:291" namespace= resource=pods
time="2020-11-05T18:44:24Z" level=info msg="Retrieved 215 items" backup=spp-velero/mybackup-1 group=v1 logSource="pkg/backup/item_collector.go:297" namespace= resource=pods
time="2020-11-05T18:44:24Z" level=info msg="Getting items for resource" backup=spp-velero/mybackup-1 group=v1 logSource="pkg/backup/item_collector.go:165" resource=persistentvolumeclaims
time="2020-11-05T18:44:24Z" level=info msg="Listing items" backup=spp-velero/mybackup-1 group=v1 logSource="pkg/backup/item_collector.go:291" namespace= resource=persistentvolumeclaims
time="2020-11-05T18:44:24Z" level=info msg="Retrieved 3 items" backup=spp-velero/mybackup-1 group=v1 logSource="pkg/backup/item_collector.go:297" namespace= resource=persistentvolumeclaims
time="2020-11-05T18:44:24Z" level=info msg="Getting items for resource" backup=spp-velero/mybackup-1 group=v1 logSource="pkg/backup/item_collector.go:165" resource=persistentvolumes
time="2020-11-05T18:44:24Z" level=info msg="Listing items" backup=spp-velero/mybackup-1 group=v1 logSource="pkg/backup/item_collector.go:291" namespace= resource=persistentvolumes
time="2020-11-05T18:44:24Z" level=info msg="Retrieved 3 items" backup=spp-velero/mybackup-1 group=v1 logSource="pkg/backup/item_collector.go:297" namespace= resource=persistentvolumes
time="2020-11-05T18:44:24Z" level=info msg="Getting items for resource" backup=spp-velero/mybackup-1 group=v1 logSource="pkg/backup/item_collector.go:165" resource=namespaces
time="2020-11-05T18:44:24Z" level=info msg="Listing items" backup=spp-velero/mybackup-1 group=v1 logSource="pkg/backup/item_collector.go:291" namespace= resource=namespaces
time="2020-11-05T18:44:24Z" level=info msg="Retrieved 64 items" backup=spp-velero/mybackup-1 group=v1 logSource="pkg/backup/item_collector.go:297" namespace= resource=namespaces
time="2020-11-05T18:44:24Z" level=info msg="Getting items for resource" backup=spp-velero/mybackup-1 group=v1 logSource="pkg/backup/item_collector.go:165" resource=events
time="2020-11-05T18:44:24Z" level=info msg="Listing items" backup=spp-velero/mybackup-1 group=v1 logSource="pkg/backup/item_collector.go:291" namespace= resource=events
time="2020-11-05T18:44:24Z" level=info msg="Retrieved 2148 items" backup=spp-velero/mybackup-1 group=v1 logSource="pkg/backup/item_collector.go:297" namespace= resource=events
time="2020-11-05T18:44:25Z" level=info msg="Getting items for resource" backup=spp-velero/mybackup-1 group=v1 logSource="pkg/backup/item_collector.go:165" resource=secrets
time="2020-11-05T18:44:25Z" level=info msg="Listing items" backup=spp-velero/mybackup-1 group=v1 logSource="pkg/backup/item_collector.go:291" namespace= resource=secrets
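
The log of the previous container ends with a "Listing items" line for `secrets` that has no matching "Retrieved N items" line. A small filter along these lines can pick out that last unfinished resource from a longer log. It is a sketch that assumes the log format shown above, and the function name `last_unfinished_resource` is made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: read Velero server log lines on stdin and print the resource of the
# last "Listing items" line that is not followed by a "Retrieved N items" line.
# Prints nothing if every listing was followed by a retrieval.
last_unfinished_resource() {
  awk '
    /msg="Listing items"/ {
      pending = ""
      for (i = 1; i <= NF; i++)
        if ($i ~ /^resource=/) pending = substr($i, 10)   # strip "resource="
    }
    /msg="Retrieved [0-9]+ items"/ { pending = "" }
    END { if (pending != "") print pending }
  '
}
```

Usage: `oc logs -p deployment/velero -n spp-velero | last_unfinished_resource`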
velero backup describe or kubectl get backup/ -n velero -o yaml
velero backup get -n spp-velero
NAME         STATUS       ERRORS   WARNINGS   CREATED                         EXPIRES   STORAGE LOCATION   SELECTOR
mybackup-1   InProgress   0        0          2020-11-05 19:44:10 +0100 CET   29d       default
[root@nevada16 install]# velero backup describe mybackup-1 -n spp-velero
Name:         mybackup-1
Namespace:    spp-velero
Labels:       velero.io/storage-location=default
Annotations:  velero.io/source-cluster-k8s-gitversion=v1.18.3+2fbd7c7
              velero.io/source-cluster-k8s-major-version=1
              velero.io/source-cluster-k8s-minor-version=18+

Phase:  InProgress

Errors:    0
Warnings:  0

Namespaces:
  Included:  *
  Excluded:

Resources:
  Included:        *
  Excluded:
  Cluster-scoped:  auto

Label selector:

Storage Location: default

Velero-Native Snapshot PVs: auto

TTL: 720h0m0s

Hooks:

Backup Format Version: 1.1.0

Started: 2020-11-05 19:44:10 +0100 CET
Completed: <n/a>

Expiration: 2020-12-05 19:44:10 +0100 CET

Velero-Native Snapshots:

velero backup logs
velero backup logs mybackup-1 -n spp-velero
Logs for backup "mybackup-1" are not available until it's finished processing. Please wait until the backup has a phase of Completed or Failed and try again.

velero restore describe or kubectl get restore/ -n velero -o yaml

velero restore logs

Anything else you would like to add:
The pod always restarts while or right after retrieving the secrets resource. This is reproducible.
A backup that excludes secrets works fine:
velero create backup mybackup-2 --exclude-resources secrets -n spp-velero
velero get backup -n spp-velero
NAME         STATUS       ERRORS   WARNINGS   CREATED                         EXPIRES   STORAGE LOCATION   SELECTOR
mybackup-1   InProgress   0        0          2020-11-05 19:44:10 +0100 CET   29d       default
mybackup-2   Completed    0        1          2020-11-05 19:48:35 +0100 CET   29d       default
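
To list only the backups that are stuck, the table above can be filtered on its STATUS column. This is a minimal sketch that assumes the column layout shown here (name first, status second); `stuck_backups` is a made-up helper name.

```shell
#!/usr/bin/env bash
# Sketch: read the table printed by `velero backup get` on stdin and print the
# names of all backups whose STATUS column is InProgress (header line skipped).
stuck_backups() {
  awk 'NR > 1 && $2 == "InProgress" { print $1 }'
}
```

Usage: `velero backup get -n spp-velero | stuck_backups`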

Environment:
OCP 4.5.15; we see the same behavior on 4.5.6 and 4.6.1.

Velero version (use velero version): 1.5.2 and 1.4.3

Velero features (use velero client config get features):

Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.2-0-g52c56ce", GitCommit:"b66f2d3a6893be729f1b8660309a59c6e0b69196", GitTreeState:"clean", BuildDate:"2020-08-10T04:49:09Z", GoVersion:"go1.13.4", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"18+", GitVersion:"v1.18.3+2fbd7c7", GitCommit:"2fbd7c7", GitTreeState:"clean", BuildDate:"2020-10-09T11:41:17Z", GoVersion:"go1.13.4", Compiler:"gc", Platform:"linux/amd64"}

Kubernetes installer & version:

Cloud provider or hardware configuration:

OS (e.g. from /etc/os-release): CoreOS 4.5
