
velero backup of cluster resources results in a crash of velero pod #3539

Open
mfiedi opened this issue Mar 8, 2021 · 1 comment
mfiedi commented Mar 8, 2021

What steps did you take and what happened:
I'm trying to run a velero backup request for all cluster-scoped resources in my OpenShift cluster. The command I'm using is:
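(Note: the command itself was lost from this report. Based on the backup name and namespace appearing in the log below, it was of roughly the following form; treat the flags as a reconstruction rather than a verbatim copy:)

velero backup create test-mfiedler-debug \
  --include-cluster-resources=true \
  --namespace spp-velero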

Soon after the backup command is issued, I see a crash of the velero pod in namespace spp-velero. I enabled debug logging in the velero deployment and can see that the backup hangs while trying to retrieve the events from the cluster. When I query the number of events, I see that there are 214558. That's an incredibly high number, and honestly I have no idea where they come from. Here is a snippet from the velero log right before the restart:

time="2021-03-08T10:01:47Z" level=info msg="Getting items for resource" backup=spp-velero/test-mfiedler-debug group=v1 logSource="pkg/backup/item_collector.go:165" resource=nodes
time="2021-03-08T10:01:47Z" level=info msg="Listing items" backup=spp-velero/test-mfiedler-debug group=v1 logSource="pkg/backup/item_collector.go:291" namespace= resource=nodes
time="2021-03-08T10:01:47Z" level=info msg="Retrieved 6 items" backup=spp-velero/test-mfiedler-debug group=v1 logSource="pkg/backup/item_collector.go:297" namespace= resource=nodes
time="2021-03-08T10:01:47Z" level=info msg="Getting items for resource" backup=spp-velero/test-mfiedler-debug group=v1 logSource="pkg/backup/item_collector.go:165" resource=podtemplates
time="2021-03-08T10:01:47Z" level=info msg="Listing items" backup=spp-velero/test-mfiedler-debug group=v1 logSource="pkg/backup/item_collector.go:291" namespace= resource=podtemplates
time="2021-03-08T10:01:47Z" level=info msg="Retrieved 0 items" backup=spp-velero/test-mfiedler-debug group=v1 logSource="pkg/backup/item_collector.go:297" namespace= resource=podtemplates
time="2021-03-08T10:01:47Z" level=info msg="Getting items for resource" backup=spp-velero/test-mfiedler-debug group=v1 logSource="pkg/backup/item_collector.go:165" resource=services
time="2021-03-08T10:01:47Z" level=info msg="Listing items" backup=spp-velero/test-mfiedler-debug group=v1 logSource="pkg/backup/item_collector.go:291" namespace= resource=services
time="2021-03-08T10:01:48Z" level=info msg="Retrieved 77 items" backup=spp-velero/test-mfiedler-debug group=v1 logSource="pkg/backup/item_collector.go:297" namespace= resource=services
time="2021-03-08T10:01:48Z" level=info msg="Getting items for resource" backup=spp-velero/test-mfiedler-debug group=v1 logSource="pkg/backup/item_collector.go:165" resource=replicationcontrollers
time="2021-03-08T10:01:48Z" level=info msg="Listing items" backup=spp-velero/test-mfiedler-debug group=v1 logSource="pkg/backup/item_collector.go:291" namespace= resource=replicationcontrollers
time="2021-03-08T10:01:48Z" level=info msg="Retrieved 0 items" backup=spp-velero/test-mfiedler-debug group=v1 logSource="pkg/backup/item_collector.go:297" namespace= resource=replicationcontrollers
time="2021-03-08T10:01:48Z" level=info msg="Getting items for resource" backup=spp-velero/test-mfiedler-debug group=v1 logSource="pkg/backup/item_collector.go:165" resource=limitranges
time="2021-03-08T10:01:48Z" level=info msg="Listing items" backup=spp-velero/test-mfiedler-debug group=v1 logSource="pkg/backup/item_collector.go:291" namespace= resource=limitranges
time="2021-03-08T10:01:48Z" level=info msg="Retrieved 0 items" backup=spp-velero/test-mfiedler-debug group=v1 logSource="pkg/backup/item_collector.go:297" namespace= resource=limitranges
time="2021-03-08T10:01:48Z" level=info msg="Getting items for resource" backup=spp-velero/test-mfiedler-debug group=v1 logSource="pkg/backup/item_collector.go:165" resource=secrets
time="2021-03-08T10:01:48Z" level=info msg="Listing items" backup=spp-velero/test-mfiedler-debug group=v1 logSource="pkg/backup/item_collector.go:291" namespace= resource=secrets
time="2021-03-08T10:01:50Z" level=info msg="Retrieved 1307 items" backup=spp-velero/test-mfiedler-debug group=v1 logSource="pkg/backup/item_collector.go:297" namespace= resource=secrets
time="2021-03-08T10:01:50Z" level=info msg="Getting items for resource" backup=spp-velero/test-mfiedler-debug group=v1 logSource="pkg/backup/item_collector.go:165" resource=events
time="2021-03-08T10:01:50Z" level=info msg="Listing items" backup=spp-velero/test-mfiedler-debug group=v1 logSource="pkg/backup/item_collector.go:291" namespace= resource=events

In an attempt to solve this, I raised the resource settings for the velero deployment to the following:

velero_resource_allocation:
  limits:
    cpu: '1'
    memory: 2Gi
  requests:
    cpu: 500m
    memory: 1Gi
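(These values are from our installer configuration. The equivalent ad-hoc change to an already-running deployment would be something like:)

kubectl set resources deployment/velero -n spp-velero \
  --limits=cpu=1,memory=2Gi \
  --requests=cpu=500m,memory=1Gi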

That did not solve the issue: the velero pod still crashes, and the backup remains in progress forever, never coming to an end.

I have now deleted all events and restarted the backup, and it works fine. The question remains: why does the backup crash when that many events exist?
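(For completeness, the event cleanup was done with a command along these lines; on older kubectl versions the deletion may have to be run per namespace. For anyone who only needs the backup to go through, excluding events from the backup should be a less destructive alternative:)

kubectl delete events --all --all-namespaces

velero backup create <backupname> -n spp-velero \
  --include-cluster-resources=true \
  --exclude-resources events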

What did you expect to happen:
Backup completes successfully

The output of the following commands will help us better understand what's going on:
(Pasting long output into a GitHub gist or other pastebin is fine.)

  • kubectl logs deployment/velero -n velero

  • refer to attached log

  • velero backup describe <backupname> or kubectl get backup/<backupname> -n velero -o yaml

velero backup describe sppbackup-k8s-b7fb5ea4-17f2-4c3c-980e-a74f1b654a0b -n spp-velero
Name: sppbackup-k8s-b7fb5ea4-17f2-4c3c-980e-a74f1b654a0b
Namespace: spp-velero
Labels:       velero.io/storage-location=default
Annotations:  velero.io/source-cluster-k8s-gitversion=v1.18.3+fa69cae
              velero.io/source-cluster-k8s-major-version=1
              velero.io/source-cluster-k8s-minor-version=18+

Phase: InProgress

Errors: 0
Warnings: 0

Namespaces:
  Included:
  Excluded:

Resources:
  Included: *
  Excluded:
  Cluster-scoped: included

Label selector:

Storage Location: default

Velero-Native Snapshot PVs: auto

TTL: 61320h0m0s

Hooks:

Backup Format Version: 1.1.0

Started: 2021-03-08 12:11:49 +0100 CET
Completed: <n/a>

Expiration: 2028-03-06 12:11:49 +0100 CET

Velero-Native Snapshots:

  • velero backup logs <backupname>

velero backup logs sppbackup-k8s-b7fb5ea4-17f2-4c3c-980e-a74f1b654a0b -n spp-velero
Logs for backup "sppbackup-k8s-b7fb5ea4-17f2-4c3c-980e-a74f1b654a0b" are not available until it's finished processing. Please wait until the backup has a phase of Completed or Failed and try again.

  • velero restore describe <restorename> or kubectl get restore/<restorename> -n velero -o yaml
  • velero restore logs <restorename>

Anything else you would like to add:
[Miscellaneous information that will assist in solving the issue.]

Environment:

  • Velero version (use velero version):
    Client:
    Version: v1.5.2
    Git commit: e115e5a
    Server:
    Version: v1.5.2-konveyor

  • Velero features (use velero client config get features):
    features:

  • Kubernetes version (use kubectl version):
    Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.2-0-g52c56ce", GitCommit:"b66f2d3a6893be729f1b8660309a59c6e0b69196", GitTreeState:"clean", BuildDate:"2020-08-10T04:49:09Z", GoVersion:"go1.13.4", Compiler:"gc", Platform:"linux/amd64"}
    Server Version: version.Info{Major:"1", Minor:"18+", GitVersion:"v1.18.3+fa69cae", GitCommit:"fa69cae", GitTreeState:"clean", BuildDate:"2020-12-14T23:03:06Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}

  • Kubernetes installer & version:

  • Cloud provider or hardware configuration:

  • OS (e.g. from /etc/os-release):
    NAME="Red Hat Enterprise Linux Server"
    VERSION="7.9 (Maipo)"
    ID="rhel"
    ID_LIKE="fedora"
    VARIANT="Server"
    VARIANT_ID="server"
    VERSION_ID="7.9"
    PRETTY_NAME="Red Hat Enterprise Linux Server 7.9 (Maipo)"
    ANSI_COLOR="0;31"
    CPE_NAME="cpe:/o:redhat:enterprise_linux:7.9:GA:server"
    HOME_URL="https://www.redhat.com/"
    BUG_REPORT_URL="https://bugzilla.redhat.com/"

    REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 7"
    REDHAT_BUGZILLA_PRODUCT_VERSION=7.9
    REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
    REDHAT_SUPPORT_PRODUCT_VERSION="7.9"

Vote on this issue!

This is an invitation to the Velero community to vote on issues. You can see the project's top-voted issues listed here.
Use the "reaction smiley face" up to the right of this comment to vote.

  • 👍 for "I would like to see this bug fixed as soon as possible"
  • 👎 for "There are more important bugs to focus on right now"