cacert integration #2675

Closed
reddiablo85 opened this issue Jun 30, 2020 · 4 comments
reddiablo85 commented Jun 30, 2020

We installed the new version of Velero (v1.4.0) and tested our backups & restores to our internal S3 provider over HTTP. No issues to report.
We then decided to reinstall so we could use HTTPS. The first issue we encountered was that the Helm chart doesn't seem to include a configurable parameter for pointing the Velero workload at an internal CA cert. To work around this we installed the CA cert manually, with access to the velero namespace only. It still failed with the following in the Velero logs:

time="2020-06-30T07:28:39Z" level=info msg="Checking that all backup storage locations are valid" logSource="pkg/cmd/server/server.go:437"
An error occurred: some backup storage locations are invalid: backup store for location "default" is invalid: rpc error: code = Unknown desc = RequestError: send request failed
caused by: Get https://s3.er.abc.com:10443/velero?delimiter=%2F&list-type=2&prefix=abc-lab09%2F: x509: certificate signed by unknown authority

We removed the installation and installed manually using the CLI and placed the certs locally on the machine we were launching from, see install commands below:

velero install \
  --plugins dtr01.er.abc.com:4002/velero/velero-plugin-for-aws:v1.1.0 \
  --provider aws \
  --use-restic \
  --image dtr01.er.abc.com:4002/velero/velero:v1.4.0 \
  --use-volume-snapshots=false \
  --backup-location-config region="us-east-1",s3ForcePathStyle="true",s3Url="https://s3.er.abc.com:10443" \
  --cacert /root/rancherlab/certs/abc_root_bundle.crt \
  --secret-file /root/rancherlab/Velero/crds \
  --bucket velero \
  --prefix abc-lab09

This failed with the same logs as above.
We have tried a variety of different cert types (PEM, CRT, with/without intermediates, etc.) but they all return the same issue. The Velero pod stays in a CrashLoopBackOff and doesn't deploy.

What did you expect to happen:
Backups and restores functioning over https

The output of the following commands will help us better understand what's going on:
(Pasting long output into a GitHub gist or other pastebin is fine.)

  • kubectl logs deployment/velero -n velero: full logs below:
    kubectl logs deployment/velero -n velero
    time="2020-06-30T07:55:10Z" level=info msg="setting log-level to INFO" logSource="pkg/cmd/server/server.go:177"
    time="2020-06-30T07:55:10Z" level=info msg="Starting Velero server v1.4.0 (5963650)" logSource="pkg/cmd/server/server.go:179"
    time="2020-06-30T07:55:10Z" level=info msg="1 feature flags enabled []" logSource="pkg/cmd/server/server.go:181"
    time="2020-06-30T07:55:10Z" level=info msg="registering plugin" command=/velero kind=BackupItemAction logSource="pkg/plugin/clientmgmt/registry.go:100" name=velero.io/crd-remap-version
    time="2020-06-30T07:55:10Z" level=info msg="registering plugin" command=/velero kind=BackupItemAction logSource="pkg/plugin/clientmgmt/registry.go:100" name=velero.io/pod
    time="2020-06-30T07:55:10Z" level=info msg="registering plugin" command=/velero kind=BackupItemAction logSource="pkg/plugin/clientmgmt/registry.go:100" name=velero.io/pv
    time="2020-06-30T07:55:10Z" level=info msg="registering plugin" command=/velero kind=BackupItemAction logSource="pkg/plugin/clientmgmt/registry.go:100" name=velero.io/service-account
    time="2020-06-30T07:55:10Z" level=info msg="registering plugin" command=/velero kind=RestoreItemAction logSource="pkg/plugin/clientmgmt/registry.go:100" name=velero.io/add-pv-from-pvc
    time="2020-06-30T07:55:10Z" level=info msg="registering plugin" command=/velero kind=RestoreItemAction logSource="pkg/plugin/clientmgmt/registry.go:100" name=velero.io/add-pvc-from-pod
    time="2020-06-30T07:55:10Z" level=info msg="registering plugin" command=/velero kind=RestoreItemAction logSource="pkg/plugin/clientmgmt/registry.go:100" name=velero.io/change-pvc-node-selector
    time="2020-06-30T07:55:10Z" level=info msg="registering plugin" command=/velero kind=RestoreItemAction logSource="pkg/plugin/clientmgmt/registry.go:100" name=velero.io/change-storage-class
    time="2020-06-30T07:55:10Z" level=info msg="registering plugin" command=/velero kind=RestoreItemAction logSource="pkg/plugin/clientmgmt/registry.go:100" name=velero.io/cluster-role-bindings
    time="2020-06-30T07:55:10Z" level=info msg="registering plugin" command=/velero kind=RestoreItemAction logSource="pkg/plugin/clientmgmt/registry.go:100" name=velero.io/crd-preserve-fields
    time="2020-06-30T07:55:10Z" level=info msg="registering plugin" command=/velero kind=RestoreItemAction logSource="pkg/plugin/clientmgmt/registry.go:100" name=velero.io/job
    time="2020-06-30T07:55:10Z" level=info msg="registering plugin" command=/velero kind=RestoreItemAction logSource="pkg/plugin/clientmgmt/registry.go:100" name=velero.io/pod
    time="2020-06-30T07:55:10Z" level=info msg="registering plugin" command=/velero kind=RestoreItemAction logSource="pkg/plugin/clientmgmt/registry.go:100" name=velero.io/restic
    time="2020-06-30T07:55:10Z" level=info msg="registering plugin" command=/velero kind=RestoreItemAction logSource="pkg/plugin/clientmgmt/registry.go:100" name=velero.io/role-bindings
    time="2020-06-30T07:55:10Z" level=info msg="registering plugin" command=/velero kind=RestoreItemAction logSource="pkg/plugin/clientmgmt/registry.go:100" name=velero.io/service
    time="2020-06-30T07:55:10Z" level=info msg="registering plugin" command=/velero kind=RestoreItemAction logSource="pkg/plugin/clientmgmt/registry.go:100" name=velero.io/service-account
    time="2020-06-30T07:55:10Z" level=info msg="registering plugin" command=/plugins/velero-plugin-for-aws kind=VolumeSnapshotter logSource="pkg/plugin/clientmgmt/registry.go:100" name=velero.io/aws
    time="2020-06-30T07:55:10Z" level=info msg="registering plugin" command=/plugins/velero-plugin-for-aws kind=ObjectStore logSource="pkg/plugin/clientmgmt/registry.go:100" name=velero.io/aws
    time="2020-06-30T07:55:10Z" level=info msg="Checking existence of namespace" logSource="pkg/cmd/server/server.go:361" namespace=velero
    time="2020-06-30T07:55:10Z" level=info msg="Namespace exists" logSource="pkg/cmd/server/server.go:367" namespace=velero
    time="2020-06-30T07:55:12Z" level=info msg="Checking existence of Velero custom resource definitions" logSource="pkg/cmd/server/server.go:396"
    time="2020-06-30T07:55:12Z" level=info msg="All Velero custom resource definitions exist" logSource="pkg/cmd/server/server.go:430"
    time="2020-06-30T07:55:12Z" level=info msg="Checking that all backup storage locations are valid" logSource="pkg/cmd/server/server.go:437"
    An error occurred: some backup storage locations are invalid: backup store for location "default" is invalid: rpc error: code = Unknown desc = RequestError: send request failed
    caused by: Get https://s3.er.abc.com:10443/velero?delimiter=%2F&list-type=2&prefix=abc-lab09%2F: x509: certificate signed by unknown authority

Anything else you would like to add:
[Miscellaneous information that will assist in solving the issue.]

  • We can navigate the S3 storage (using S3 Browser) over HTTPS using the same certificates, endpoint and credentials. As such we don't believe it is a cert issue on the storage side

Environment:

  • Velero version: v1.4.0
  • Velero features: with Restic & AWS plugin
  • Kubernetes version (use kubectl version): 1.16.8
  • Kubernetes installer & version:
  • Cloud provider or hardware configuration: NetApp StorageGRID
  • OS (e.g. from /etc/os-release): RHEL 7

Vote on this issue!

This is an invitation to the Velero community to vote on issues; you can see the project's top-voted issues listed here.
Use the "reaction smiley face" up to the right of this comment to vote.

  • 👍 for "I would like to see this bug fixed as soon as possible"
  • 👎 for "There are more important bugs to focus on right now"
@reddiablo85 (Author)

After further testing we were able to mount a PVC into the Velero deployment and store the certificate there.
Following that, we were able to add the AWS_CA_BUNDLE variable to our deployment (as per this comment on another issue: #1027 (comment)).

We are now working on building all of this into the Helm chart to ease the deployment.
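For reference, a minimal sketch of what that deployment change can look like (the secret name `internal-ca`, key `ca-bundle.crt`, and mount path are illustrative assumptions, not taken from this thread):

```yaml
# Hypothetical patch for the velero Deployment: mount a secret containing the
# company root CA and point the AWS SDK at it via AWS_CA_BUNDLE.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: velero
  namespace: velero
spec:
  template:
    spec:
      containers:
        - name: velero
          env:
            - name: AWS_CA_BUNDLE
              value: /certs/ca-bundle.crt   # path to the mounted CA bundle
          volumeMounts:
            - name: internal-ca
              mountPath: /certs
              readOnly: true
      volumes:
        - name: internal-ca
          secret:
            secretName: internal-ca   # e.g. kubectl -n velero create secret generic internal-ca --from-file=ca-bundle.crt
```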


reddiablo85 commented Jul 1, 2020

Have to reopen this issue. As mentioned previously, we have solved the Velero side of things by doing the following:

  • Adding a secret with the company root cert to our cluster and pointing the Velero pod to it by adding an environment variable: AWS_CA_BUNDLE={PATH to CACERT}

This gives the Velero client visibility to our S3 bucket over HTTPS. However, the cert does not seem to be passed to the restic pods, and "velero restic repo get" consistently shows the repositories in a NotReady status. Digging deeper shows an x509 invalid certificate error.

Steps we've taken to try and rectify this:

  • Add the same secret and AWS_CA_BUNDLE as on the Velero workload (doesn't work)
  • Add the certificate manually to the restic pods under /etc/ssl/certs and run "update-ca-certificates"
    This does allow us to do a restic init on the S3 bucket from inside the restic pod, but it doesn't seem to propagate further, and velero restic repo get stays in a NotReady status with the same cert error

How can we get restic to recognise the certificate?
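The first attempt above (mirroring the Velero deployment fix onto the restic workload) can be sketched roughly as below; secret name, key, and mount path are assumptions, and as noted this did not resolve the error:

```yaml
# Hypothetical patch for the restic DaemonSet, mirroring the AWS_CA_BUNDLE
# change made on the velero Deployment.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: restic
  namespace: velero
spec:
  template:
    spec:
      containers:
        - name: restic
          env:
            - name: AWS_CA_BUNDLE
              value: /certs/ca-bundle.crt
          volumeMounts:
            - name: internal-ca
              mountPath: /certs
              readOnly: true
      volumes:
        - name: internal-ca
          secret:
            secretName: internal-ca   # same assumed secret as on the Deployment
```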

See below the result of velero restic repo get {REPO} -o yaml:

[root@labcpadm01t 1.1.0]# velero restic repo get lab-test01-default-dml68 -o yaml
apiVersion: velero.io/v1
kind: ResticRepository
metadata:
  creationTimestamp: "2020-07-01T13:53:02Z"
  generateName: lab-test01-default-
  generation: 3
  labels:
    velero.io/storage-location: default
    velero.io/volume-namespace: lab-test01
  name: lab-test01-default-dml68
  namespace: velero
  resourceVersion: "20183"
  selfLink: /apis/velero.io/v1/namespaces/velero/resticrepositories/lab-test01-default-dml68
  uid: dc427d7e-0fab-457e-9334-c1a5f228ca2b
spec:
  backupStorageLocation: default
  maintenanceFrequency: 168h0m0s
  resticIdentifier: s3:https://s3.er.abc.com:10443/velero/gdc-lab299/restic/lab-test01
  volumeNamespace: lab-test01
status:
  message: |-
    error running command=restic init --repo=s3:https://s3.er.abc.com:10443/velero/gdc-lab299/restic/lab-test01 --password-file=/tmp/velero-restic-credentials-lab-test01011975119 --cache-dir=/scratch/.cache/restic, stdout=, stderr=Fatal: create repository at s3:https://s3.er.abc.com:10443/velero/gdc-lab299/restic/lab-test01 failed: client.BucketExists: Get https://s3.er.abc.com:10443/velero/?location=: x509: certificate signed by unknown authority

    : exit status 1
  phase: NotReady

@airmonitor

Hi.

Please try the recent version, 1.4.2:

velero install \
  --image velero/velero-arm64:v1.4.2 \
  --provider aws \
  --plugins velero/velero-plugin-for-aws:master \
  --bucket $BUCKET \
  --backup-location-config region=$REGION \
  --snapshot-location-config region=$REGION \
  --secret-file ~/.aws/credentials-velero \
  --cacert s3-eu-central-1-amazonaws-com.pem
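For anyone who would rather set the CA on an existing installation than re-run velero install: per the Velero self-signed-certificate docs, the CA bundle can also be set as a base64-encoded value on the BackupStorageLocation itself. A sketch using the values from this thread (the caCert value is a truncated placeholder, not a real bundle):

```yaml
apiVersion: velero.io/v1
kind: BackupStorageLocation
metadata:
  name: default
  namespace: velero
spec:
  provider: aws
  objectStorage:
    bucket: velero
    prefix: abc-lab09
    # base64-encoded PEM bundle, e.g. the output of: base64 -w0 abc_root_bundle.crt
    caCert: LS0tLS1CRUdJTi...   # placeholder, truncated
  config:
    region: us-east-1
    s3ForcePathStyle: "true"
    s3Url: https://s3.er.abc.com:10443
```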

@reddiablo85 (Author)

Hi @airmonitor,
That seems to have solved the issue. Thanks very much.
I have discovered another problem when adding the "--default-volumes-to-restic" flag to the velero install command, but I'll open a separate issue for it if necessary.
