Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Velero server panic if plugin print log with "WithError" #5972

Closed
Lyndon-Li opened this issue Mar 9, 2023 · 6 comments
Closed

Velero server panic if plugin print log with "WithError" #5972

Lyndon-Li opened this issue Mar 9, 2023 · 6 comments
Assignees
Labels
Area/Plugins Issues related to plugin infra/internal plugins Bug target/1.10.3
Milestone

Comments

@Lyndon-Li
Copy link
Contributor

As the logrus code, the error field will be converted to a string if the log format is json:

func (f *JSONFormatter) Format(entry *Entry) ([]byte, error) {
	data := make(Fields, len(entry.Data)+4)
	for k, v := range entry.Data {
		switch v := v.(type) {
		case error:
			// Otherwise errors are ignored by `encoding/json`
			// https://github.com/sirupsen/logrus/issues/137
			data[k] = v.Error()
		default:
			data[k] = v
		}
	}

When the plugin generate log with logger.WithError, the plugin client gets the error field as string instead of error interface.

The backup controller registers a log_counter_hook to the backup logger, while in log_counter_hook.go, the assumes the error field as error interface only:

	if isErrorFieldPresent {
		entryMessage = fmt.Sprintf("%s error: /%s", entryMessage, errorField.(error).Error())
	}

As a result, there will be a panic like below:

panic: interface conversion: string is not error: missing method Error

goroutine 1079 [running]:
github.com/vmware-tanzu/velero/pkg/util/logging.(*LogHook).Fire(0xc000571230, 0xc0003f2620)
	/go/src/github.com/vmware-tanzu/velero/pkg/util/logging/log_counter_hook.go:74 +0x3f3
github.com/sirupsen/logrus.LevelHooks.Fire(0xc000a53af0?, 0xa53ac0?, 0xc000000005?)
	/go/pkg/mod/github.com/sirupsen/logrus@v1.8.1/hooks.go:28 +0x82
github.com/sirupsen/logrus.(*Entry).fireHooks(0xc0003f2620)
	/go/pkg/mod/github.com/sirupsen/logrus@v1.8.1/entry.go:272 +0x1dc
github.com/sirupsen/logrus.(*Entry).log(0xc0003f2540, 0x2, {0xc000b256c0, 0xe})
	/go/pkg/mod/github.com/sirupsen/logrus@v1.8.1/entry.go:241 +0x34f
github.com/sirupsen/logrus.(*Entry).Log(0xc0003f2540, 0x2, {0xc000a53d98?, 0xc000a53da8?, 0xc000a53da8?})
	/go/pkg/mod/github.com/sirupsen/logrus@v1.8.1/entry.go:293 +0x4f
github.com/sirupsen/logrus.(*Entry).Error(...)
	/go/pkg/mod/github.com/sirupsen/logrus@v1.8.1/entry.go:322
github.com/vmware-tanzu/velero/pkg/plugin/clientmgmt/process.(*logrusAdapter).Error(0xc0004ca420, {0xc000b25630, 0xe}, {0xc000ac0e00?, 0x1?, 0xc00068eea0?})
	/go/src/github.com/vmware-tanzu/velero/pkg/plugin/clientmgmt/process/logrus_adapter.go:90 +0x9c
github.com/hashicorp/go-plugin.(*Client).logStderr(0xc000b656b0, {0x2a5f600?, 0xc000481308?})
	/go/pkg/mod/github.com/hashicorp/go-plugin@v1.4.3/client.go:1039 +0xb8a
created by github.com/hashicorp/go-plugin.(*Client).Start
	/go/pkg/mod/github.com/hashicorp/go-plugin@v1.4.3/client.go:611 +0x144c
@Lyndon-Li Lyndon-Li self-assigned this Mar 15, 2023
@Lyndon-Li Lyndon-Li added Bug Area/Plugins Issues related to plugin infra/internal plugins labels Mar 15, 2023
@lnehrin
Copy link

lnehrin commented Mar 16, 2023

Yeah, this is killer. Trying to run Velero v1.10.2 with AWS plugin v1.6.1 and I cannot get past this. Makes Velero unusable.

@Lyndon-Li
Copy link
Contributor Author

@lnehrin We expect this problem won't happen in the normal case because we have checked the plugins and found plugins seldomly print log with WithError. So as the initial discussion, we prepare to fix in the next release, 1.12.

If you face the problem in v1.10.2, this is not the case, we will not only need to fix this in 1.11 but also need to fix it in v1.10.x.

So please share us the Velero log bundle when the problem happened in your env, let's check which log caused the problem and redecide.

@lnehrin
Copy link

lnehrin commented Mar 17, 2023

velero.log

I tried various installations of velero, with and without CSI, and even no plugins at all. I thought I could migrate between EKS clusters, exactly as the example in the docs. Maybe my source cluster EKS v1.21.14 is just too unhappy. When I do an identical installation on a new EKS cluster v1.24.8 (no CSI) - a backup succeeds. Unfortunately my use case is to migrate namespaces from the old cluster v1.21.14 to the new cluster v1.24.8. I think I'm going to be stuck with manually migrating the PVs and fresh helm installations of what I need to get installed.

@peixian
Copy link

peixian commented Mar 23, 2023

I'm also seeing this in v1.10.2, log attached.
velero_out.log

@Lyndon-Li Lyndon-Li added this to the v1.11 milestone Mar 24, 2023
@Lyndon-Li
Copy link
Contributor Author

Thank all, we will fix this in v1.11 and v1.10.3

@Lyndon-Li
Copy link
Contributor Author

Close as completed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area/Plugins Issues related to plugin infra/internal plugins Bug target/1.10.3
Projects
None yet
Development

No branches or pull requests

3 participants