Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

investigating an error when import.ova occurs #1972

Closed
jtarchie opened this issue May 6, 2020 · 17 comments
Closed

investigating an error when import.ova occurs #1972

jtarchie opened this issue May 6, 2020 · 17 comments

Comments

@jtarchie
Copy link

jtarchie commented May 6, 2020

We have a job that runs a govc import.ova periodically.
We are experiencing a bug (20%) with the upload.

$ govc import.ova -options=/tmp/options.json737929222 image/ops-manager-vsphere-2.9.1-build.121.ova

[06-05-20 11:44:53] Uploading pivotal-ops-manager-disk1.vmdk... OK
panic: send on closed channel

goroutine 28 [running]:
github.com/vmware/govmomi/vim25/progress.(*reader).Read(0xc0003f6680, 0xc00054c000, 0x2000, 0x2000, 0xc0001b9b18, 0x408b9b, 0xc00001e000)
	/home/vmware/src/github.com/vmware/govmomi/vim25/progress/reader.go:116 +0x24c
io/ioutil.devNull.ReadFrom(0x0, 0x1431220, 0xc0003f6680, 0x101e7a0, 0x203001, 0x7f7e9e5de940)
	/home/dougm/golang-go/src/io/ioutil/ioutil.go:147 +0x92
io.copyBuffer(0x1433300, 0x1e86af8, 0x1431220, 0xc0003f6680, 0x0, 0x0, 0x0, 0x158946800, 0x0, 0x0)
	/home/dougm/golang-go/src/io/io.go:388 +0x2ed
io.Copy(...)
	/home/dougm/golang-go/src/io/io.go:364
net/http.(*transferWriter).doBodyCopy(0xc000096f00, 0x1433300, 0x1e86af8, 0x1431220, 0xc0003f6680, 0x158946800, 0x0, 0x0)
	/home/dougm/golang-go/src/net/http/transfer.go:400 +0x6a
net/http.(*transferWriter).writeBody(0xc000096f00, 0x142d940, 0xc00040b700, 0x2, 0x2)
	/home/dougm/golang-go/src/net/http/transfer.go:364 +0x733
net/http.(*Request).write(0xc0002d0e00, 0x142d940, 0xc00040b700, 0x0, 0xc00049a9f0, 0x0, 0x0, 0x0)
	/home/dougm/golang-go/src/net/http/request.go:682 +0x6d3
net/http.(*persistConn).writeLoop(0xc00009fd40)
	/home/dougm/golang-go/src/net/http/transport.go:2207 +0x1c8
created by net/http.(*Transport).dialConn
	/home/dougm/golang-go/src/net/http/transport.go:1575 +0xb23

The upload is successful, the import.ova is complete.
I'm trying to capture this for SEO and to start our investigation.

@dougm
Copy link
Member

dougm commented May 6, 2020

Can you run with GOTRACEBACK=all (https://golang.org/pkg/runtime/) so we can see what the other go routines are doing?

Does this happen only when run by the job or are you seeing it when run manually from a terminal too?

We should probably have a way to disable the progress logger regardless of this issue, which could also be a workaround here if the problem/fix isn't clear.

@jtarchie
Copy link
Author

jtarchie commented May 6, 2020

We are going to add the environment variable GOTRACEBACK=all to our CI job.
It routinely comes up every day or two, so it should be easier to reproduced.

As for a possible fix, without understanding what's going on, is to wrap the channel write operation in an if statement to ensure the channel can be written to before attempting.

I'd like to figure out, though, why a channel is being written when the system implied it is Done.

@jtarchie
Copy link
Author

jtarchie commented May 8, 2020

We got the following stack trace:

$ govc import.ova -options=/tmp/options.json340456251 image/ops-manager-vsphere-2.9.1-build.121.ova
[07-05-20 22:30:00] Uploading pivotal-ops-manager-disk1.vmdk... OK
panic: send on closed channel

goroutine 33 [running]:
github.com/vmware/govmomi/vim25/progress.(*reader).Read(0xc0003b34c0, 0xc000236000, 0x2000, 0x2000, 0xc00040db18, 0x408b9b, 0xc00001e000)
	/home/vmware/src/github.com/vmware/govmomi/vim25/progress/reader.go:116 +0x24c
io/ioutil.devNull.ReadFrom(0x0, 0x1431220, 0xc0003b34c0, 0x101e7a0, 0x203001, 0x7fd2b23154a8)
	/home/dougm/golang-go/src/io/ioutil/ioutil.go:147 +0x92
io.copyBuffer(0x1433300, 0x1e86af8, 0x1431220, 0xc0003b34c0, 0x0, 0x0, 0x0, 0x158946800, 0x0, 0x0)
	/home/dougm/golang-go/src/io/io.go:388 +0x2ed
io.Copy(...)
	/home/dougm/golang-go/src/io/io.go:364
net/http.(*transferWriter).doBodyCopy(0xc000097900, 0x1433300, 0x1e86af8, 0x1431220, 0xc0003b34c0, 0x158946800, 0x0, 0x0)
	/home/dougm/golang-go/src/net/http/transfer.go:400 +0x6a
net/http.(*transferWriter).writeBody(0xc000097900, 0x142d940, 0xc0003b3700, 0x2, 0x2)
	/home/dougm/golang-go/src/net/http/transfer.go:364 +0x733
net/http.(*Request).write(0xc000346f00, 0x142d940, 0xc0003b3700, 0x0, 0xc0003faa50, 0x0, 0x0, 0x0)
	/home/dougm/golang-go/src/net/http/request.go:682 +0x6d3
net/http.(*persistConn).writeLoop(0xc00009fd40)
	/home/dougm/golang-go/src/net/http/transport.go:2207 +0x1c8
created by net/http.(*Transport).dialConn
	/home/dougm/golang-go/src/net/http/transport.go:1575 +0xb23

goroutine 1 [runnable]:
runtime.SetFinalizer(0xfdf020, 0xc0002618c0, 0x0, 0x0)
	/home/dougm/golang-go/src/runtime/mfinal.go:329 +0x89
os.(*file).close(0xc0002618c0, 0x14387f8, 0xc000247700)
	/home/dougm/golang-go/src/os/file_unix.go:252 +0x101
os.(*File).Close(0xc0002a4270, 0xc000247828, 0xc000247700)
	/home/dougm/golang-go/src/os/file_unix.go:233 +0x33
github.com/vmware/govmomi/govc/importx.(*TapeArchiveEntry).Close(0xc0003fa180, 0x0, 0x0)
	/home/vmware/src/github.com/vmware/govmomi/govc/importx/archive.go:92 +0x34
github.com/vmware/govmomi/govc/importx.(*ovfx).Upload(0xc00017f810, 0x1442b60, 0xc000038070, 0xc00029b680, 0xc00003c930, 0x2c, 0xc0003d99a0, 0x1e, 0x0, 0x0, ...)
	/home/vmware/src/github.com/vmware/govmomi/govc/importx/ovf.go:323 +0x375
github.com/vmware/govmomi/govc/importx.(*ovfx).Import(0xc00017f810, 0x127ab2b, 0x5, 0x0, 0x0, 0x0)
	/home/vmware/src/github.com/vmware/govmomi/govc/importx/ovf.go:297 +0x9e6
github.com/vmware/govmomi/govc/importx.(*ova).Import(...)
	/home/vmware/src/github.com/vmware/govmomi/govc/importx/ova.go:62
github.com/vmware/govmomi/govc/importx.(*ova).Run(0xc00000e160, 0x1442b60, 0xc000038048, 0xc000074840, 0x0, 0x0)
	/home/vmware/src/github.com/vmware/govmomi/govc/importx/ova.go:51 +0x10e
github.com/vmware/govmomi/govc/cli.Run(0xc0000300d0, 0x3, 0x3, 0xc00003e0b8)
	/home/vmware/src/github.com/vmware/govmomi/govc/cli/command.go:165 +0x56e
main.main()
	/home/vmware/src/github.com/vmware/govmomi/govc/main.go:97 +0x64

goroutine 6 [syscall, 2 minutes]:
os/signal.signal_recv(0xc000054787)
	/home/dougm/golang-go/src/runtime/sigqueue.go:147 +0x9c
os/signal.loop()
	/home/dougm/golang-go/src/os/signal/signal_unix.go:23 +0x22
created by os/signal.init.0
	/home/dougm/golang-go/src/os/signal/signal_unix.go:29 +0x41

goroutine 19 [IO wait]:
internal/poll.runtime_pollWait(0x7fd2b2351e68, 0x72, 0xffffffffffffffff)
	/home/dougm/golang-go/src/runtime/netpoll.go:184 +0x55
internal/poll.(*pollDesc).wait(0xc0002a0098, 0x72, 0x1c00, 0x1c98, 0xffffffffffffffff)
	/home/dougm/golang-go/src/internal/poll/fd_poll_runtime.go:87 +0x45
internal/poll.(*pollDesc).waitRead(...)
	/home/dougm/golang-go/src/internal/poll/fd_poll_runtime.go:92
internal/poll.(*FD).Read(0xc0002a0080, 0xc00031e000, 0x1c98, 0x1c98, 0x0, 0x0, 0x0)
	/home/dougm/golang-go/src/internal/poll/fd_unix.go:169 +0x1cf
net.(*netFD).Read(0xc0002a0080, 0xc00031e000, 0x1c98, 0x1c98, 0x203000, 0x0, 0x1c8b)
	/home/dougm/golang-go/src/net/fd_unix.go:202 +0x4f
net.(*conn).Read(0xc00000e3b8, 0xc00031e000, 0x1c98, 0x1c98, 0x0, 0x0, 0x0)
	/home/dougm/golang-go/src/net/net.go:184 +0x68
crypto/tls.(*atLeastReader).Read(0xc0003372a0, 0xc00031e000, 0x1c98, 0x1c98, 0xc0000888c0, 0x41714e, 0xc0000888a0)
	/home/dougm/golang-go/src/crypto/tls/conn.go:780 +0x60
bytes.(*Buffer).ReadFrom(0xc00022f058, 0x142daa0, 0xc0003372a0, 0x409c45, 0x101a5c0, 0x11e9680)
	/home/dougm/golang-go/src/bytes/buffer.go:204 +0xb4
crypto/tls.(*Conn).readFromUntil(0xc00022ee00, 0x1432240, 0xc00000e3b8, 0x5, 0xc00000e3b8, 0x23)
	/home/dougm/golang-go/src/crypto/tls/conn.go:802 +0xec
crypto/tls.(*Conn).readRecordOrCCS(0xc00022ee00, 0x0, 0x0, 0x3)
	/home/dougm/golang-go/src/crypto/tls/conn.go:609 +0x124
crypto/tls.(*Conn).readRecord(...)
	/home/dougm/golang-go/src/crypto/tls/conn.go:577
crypto/tls.(*Conn).Read(0xc00022ee00, 0xc000312000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
	/home/dougm/golang-go/src/crypto/tls/conn.go:1255 +0x161
net/http.(*persistConn).Read(0xc000187b00, 0xc000312000, 0x1000, 0x1000, 0xc00003e660, 0xc000088c20, 0x404cf5)
	/home/dougm/golang-go/src/net/http/transport.go:1752 +0x75
bufio.(*Reader).fill(0xc0002e42a0)
	/home/dougm/golang-go/src/bufio/bufio.go:100 +0x103
bufio.(*Reader).Peek(0xc0002e42a0, 0x1, 0x0, 0x0, 0x1, 0xc00003e200, 0x0)
	/home/dougm/golang-go/src/bufio/bufio.go:138 +0x4f
net/http.(*persistConn).readLoop(0xc000187b00)
	/home/dougm/golang-go/src/net/http/transport.go:1905 +0x1d6
created by net/http.(*Transport).dialConn
	/home/dougm/golang-go/src/net/http/transport.go:1574 +0xafe

goroutine 20 [select]:
net/http.(*persistConn).writeLoop(0xc000187b00)
	/home/dougm/golang-go/src/net/http/transport.go:2204 +0x123
created by net/http.(*Transport).dialConn
	/home/dougm/golang-go/src/net/http/transport.go:1575 +0xb23

goroutine 22 [select, 2 minutes]:
github.com/vmware/govmomi/nfc.(*LeaseUpdater).waitForProgress(0xc0003fa150, 0xc00003c930, 0x2c, 0xc0003d99a0, 0x1e, 0x0, 0x0, 0x0, 0x158946800, 0x11, ...)
	/home/vmware/src/github.com/vmware/govmomi/nfc/lease_updater.go:95 +0x11f
created by github.com/vmware/govmomi/nfc.newLeaseUpdater
	/home/vmware/src/github.com/vmware/govmomi/nfc/lease_updater.go:79 +0x1a3

goroutine 23 [select]:
github.com/vmware/govmomi/nfc.(*LeaseUpdater).run(0xc0003fa150)
	/home/vmware/src/github.com/vmware/govmomi/nfc/lease_updater.go:125 +0x148
created by github.com/vmware/govmomi/nfc.newLeaseUpdater
	/home/vmware/src/github.com/vmware/govmomi/nfc/lease_updater.go:84 +0x1fc

@dougm
Copy link
Member

dougm commented May 8, 2020

What version of govc are you using? The "send on closed channel" is at vim25/progress/reader.go:116
But in master and the current release, the send is on line 117.
Which could mean you're missing a related fix 1620160

@jtarchie
Copy link
Author

jtarchie commented May 8, 2020

That latest stable release with official binaries -- v0.22.1. The v0.22.2 has been tagged on git, but no binaries have been released. We run this on platform that does not have the golang runtime available.

@jtarchie
Copy link
Author

jtarchie commented May 8, 2020

If you are looking for release management with go. Our team has been really happy with goreleaser. It provides a consistent way of building binaries and publishing them to Github, etc.

@dougm
Copy link
Member

dougm commented May 8, 2020

Right, v0.22.2 didn't impact govc, it was just a tag for govmomi on the release-0.22 branch.

Regarding goreleaser, @frapposelli added support for that, but we had to revert to bash + github-release at some point. He started to bring back goreleaser in #1375 , but that's on the back burner too. We'd love some help getting that done if your team is interested in helping!

@jtarchie
Copy link
Author

jtarchie commented May 8, 2020

The fix that you pointed out says it was only released in v0.22.2. They does affect govc then, right?

@dougm
Copy link
Member

dougm commented May 8, 2020

No, that fix is included since v0.17.1: v0.17.0...v0.17.1

I think what you're referring to is the GitHub UI displaying the latest tag that includes that commit. If you click on the ..., it'll show you all tags that includes the commit.

v0.22.2 only includes 1 commit since v0.22.1: v0.22.1...v0.22.2

@jtarchie
Copy link
Author

If the fix was released on v0.22.1, then we are experiencing something different.

I admit, that navigating the code is difficult, to "hunt down the bug". The CLI is request and response, via Soap albeit, but there doesn't appear to be streaming. The layer of goroutines and channels for a request and response makes it difficult to navigate.

We know that reader is closed, the initiate happens on a Done invocation, but an HTTP request (in a separate goroutine) is still reading from there.

The asynchronous coordination of the channels makes hard to know when Done was actually invoked. Can you explain the implementation, so maybe we debug more appropriate? Normally I'd stick this on a debugger, but since this happens 20% after a 5GB file has been uploaded on a slower internet connection, I'm not inclined to do that.

@jtarchie
Copy link
Author

jtarchie commented May 12, 2020

Also, I'm worried concerned about the usage of channels using the Report interface. For example, <- chan Report defines that an interface will be shared on the channel. The interface is a pointer to the underlying value. That means that though the pointer of the interface may be copied, the thing it points to is not copied.

I tried to illustrate that here.

package main

import (
	"fmt"
)

type Namer interface {
	Name() string
}

type A struct {
	name string
}

func (n A) Name() string {
	return n.name
}

func main() {
	a := A{"Mary"}
	w := make(chan Namer)

	go func() { w <- a }()

	fmt.Printf("a = %#v\n", a)

	b := <-w
	fmt.Printf("b = %#v\n", b)
}

Outputs that a and b are the same value:

a = main.A{name:"Mary"}
b = main.A{name:"Mary"}

I think this means <-chan Report does not protect against race conditions.
I'm not sure that this is a direct issue, but as I alluded to previously, the use of goroutines and channels in a request and response can introduce complexity.

@jtarchie
Copy link
Author

We are going to set GOMAXPROCS=1 and run our CI job again. I'm thinking that limiting the thread for context switching will eliminate the need of out-of-sync closing of channels.

@dougm
Copy link
Member

dougm commented May 12, 2020

I haven't looked at the progress code much myself, agree it is tough to follow. But it was written 6+ years ago and the only other issue in that time was #1057 (which someone else fixed). I still think regardless of this issue, we should have an option to disable the progress reporting.
I'd also rather find a console progress package to replace of govc's, than get into any major refactoring of vim25/progress.

@jtarchie
Copy link
Author

This may not be the only other issue, this maybe the only other reported issue.
Our bug in particular is fine if you rerun commands, etc. Most people are probably ok just doing a retry.

So far, setting GOMAXPROCS=1 seems to have helped our error.
We can do this our CI environment, but we are trying to figure out how to make downstream users aware of this.
The go runtime can set this value explicitly in the compiled binary.

@dougm
Copy link
Member

dougm commented May 13, 2020

I'm not doubting there's bug(s) in the progress code. My point was just that I don't know the progress code myself as there hasn't been a need. Glad you found a workaround. You mentioned this only happens in CI and you weren't able to reproduce running it manually..
When you tried manually, was that on the same system that CI runs or on your desktop system?
How many CPU cores does the CI system have where govc import runs?
Is the CI system running other jobs in parallel during govc import? Just wondering if the CPU core are busy doing other work when govc import works and idle when it fails.

@jtarchie
Copy link
Author

When you tried manually, was that on the same system that CI runs or on your desktop system?

It was not on the same system. The reason being the network connection from workstation is slow to upload 5GB to the DC. The CI system has a direct connection. For human reasons, I didn't want to wait.

How many CPU cores does the CI system have where govc import runs?

16, a large worker

Is the CI system running other jobs in parallel during govc import? Just wondering if the CPU core are busy doing other work when govc import works and idle when it fails.

It is running many other jobs in parallel. Nothing CPU intensive, mainly network I/O.

@jtarchie
Copy link
Author

I'm closing this as we were able to work around the problem by setting GOMAXPROCS=1.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants