fix(test): tor transport test fails with SIGSEGV on macOS M1 #1105

Closed
wants to merge 1 commit into from

Collaborator

@diegomrsantos diegomrsantos commented May 27, 2024

Running testtortransport locally on macOS fails with the following error:

Tor transport WRN 2024-05-27 20:05:11.446+02:00 TCP transport already stopped              topics="libp2p tcptransport" tid=117174872
.WRN 2024-05-27 20:05:11.448+02:00 TCP transport already stopped              topics="libp2p tcptransport" tid=117174872
.WRN 2024-05-27 20:05:11.450+02:00 TCP transport already stopped              topics="libp2p tcptransport" tid=117174872
.WRN 2024-05-27 20:05:11.467+02:00 TCP transport already stopped              topics="libp2p tcptransport" tid=117174872
.WRN 2024-05-27 20:05:11.469+02:00 TCP transport already stopped              topics="libp2p tcptransport" tid=117174872
..WRN 2024-05-27 20:05:11.471+02:00 TCP transport already stopped              topics="libp2p tcptransport" tid=117174872
Traceback (most recent call last)
/nim-libp2p/nimbledeps/pkgs/unittest2-#2300fa9924a76e6c96bc4ea79d043e3a0f27120c/unittest2.nim(1151) testtortransport
/nim-libp2p/nimbledeps/pkgs/unittest2-#2300fa9924a76e6c96bc4ea79d043e3a0f27120c/unittest2.nim(1074) runDirect
/workspace/nim-libp2p/tests/helpers.nim(55) runTest`gensym308
SIGSEGV: Illegal storage access. (Attempt to read from nil?)
Error: execution of an external program failed: '/nim-libp2p/tests/testnative '

@@ -45,13 +45,12 @@ template commonTransportTest*(prov: TransportProvider, ma1: string, ma2: string

await conn.close() #for some protocols, closing requires actively reading, so we must close here

await handlerWait.wait(1.seconds) # when no issues will not wait that long!
Collaborator

when no issues will not wait that long

Is that still true when doing this before awaiting allFuturesThrowing?

Collaborator Author

Maybe it's not necessary anymore.

Collaborator Author

But I'm not sure how this relates to allFuturesThrowing. I think it just waits for the acceptHandler to finish and the conn to be closed.
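
To make the question about the timeout concrete, here is a minimal sketch of how `wait` with a timeout behaves in chronos. It is not the code from this PR: `acceptHandler` below is a hypothetical stand-in that only sleeps and completes a future, the 50 ms delay is arbitrary, and `demo` is an illustrative name. The point is that the 1-second value is an upper bound, not a fixed delay.

```nim
import chronos

# Hypothetical stand-in for the test's accept handler: it "works" for
# 50 ms and then signals completion through the shared future.
proc acceptHandler(done: Future[void]) {.async.} =
  await sleepAsync(50.milliseconds)
  done.complete()

proc demo(): Future[Duration] {.async.} =
  # Plays the role of handlerWait in the test (assumed name, for illustration).
  let handlerWait = newFuture[void]("demo.handlerWait")
  let handlerFut = acceptHandler(handlerWait)

  let started = Moment.now()
  # wait() returns as soon as handlerWait completes; it only raises
  # AsyncTimeoutError if the 1-second deadline passes first.
  await handlerWait.wait(1.seconds)
  let elapsed = Moment.now() - started

  await handlerFut
  return elapsed

let elapsed = waitFor demo()
doAssert elapsed < 1.seconds
echo "waited ", elapsed
```

Under these assumptions the wait ends shortly after the handler completes, well before the 1-second deadline, regardless of where the `wait` call sits relative to other awaits.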

@@ -72,13 +71,14 @@ template commonTransportTest*(prov: TransportProvider, ma1: string, ma2: string

await conn.close() #for some protocols, closing requires actively reading, so we must close here

check string.fromBytes(msg) == "Hello!"
await handlerWait.wait(1.seconds) # when no issues will not wait that long!
Collaborator

Same here.
Why does this fix the issue on macOS M1 specifically?

startFut = stub.start(torServer)
asyncTeardown:
await startFut.cancelAndWait()
await stub.stop()
Collaborator

Do the changes in lines 28-40 affect the described behaviour, or is this general refactoring?

@diegomrsantos
Collaborator Author

diegomrsantos commented May 28, 2024

Adding more context: this error happened randomly when running the test alone and every time when running nimble test. It's unclear what the problem is and why the suggested changes fix it.

My laptop was generally slow, and after investigating further, I found that qemu was using 14 GB of memory. After killing the process, I could no longer reproduce the problem.

Collaborator

@kaiserd kaiserd left a comment

LGTM. Thank you.

More context: this is necessary for #1099 (see description).

Let's open an issue regarding investigating this error further.

@diegomrsantos
Collaborator Author

Now that I know the problem was caused by unusually high memory consumption on my laptop, I'm not sure it's worth merging this hack. I did it because I didn't know any better, but closing qemu and restarting the laptop fixed the problem.

@kaiserd
Collaborator

kaiserd commented May 29, 2024

Thanks for the investigation. Closing this since the main problem this PR addressed is not occurring anymore.
We still should investigate the root cause in the future (adding to ice-box).

@kaiserd kaiserd closed this May 29, 2024
Projects
Status: icebox