Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sign transaction error: session not found #247

Open
Zimbakin opened this issue Jan 6, 2021 · 24 comments
Open

Sign transaction error: session not found #247

Zimbakin opened this issue Jan 6, 2021 · 24 comments
Labels

Comments

@Zimbakin
Copy link

Zimbakin commented Jan 6, 2021

  • Trezor Model T
  • firmware 2.3.4
  • Trezor suite 20.12.1
  • Mac OS

Sending from Account 1 P2SH address to newly created Account 2 with new P2SH address returns error:
Sign transaction error: session not found
Trezor model T freezes.

@Zimbakin Zimbakin added the bug label Jan 6, 2021
@prusnak
Copy link
Member

prusnak commented Jan 6, 2021

Try different USB cable, different port, etc.

This is not a bug in Suite, but in communication with the device.

@prusnak prusnak closed this as completed Jan 6, 2021
@matejcik
Copy link

could also be a Bridge problem.
this forum user reports same problem: https://forum.trezor.io/t/sign-transaction-error-session-not-found-aug-29-2021-12-28/2463

in that case, using web suite + webusb resolved the problem

@petrkr
Copy link

petrkr commented Dec 29, 2021

Hello,
recently I get similar problem with Trezor One on huge transactions which took long to sign.

Suite: 21.12.2
Bridge: 31 (same issue on 29)
FW One: 1.10.4 (same issue on 1.10.3)

EDIT: I can confirm WebUSB (thrught web based suite) works fine. So it's just in bridge

most recent interesing log from verbose bridge calls during signature:

[154.545347 : 12:00:45] POST /call/4
[154.545398 : 12:00:45] [server/api/api.go 196 server/api.(*api).call] start
[154.545425 : 12:00:45] [core/core.go 568 core.(*Core).Call] checking other call on same session
[154.545438 : 12:00:45] [core/core.go 574 core.(*Core).Call] checking other call on same session done
[154.545451 : 12:00:45] [core/core.go 599 core.(*Core).Call] before actual logic
[154.545462 : 12:00:45] [core/core.go 607 core.(*Core).writeDev] decodeRaw
[154.545474 : 12:00:45] [core/core.go 659 core.(*Core).decodeRaw] readAll
[154.545483 : 12:00:45] [core/core.go 661 core.(*Core).decodeRaw] decodeString
[154.545494 : 12:00:45] [core/core.go 681 core.(*Core).decodeRaw] returning
[154.545504 : 12:00:45] [core/core.go 613 core.(*Core).writeDev] writeTo
[154.545512 : 12:00:45] [wire/v1.go 25 wire.(*Message).WriteTo] start
[154.545521 : 12:00:45] [wire/v1.go 39 wire.(*Message).WriteTo] actually writing
[154.545532 : 12:00:45] [usb/libusb.go 555 usb.(*LibUSBDevice).Write] write start
[154.545542 : 12:00:45] [usb/libusb.go 505 usb.(*LibUSBDevice).readWrite] start
[154.545551 : 12:00:45] [usb/libusb.go 507 usb.(*LibUSBDevice).readWrite] checking closed
[154.545564 : 12:00:45] [usb/libusb.go 515 usb.(*LibUSBDevice).readWrite] actual interrupt transport
[154.546128 : 12:00:45] [usb/libusb.go 519 usb.(*LibUSBDevice).readWrite] single transfer done
[154.546144 : 12:00:45] [usb/libusb.go 535 usb.(*LibUSBDevice).readWrite] single transfer successful
[154.546158 : 12:00:45] [core/core.go 619 core.(*Core).readDev] readFrom
[154.546171 : 12:00:45] [wire/v1.go 76 wire.ReadFrom] start
[154.546183 : 12:00:45] [usb/libusb.go 569 usb.(*LibUSBDevice).Read] read start
[154.546195 : 12:00:45] [usb/libusb.go 505 usb.(*LibUSBDevice).readWrite] start
[154.546208 : 12:00:45] [usb/libusb.go 507 usb.(*LibUSBDevice).readWrite] checking closed
[154.546221 : 12:00:45] [usb/libusb.go 515 usb.(*LibUSBDevice).readWrite] actual interrupt transport
[154.547112 : 12:00:45] [usb/libusb.go 519 usb.(*LibUSBDevice).readWrite] single transfer done
[154.547125 : 12:00:45] [usb/libusb.go 535 usb.(*LibUSBDevice).readWrite] single transfer successful
[154.547135 : 12:00:45] [wire/v1.go 96 wire.ReadFrom] actual reading started
[154.547144 : 12:00:45] [wire/v1.go 119 wire.ReadFrom] actual reading finished
[154.547154 : 12:00:45] [core/core.go 625 core.(*Core).readDev] encoding back
[154.547163 : 12:00:45] [core/core.go 691 core.(*Core).encodeRaw] start
[154.547172 : 12:00:45] [core/core.go 601 core.(*Core).Call] after actual logic
127.0.0.1 - - [29/Dec/2021:12:00:45 +0100] "POST /call/4 HTTP/1.1" 200 92
[154.547189 : 12:00:45] 127.0.0.1 - - [29/Dec/2021:12:00:45 +0100] "POST /call/4 HTTP/1.1" 200 92

Here maybe ?

[154.547251 : 12:00:45] [core/core.go 590 core.(*Core).Call.func4] detected request close context canceled, auto-release


[154.547266 : 12:00:45] [core/core.go 359 core.(*Core).release] session 4
[154.547275 : 12:00:45] [core/core.go 368 core.(*Core).release] bus close
[154.547281 : 12:00:45] [usb/libusb.go 436 usb.(*LibUSBDevice).Close] storing d.closed
[154.547288 : 12:00:45] [usb/libusb.go 444 usb.(*LibUSBDevice).Close] canceling previous transfers
[154.547295 : 12:00:45] [usb/libusb.go 453 usb.(*LibUSBDevice).Close] finishing read queue
[154.547302 : 12:00:45] [usb/libusb.go 498 usb.(*LibUSBDevice).finishReadQueue] transfer
POST /call/4
[154.551047 : 12:00:45] POST /call/4
[154.551112 : 12:00:45] [server/api/api.go 196 server/api.(*api).call] start
[154.551148 : 12:00:45] [server/api/api.go 277 server/api.(*api).respondError] Returning error: session not found
127.0.0.1 - - [29/Dec/2021:12:00:45 +0100] "POST /call/4 HTTP/1.1" 400 30
[154.551227 : 12:00:45] 127.0.0.1 - - [29/Dec/2021:12:00:45 +0100] "POST /call/4 HTTP/1.1" 400 30
POST /release/4
[154.563856 : 12:00:45] POST /release/4
[154.563904 : 12:00:45] [server/api/api.go 154 server/api.(*api).release] start
[154.563921 : 12:00:45] [core/core.go 359 core.(*Core).release] session 4
[154.563934 : 12:00:45] [core/core.go 363 core.(*Core).release] session not found
[154.563948 : 12:00:45] [server/api/api.go 277 server/api.(*api).respondError] Returning error: session not found
127.0.0.1 - - [29/Dec/2021:12:00:45 +0100] "POST /release/4 HTTP/1.1" 400 30
[154.563976 : 12:00:45] 127.0.0.1 - - [29/Dec/2021:12:00:45 +0100] "POST /release/4 HTTP/1.1" 400 30
[154.597499 : 12:00:45] [usb/libusb.go 458 usb.(*LibUSBDevice).Close] releasing interface
[154.597547 : 12:00:45] [usb/libusb.go 477 usb.(*LibUSBDevice).Close] low level close
[154.597573 : 12:00:45] [usb/libusb.go 479 usb.(*LibUSBDevice).Close] done

@prusnak prusnak reopened this Dec 29, 2021
@prusnak
Copy link
Member

prusnak commented Dec 29, 2021

Thanks, reopening

@mroz22
Copy link

mroz22 commented Dec 31, 2021

I have also found a similar, maybe related, case trezor/connect#948

@STew790
Copy link

STew790 commented Feb 7, 2022

I tried to reproduce this issue in the recent version of Suite (22.1.1 e7262e5) but I wasn't successful yet. Nonetheless, I will be monitoring this issue and as soon as I find something, I will add it here.

@STew790
Copy link

STew790 commented Feb 23, 2022

I tried to reproduce this issue throughout the past month on various versions of Suite and I wasn't successful.
Let us know if you get this issue again, please.
Closing (could not reproduce).

@STew790 STew790 closed this as completed Feb 23, 2022
@petrkr
Copy link

petrkr commented May 24, 2022

Problem still persist on latest suite, trezord and Trezor One firmware

Trezor suite: 22.5.2
Trezord (bridge) used from suite bundle: v2.0.31 (rev 5f53132)
Trezor one FW: 1.11.1

When trying to sign huge UTXO input TX, trezor session not found occurres.

I can provide some debugging cooperation. I do not know why it is this closed if is NOT fixed yet... It is really problem if you can not send transaction and you have to send by smaller parts which you have to again merge (so you are able to send higher amount).

More testing:
Web suite (https://suite.trezor.io/web) on WebUSB (trezord is shutdown) - Works
Web suite with trezord - causes session not found, but not so often and only on really huge inputs
Suite AppImage with trezord (regardless if bundle version or own systemd version) - causes session not found almost on every bigger transaction

I logged communication between trezord by wireshark, but nothing much I can see there.. Only POST 200, and then two POST will return "400 Bad request" with payload "Session not found".

So probably better logging of trezord is needed to investigate this.

@petrkr
Copy link

petrkr commented May 24, 2022

Well I probably found something what can be connected with it...

There is some long request to bridge (/listen) which hangs response for like 30 secods (or maybe until some /calls or /release are called? Did not checked code well yet.

image

In closer filter you can see this response came after 30 sec to the client...
image

Maybe those asynchronous calls can be reason? I will check something like parallel requests from electron. And something like re-use connections (now I can see it reopen new TCP connection for every call)

@prusnak prusnak reopened this May 24, 2022
@petrkr
Copy link

petrkr commented May 24, 2022

Problem is, I am not Golang programmer, so I do not understand well to those async things.. Also when is ctx.Done() True? From documentation of http.Request.context() I am not too much wise...

But also I read documentation of "select" which waits until atleast some job is done (true), in case of method "Call" it is eighter finished (probably call?) or that context.done() (which will close USB session and causes this issue)...

from documentation if both are true, Golang will use "random" to select one of it.

That means, if context is done and call already finished too, then in random cases ctx.done() could be called prior to finished (return) one..

But I tested it on https://go.dev/tour/concurrency/5 case where I made two "true" and this random was like 50:50

@Zimbakin
Copy link
Author

I solved this problem over a year ago by switching to another brand of hardware wallet. Let me just add, you probably wont have this problem unless you have some big transactions. Either way Trezor couldn’t handle it on both web and suite. It now resides as a hood ornament because no one could solve it.

@petrkr
Copy link

petrkr commented May 24, 2022

I solved this problem over a year ago by switching to another brand of hardware wallet. Let me just add, you probably wont have this problem unless you have some big transactions. Either way Trezor couldn’t handle it on both web and suite. It now resides as a hood ornament because no one could solve it.

actually web with webusb works, because problem seems to be somewhere in bridge and it's asynchronous calls... As programmer I know async things are hell to debug... Especially if you do not have enough testing data and it happens only on some environments...

That is why I trying to help somehow instead of just say "you are bad" and buy other wallet, because I think nothing can compare with trezor in other cases...

@Zimbakin
Copy link
Author

I solved this problem over a year ago by switching to another brand of hardware wallet. Let me just add, you probably wont have this problem unless you have some big transactions. Either way Trezor couldn’t handle it on both web and suite. It now resides as a hood ornament because no one could solve it.

actually web with webusb works, because problem seems to be somewhere in bridge and it's asynchronous calls... As programmer I know async things are hell to debug... Especially if you do not have enough testing data and it happens only on some environments...

That is why I trying to help somehow instead of just say "you are bad" and buy other wallet, because I think nothing can compare with trezor in other cases...

No, as stated the problem persisted on both web and suite, which locked me out for an extended period and became quite costly.

I didn’t say “you are bad”, I stated Trezor couldn’t handle it which I find unacceptable. If you pay good money for a device, it’s fair to expect that it works. Good luck and I hope you solve it.

@petrkr
Copy link

petrkr commented May 24, 2022

so one more time,, on web based suite with WebUSB (chromium/chrome) without trezord (bridge) it works fine... Also it would work with electrum.

So there are atleast two options how to proceed those payments.

I agree it's pity for newbie who buy boxed hardware and use official app (trezor suite) which have such problem. But still it's opensource, so anyone can help to diagnostic.

You choose way to buy other device, that is also solution for some peoples. For me it is not solution and also use electrum or webusb will cause I will lost only one device, where this issue occurs and I will close way how to debug/help to solve it.

@sime
Copy link
Member

sime commented May 25, 2022

I'm not Go developer either, perhaps a faux trezord load test can be devised to narrow down how big is too big and thus include it testing strategy.

@bosomt
Copy link
Contributor

bosomt commented May 25, 2022

i just tried to send two big transactions "size": 14882, and "size": 7535, and resend/resign it via bump fee but it never failed in app with embedded bridge.

T1 including white-noise background - https://tbtc2.trezor.io/tx/924195e88846242a5be5bffe587ae830d64936ac421da234f7df2d202cdf97ec
TT https://tbtc1.trezor.io/tx/649c8c4968c473f4a556e627defd41314d347b64b197ec5cd24ad4219fd6d486
TT https://tbtc1.trezor.io/tx/de3f7413ed8131bfe41259927f749fca8904af6b805025c5e560a49dffb02a35

Info:

  • Suite version: desktop 22.6.0 (d72ff4c5b9f3cbe2ccb4be47170f88c08e9271d1)
  • Browser: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) TrezorSuite/22.6.0 Chrome/94.0.4606.81 Electron/15.3.0 Safari/537.36
  • OS: MacIntel
  • Screen: 1680x1050
  • Device: model T 2.5.1 regular (revision a1d22890ee6a9ffec4edfc6e1b5b6c74d23855a3)
  • Transport: bridge 2.0.31

@sime
Copy link
Member

sime commented May 25, 2022

@petrkr Perhaps you could try re-creating your BTC transaction with test coins. Then we can better analyse the problem transaction.

Alternatively try and provide us some numbers, e.g. number of inputs, UTXOs, outputs, segwit ? etc.

@petrkr
Copy link

petrkr commented May 25, 2022

@petrkr Perhaps you could try re-creating your BTC transaction with test coins. Then we can better analyse the problem transaction.

Alternatively try and provide us some numbers, e.g. number of inputs, UTXOs, outputs, segwit ? etc.

Detail transaction in suite says:
Size 1680 bytes

there are 18 inputs as script_type": "SPENDP2SHWITNESS"

I had same problem on more currencies (LTC, DOGE...) So that would not be problem of coin but somewhere in communication....

As I mention for example that async behavior of "select" and "random" selection of multiple true values.

I can try access to some tBTC

@matejcik
Copy link

@petrkr the /listen call might be a red herring -- the purpose of the call is to wait for connection events from newly plugged in Trezors. Bridge is supposed to return from the call either when you plug in a Trezor, or after 30s of that not happening -- in which case Suite will just call /listen again.
So whatever else is actually going on in the background, /listen is very likely to always be the last thing you see in the log.

@petrkr
Copy link

petrkr commented May 25, 2022

@petrkr the /listen call might be a red herring -- the purpose of the call is to wait for connection events from newly plugged in Trezors. Bridge is supposed to return from the call either when you plug in a Trezor, or after 30s of that not happening -- in which case Suite will just call /listen again. So whatever else is actually going on in the background, /listen is very likely to always be the last thing you see in the log.

I see.

Then I will really go for that select's random behavior where is ctx.Done() and finished.

Seems at windows it does not occurs (bridge 27), Mac i do not have.

When ctx is done? When http request is finished? It could be some localhost IP stack behaviorc so ctx.done is earlier than func finish.. or actually because it is 'go' rutine, it is like background task and ctx.done can be invoked earlier than defet (or how was that modifier) about finish true. Because IP stack is faster and it's multiprocesoring etc.

I would like to add more debug logs in this code part to investigate,, but unfortunately I do not have too much time to help with it (even i actually wanted, because I like to do low-level things).

For now I can only help by some evenings investigations, because I have environment where I can reproduce it.

@matejcik
Copy link

matejcik commented May 25, 2022

per the http.request docs:

For incoming server requests, the context is canceled when the client's connection closes, the request is canceled (with HTTP/2), or when the ServeHTTP method returns.

your tcp dump seems to indicate that the client didn't close anything

from my reading of the code and docs, it seems that your guess is correct: the deferred finished function happens to run later than ctx.Done() -- which executes when ServeHTTP returns, i.e., when the request is done processing. (we see in logs that the preceding request is successful (code 200))
if the http layer is fast enough to complete before thie goroutine has the chance to return, this is what we'd see.

try placing this in the ctx.done branch:

			select {
			case <-finished:
				return
			default:
			}

(iow, if you receive a Done, non-blocking-check if you don't also have a finished waiting)

the question is of course why in the world you can reproduce this so reliably while we're hammering the bridge hard as we can without results 🤷‍♀️

@matejcik
Copy link

this is clearly a bridge issue at this point, @prusnak can you please move to the bridge repo?

@prusnak prusnak transferred this issue from trezor/trezor-suite May 25, 2022
@prusnak
Copy link
Member

prusnak commented May 25, 2022

this is clearly a bridge issue at this point, @prusnak can you please move to the bridge repo?

Done

@prusnak
Copy link
Member

prusnak commented May 25, 2022

Issue #197 (closed as duplicate of this issue) describes the same error but with Electrum

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: No status
Development

No branches or pull requests

8 participants