Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reliable way to detect if emulator is up #1668

Open
matejcik opened this issue Jun 11, 2021 · 1 comment
Open

Reliable way to detect if emulator is up #1668

matejcik opened this issue Jun 11, 2021 · 1 comment
Labels
code Code improvements core Trezor Core firmware. Runs on Trezor Model T and T2B1. tests Automated integration tests

Comments

@matejcik
Copy link
Contributor

Currently the library repeatedly sends UDP message PINGPING, until the emulator responds with PONGPONG. If that does not happen within 1 second, emulator is considered dead.

This opens some interesting race conditions. In particular, what occasionally seems to happen in test runs:

  • host sends PINGPING and does not get a reply within the short waiting interval
  • host sends PINGPING again
  • emulator responds PONGPONG to first PINGPING
  • host sends protobuf message and waits for reply
  • emulator responds PONGPONG to second PINGPING
  • host receives PONGPONG instead of expected protobuf

The reason host hammers emulator with PINPINGs is that as of workflow-restarts, there's no guarantee that a running emulator will respond to a PONGPONG in a timely fashion.

There are various possible solutions to the issue. We need to figure out one that's actually reliable and not affected by whether, e.g., the emulator is currently running an event loop.

@matejcik matejcik added core Trezor Core firmware. Runs on Trezor Model T and T2B1. feature tests Automated integration tests code Code improvements labels Jun 11, 2021
@matejcik
Copy link
Contributor Author

FWIW: something wrong is going on with socket.settimeout in udp.py. When I'm very insistent in code that a timeout of 10 seconds should be used when waiting for the PONGPONG response, it works nicely without hammering. So that might be a path towards a quick fix.

More generally, however, it's still not doing it reliably.

@matejcik matejcik added this to the freezer milestone Jul 30, 2021
matejcik added a commit that referenced this issue Jul 30, 2021
There's two udp calls in `UdpTransport._ping()`:
- socket.sendall(b"PINGPING") -> this will be instanteous, AND it will
raise if the receiving side is not listening.
- socket.recv() -> this will wait for SOCKET_TIMEOUT seconds, but only
in case the sendall() succeeded. This means that receiving side exists
and we are now waiting until it's awake enough to respond.

In conclusion, we avoid hammering emulator with PINGPINGs with a timeout
so short we don't see an answer. This should avoid the problem
occasionally seen in CI and described in #1668
matejcik added a commit that referenced this issue Jul 30, 2021
There's two udp calls in `UdpTransport._ping()`:
- socket.sendall(b"PINGPING") -> this will be instanteous, AND it will
raise if the receiving side is not listening.
- socket.recv() -> this will wait for SOCKET_TIMEOUT seconds, but only
in case the sendall() succeeded. This means that receiving side exists
and we are now waiting until it's awake enough to respond.

In conclusion, we avoid hammering emulator with PINGPINGs with a timeout
so short we don't see an answer. This should avoid the problem
occasionally seen in CI and described in #1668
matejcik added a commit that referenced this issue Jul 30, 2021
There's two udp calls in `UdpTransport._ping()`:
- socket.sendall(b"PINGPING") -> this will be instanteous, AND it will
raise if the receiving side is not listening.
- socket.recv() -> this will wait for SOCKET_TIMEOUT seconds, but only
in case the sendall() succeeded. This means that receiving side exists
and we are now waiting until it's awake enough to respond.

In conclusion, we avoid hammering emulator with PINGPINGs with a timeout
so short we don't see an answer. This should avoid the problem
occasionally seen in CI and described in #1668
matejcik added a commit that referenced this issue Aug 4, 2021
There's two udp calls in `UdpTransport._ping()`:
- socket.sendall(b"PINGPING") -> this will be instanteous, AND it will
raise if the receiving side is not listening.
- socket.recv() -> this will wait for SOCKET_TIMEOUT seconds, but only
in case the sendall() succeeded. This means that receiving side exists
and we are now waiting until it's awake enough to respond.

In conclusion, we avoid hammering emulator with PINGPINGs with a timeout
so short we don't see an answer. This should avoid the problem
occasionally seen in CI and described in #1668
@tsusanka tsusanka moved this from 📥 Inbox to 👨‍💻 Code in Firmware · Backlog 🗂 Oct 5, 2021
@tsusanka tsusanka removed this from the freezer milestone Oct 6, 2021
@matejcik matejcik added the LOW label Oct 7, 2021
@alex-jerechinsky alex-jerechinsky added this to 💻 Code in Backlog 🗂 Oct 22, 2021
@alex-jerechinsky alex-jerechinsky removed this from 💻 Code in Firmware · Backlog 🗂 Oct 22, 2021
@hynek-jina hynek-jina removed the LOW label May 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
code Code improvements core Trezor Core firmware. Runs on Trezor Model T and T2B1. tests Automated integration tests
Projects
Status: No status
Backlog 🗂
💻 Code
Development

No branches or pull requests

3 participants