Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[macOS] SpineInterface can't access the shared memory #375

Closed
ubgk opened this issue Apr 29, 2024 · 6 comments · Fixed by stephane-caron/vulp#94
Closed

[macOS] SpineInterface can't access the shared memory #375

ubgk opened this issue Apr 29, 2024 · 6 comments · Fixed by stephane-caron/vulp#94
Labels
bug Something isn't working

Comments

@ubgk
Copy link
Contributor

ubgk commented Apr 29, 2024

Bug description

SpineInterface cannot access the shared memory file, and my agent fails:

INFO: Analyzed target //agents/pid_balancer:pid_balancer (4 packages loaded, 13 targets configured).
INFO: Found 1 target...
Target //agents/pid_balancer:pid_balancer up-to-date:
  bazel-bin/agents/pid_balancer/pid_balancer
INFO: Elapsed time: 7.828s, Critical Path: 0.05s
INFO: 1 process: 1 internal.
INFO: Build completed successfully, 1 total action
INFO: Running command line: bazel-bin/agents/pid_balancer/pid_balancer -c bullet
[2024-04-30 00:05:39,678] [info] Loading configuration 'config/common.gin' (main.py:78)
[2024-04-30 00:05:39,681] [info] Loading configuration 'config/bullet.gin' (main.py:78)
Waiting for spine /vulp to start...
FileNotFoundError: /vulp
Traceback (most recent call last):
  File "/private/var/tmp/_bazel_bora/ee02e7f8ec3e022d620c305986e67335/execroot/upkie/bazel-out/darwin-opt/bin/agents/pid_balancer/pid_balancer.runfiles/upkie/agents/pid_balancer/main.py", line 105, in <module>
    spine = SpineInterface()
            ^^^^^^^^^^^^^^^^
  File "/private/var/tmp/_bazel_bora/ee02e7f8ec3e022d620c305986e67335/execroot/upkie/bazel-out/darwin-opt/bin/agents/pid_balancer/pid_balancer.runfiles/vulp/vulp/spine/spine_interface.py", line 79, in __init__
    shared_memory = wait_for_shared_memory(shm_name, retries)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/private/var/tmp/_bazel_bora/ee02e7f8ec3e022d620c305986e67335/execroot/upkie/bazel-out/darwin-opt/bin/agents/pid_balancer/pid_balancer.runfiles/vulp/vulp/spine/spine_interface.py", line 53, in wait_for_shared_memory
    raise SpineError(
vulp.spine.exceptions.SpineError: spine /vulp did not respond after 1 attempts

Expected behavior

When launching try_pid_balancer.sh, the agent should be able to access the shared-memory file and run normally.

Reproduction steps

Steps to reproduce the behavior:

  1. Run try_pid_balancer.sh on macOS (It works on Linux 20.04).

Additional context

The problem disappears once I checkout an earlier version of the spine_interface.py file, all other things remaining the same (e.g. git checkout b179d224da0468a93b6a9a051f6137b70f86498c -- vulp/spine/spine_interface.py ).

I am creating this issue here as it seems to be caused by vulp.

I suspect the issue is related to the standard library (or the way we use it), as posix_ipc works just fine.

System

  • OS: macOS 13.3
  • Vulp version: 2.2.1
@stephane-caron stephane-caron added the bug Something isn't working label Apr 30, 2024
@stephane-caron
Copy link
Member

It likely follows from https://github.com/upkie/vulp/releases/tag/v2.2.0 where we switched from posix_ipc to the Python standard library.

I'm surprised your log only attempts once to wait for the spine: in the current PID balancer, it should try 10 times. Maybe try increasing to retries=10?

@ubgk
Copy link
Contributor Author

ubgk commented May 2, 2024

I think the problem is inconsistent handling of the POSIX naming standard on different platforms (and the sloppy handling of it by Python).

On Linux, shm_open('foo', ...), shm_open('\foo', ...) and shm_open('\\foo', ...) will all be resolved to the same object. (Is it because /dev/shm/foo == /dev/shm//foo ?)

It is, however, not the case on macOS, the object names should be an exact match. (Similarly, there is no /dev/shm/foo on macOS).

It seems that the standard SharedMemory module prepends the name with a slash: link. So when SpineInterface tries to open /vulp, it is in fact trying to open //vulp (which is OK on Linux, but not on macOS).

The simplest solution I can think of is to get rid of all leading slashes in our Python files. What do you think?

@ubgk
Copy link
Contributor Author

ubgk commented May 2, 2024

I think the case makes itself for adding integration tests. 😅

Should I add try_pid_balancer.sh to the CI of upkie?

@stephane-caron
Copy link
Member

Also, unit tests 😉 Would it be hard to make one for this particular case?

An integration test sounds good too 👍 We could add an option to try_pid_balancer.sh (forwarded to the Bullet spine) to stop after a fixed number of iterations, so that the test can run in an asynchronous and deterministic way. Can you propose a PR for this?

@ubgk
Copy link
Contributor Author

ubgk commented May 2, 2024

Also, unit tests 😉 Would it be hard to make one for this particular case?

I cannot think of a very good case right off the bat. Maybe a Python test which creates a mock spine and tries to attach to it through SpineInterface could work, what do you think?

An integration test sounds good too 👍 We could add an option to try_pid_balancer.sh (forwarded to the Bullet spine) to stop after a fixed number of iterations, so that the test can run in an asynchronous and deterministic way. Can you propose a PR for this?

What you propose is an option and it should be easy to implement. I think a KISS alternative would be to just add a step like timeout N ./try_pid_balancer.sh (with a build step before, obviously). If the process quits for anything other than a timeout, we'll know that way. Does that sound good?

@stephane-caron
Copy link
Member

stephane-caron commented May 2, 2024

Hmm, I would be wary of using timeout in a CI workflow. Things timing-related tend to break in CI runners, where time can be quite elastic. For instance here, the script may time out out before doing anything useful, with the test passing as a false positive.

Maybe a Python test which creates a mock spine and tries to attach to it through SpineInterface could work, what do you think?

That's what the test in spine_interface_test.py does 🙈 The reason it passed is that it already had the correct syntax for macOS:

https://github.com/upkie/vulp/blob/870bb20cd8fecc3a890968c54f6159eedc5c9f8c/vulp/spine/tests/spine_interface_test.py#L30

@stephane-caron stephane-caron transferred this issue from stephane-caron/vulp Jul 31, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Development

Successfully merging a pull request may close this issue.

2 participants