Error when running LocalCluster.start/0 #7

Closed
jeroenbourgois opened this issue Apr 2, 2019 · 8 comments

Comments

@jeroenbourgois

When running LocalCluster.start/0 I get the following error:

{:error, 
  {{:shutdown, {:failed_to_start_child, :net_kernel, {:EXIT, :nodistribution}}},  
   {:child, :undefined, :net_sup_dynamic,   
    {:erl_distribution, :start_link, [[:"manager@127.0.0.1"], false]},   
    :permanent, 1000, :supervisor, [:erl_distribution]}}}

I followed the getting started guide; other than that, I have a fairly simple Phoenix app with some other deps.

This is my Elixir and Erlang/OTP version:

Erlang/OTP 21 [erts-10.3] [source] [64-bit] [smp:8:8] [ds:8:8:10] [async-threads:1] [hipe] [dtrace]
Elixir 1.8.1 (compiled with Erlang/OTP 21)

Any clue?

@keathley

keathley commented Apr 2, 2019

Are you seeing this when running tests? I've seen these kinds of issues before if you have firewall issues that are stopping epmd from opening the ports it needs. You can try running iex --sname gold to ensure that you can start nodes with distribution.
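
For anyone who wants to run the same check without an interactive session (in a script, for instance), here is a minimal sketch. It assumes the `elixir` executable is on the PATH; the function name `check_distribution` is made up for illustration, and `gold` is just the node name from the comment above.

```shell
#!/bin/sh
# Hypothetical helper: boot a short-named node non-interactively and report
# whether distribution came up. The node name defaults to "gold".
check_distribution() {
  # Node.alive?/0 returns true only when the node is part of a distributed system.
  elixir --sname "${1:-gold}" -e 'IO.puts(Node.alive?())'
}
```

Calling `check_distribution` should print `true` when the node starts with distribution enabled. If distribution cannot start, the command is expected to fail rather than print `true`, though the exact error text may differ between OTP versions.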

@jeroenbourgois
Author

@keathley yes, after running mix test. Running the iex command works, and apparently after doing that the error message has gone away!

Having some other issues now, with every :ets.lookup call failing, for example the ones in Phoenix itself:

 :ets.lookup(MyApp.Endpoint, :secret_key_base)

But then this is probably not that crazy. I have zero experience with multiple nodes. We are using Cachex as a cache layer in our project, and we are planning for massive scale. For that purpose we wanted to run a test with several nodes running the app, to anticipate having to scale like that.

@jeroenbourgois
Author

@keathley never mind, I got around it. I don't need any Phoenix-related tests for this, so I just start the LocalCluster from my test file. After that I can pass the nodes to Cachex and it seems to work!

@keathley

keathley commented Apr 3, 2019

Glad you got it working. Not sure what operating system you're using, but on macOS you typically need to run iex with distribution and click a prompt to allow epmd to open port connections. After that things work. If you only run tests then it doesn't present the prompt (or presents it so quickly that you don't notice it). In any case, glad it's working.

@toranb

toranb commented Apr 19, 2019

@keathley I see this same error from a cold machine boot, and I'm looking for suggestions on how to avoid having to run iex once just so that mix test works without failure. Your comment above mentions clicking a prompt. Could this be done programmatically with some API, for example as part of my build script?

... on macOS you typically need to run iex with distribution and click a prompt to allow epmd to open port connections

Note: this isn't life ending and my workaround is mostly fine. I'm just curious to learn about alternatives :)

Full working example, in case you need to see the exact failure (or if anyone who follows is interested):

toranb/elixir-budget@78c72db

@keathley

keathley commented Apr 19, 2019

Generally when you see this error it's because epmd isn't starting or hasn't started. I mostly see this in CI or other Linux envs. My solution is to explicitly run epmd -daemon prior to running tests and whatnot. That seems to sort out the problem.
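
As a rough sketch of that approach for a build or CI script: the helper below starts epmd only when no daemon answers. The function name `ensure_epmd` is hypothetical, and the sketch assumes `epmd` is on the PATH and that `epmd -names` exits non-zero when it cannot reach a running daemon.

```shell
#!/bin/sh
# Hypothetical helper: make sure an epmd daemon is available before the tests run.
ensure_epmd() {
  # `epmd -names` asks the local daemon for its registered nodes.
  # Assumption: it fails (non-zero exit) when no daemon is reachable.
  if epmd -names >/dev/null 2>&1; then
    echo "epmd already running"
  else
    # Start epmd in the background as a daemon.
    epmd -daemon
    echo "epmd started"
  fi
}
```

In a build script this could be chained as `ensure_epmd && mix test`, so that the test run only begins once the daemon has been started or found.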

@toranb

toranb commented Apr 19, 2019

@keathley that worked perfectly! Thanks for the quick reply Chris!

@jedschneider

jedschneider commented May 16, 2020

I ran into this and figured out that the coc-elixir language server was preventing the node from coming up.

I was getting this when running the tests:

PingPongTest
  * test producer sends pings to each connected nodes consumer (2.4ms)

  1) test producer sends pings to each connected nodes consumer (PingPongTest)
     test/ping_pong_test.exs:27
     ** (exit) :not_alive
     stacktrace:
       (stdlib) slave.erl:197: :slave.start/5
       (local_cluster) lib/local_cluster.ex:50: anonymous fn/2 in LocalCluster.start_nodes/3
       (elixir) lib/enum.ex:1340: anonymous fn/3 in Enum.map/2
       (elixir) lib/enum.ex:3011: Enum.reduce_range_inc/4
       (elixir) lib/enum.ex:1953: Enum.map/2
       (local_cluster) lib/local_cluster.ex:49: LocalCluster.start_nodes/3
       test/ping_pong_test.exs:16: PingPongTest.__ex_unit_setup_0/1
       test/ping_pong_test.exs:1: PingPongTest.__ex_unit__/2

--max-failures reached, aborting test suite

Finished in 0.07 seconds
1 test, 1 failure
╰─ iex -S mix
Erlang/OTP 22 [erts-10.6.4] [source] [64-bit] [smp:16:16] [ds:16:16:10] [async-threads:1] [hipe]

Interactive Elixir (1.9.4) - press Ctrl+C to exit (type h() ENTER for help)
iex(1)> LocalCluster.start()

11:19:30.286 [info]  Protocol 'inet_tcp': register/listen error: econnrefused

{:error,
 {{:shutdown, {:failed_to_start_child, :net_kernel, {:EXIT, :nodistribution}}},
  {:child, :undefined, :net_sup_dynamic,
   {:erl_distribution, :start_link, [[:"manager@127.0.0.1"], false]},
   :permanent, 1000, :supervisor, [:erl_distribution]}}}
iex(2)>

Debugging to ensure the gold node can come up:

╰─ iex --sname gold
Erlang/OTP 22 [erts-10.6.4] [source] [64-bit] [smp:16:16] [ds:16:16:10] [async-threads:1] [hipe]

Interactive Elixir (1.9.4) - press Ctrl+C to exit (type h() ENTER for help)
iex(gold@jeds-mbp)1>

I found an existing beam process and killed it

╰─ ps aux | grep beam
jed              10700   0.0  0.6  6075676 193584   ??  S    11:18AM   7:52.07 /Users/jed/.asdf/installs/erlang/22.2.8/erts-10.6.4/bin/beam.smp -- -root /Users/jed/.asdf/installs/erlang/22.2.8 -progname erl -- -home /Users/jed -- -kernel shell_history enabled -- -pa /Users/jed/.asdf/installs/elixir/1.9.4/bin/../lib/eex/ebin /Users/jed/.asdf/installs/elixir/1.9.4/bin/../lib/elixir/ebin /Users/jed/.asdf/installs/elixir/1.9.4/bin/../lib/ex_unit/ebin /Users/jed/.asdf/installs/elixir/1.9.4/bin/../lib/iex/ebin /Users/jed/.asdf/installs/elixir/1.9.4/bin/../lib/logger/ebin /Users/jed/.asdf/installs/elixir/1.9.4/bin/../lib/mix/ebin -noshell -s elixir start_cli -- -extra -e ElixirLS.LanguageServer.CLI.main()

After that, mix test ran. But of course, it killed my language server:

[coc.nvim] Did not receive workspace/didChangeConfiguration notification after 5 seconds. Using default settings.

I don't see anything that seems like it would prevent LocalCluster from coming up, other than the fact that the language server is likely coming up first and must be securing whatever is connecting on inet_tcp. Could we have the language server and LocalCluster running at the same time if they shared an Erlang cookie to connect? I also hope this helps someone who runs into the same stack trace and can't figure out how to get the tests running, as the conflict seems pretty far removed from the act of running the tests.
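
For anyone debugging a similar conflict, the manual steps above (checking whether distribution works, then looking for stray beam processes) can be gathered into one diagnostic helper. This is only a sketch: the function name `show_distribution_state` is made up, it assumes `epmd` and `ps` are on the PATH, and it only reports state. It does not kill anything or explain why a particular process interferes.

```shell
#!/bin/sh
# Hypothetical diagnostic helper: show what epmd knows about and which
# BEAM processes are currently running.
show_distribution_state() {
  echo "== epmd registrations =="
  # Lists the node names registered with the local epmd, if one is reachable.
  epmd -names 2>&1 || echo "epmd is not reachable"

  echo "== running BEAM processes =="
  # The [b] bracket keeps grep from matching its own command line.
  ps aux | grep '[b]eam' || echo "no BEAM processes found"
}
```

Running `show_distribution_state` before `mix test` gives a quick view of whether a language server or another node is already up on the machine.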

4 participants