vernemq cluster #186

Closed
jshahs opened this issue Aug 16, 2016 · 45 comments

Comments

@jshahs

jshahs commented Aug 16, 2016

Hi,

We are running a cluster of two vernemq nodes with the configuration below:

trade_consistency=on
allow_multiple_sessions=off

All the clients connect with clean_session = false.

Question: When one vernemq node in the cluster is down, the other node does not accept connections. Could you please let me know the reason? (Is it due to the offline message store on the down node?)

@dergraf
Contributor

dergraf commented Aug 16, 2016

Hi!

This is the intended behavior.

From the 'online' node perspective a network partition occurred, as it doesn't know the reason why the second node disappeared. If it is a correct shutdown you should always use the vmq-admin cluster leave node=<SecondNode> flow (explained here https://vernemq.com/docs/clustering/).

If it is indeed a network partition or a node failure, or you want to simulate one of the two, we recommend having a look at our section on netsplits here: https://vernemq.com/docs/clustering/netsplits.html

What it essentially means is that during such a 'faulty' situation with trade_consistency=on, the remaining node only accepts PUBLISH, SUBSCRIBE, and UNSUBSCRIBE requests. In this case CONNECTs are rejected.
However, using allow_multiple_sessions=on goes one step further and even allows CONNECTs.
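
A minimal sketch of the behaviour described above, for illustration only. The function name and structure are hypothetical and this is not VerneMQ code. Only the two cases with trade_consistency=on are stated in this comment; the cases with trade_consistency=off are assumptions.

```python
# Hypothetical model of which MQTT requests the remaining node accepts
# during a partition, based only on the description in this comment.

def accepted_during_partition(trade_consistency: bool,
                              allow_multiple_sessions: bool) -> set:
    accepted = set()
    if not trade_consistency:
        # Assumption: without trade_consistency the node stays strict and
        # rejects everything while the cluster is inconsistent.
        return accepted
    # Stated in the comment: these three are accepted with trade_consistency=on.
    accepted |= {"PUBLISH", "SUBSCRIBE", "UNSUBSCRIBE"}
    if allow_multiple_sessions:
        # Stated in the comment: this setting additionally allows CONNECT.
        accepted.add("CONNECT")
    return accepted
```

With both settings on, `accepted_during_partition(True, True)` includes CONNECT, so a client should be able to reach the remaining node.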

We are aware that allow_multiple_sessions is probably the wrong way to deal with this. In one of the upcoming releases, we'll introduce a third parameter that allows CONNECTs during a partition without requiring multiple sessions to share a queue.

Hope this helps.

@dergraf dergraf closed this as completed Aug 16, 2016
@vladbabii

So, in a 2-node cluster, with both

trade_consistency = on
allow_multiple_sessions = on

I should be able to connect to the first node if the second one goes down?

I'm only getting Connection Refused: broker unavailable.

@ioolkos
Contributor

ioolkos commented Aug 18, 2016

Thanks @vladbabii
Yes, the client should be able to re-connect to the remaining node. How do you test this exactly?

@vladbabii

vladbabii commented Aug 18, 2016

2 Raspberry Pis, each with vernemq started, both in the same cluster.
Clients (arduino, esp8266, mosquitto, nodejs, php) can connect to either of them without any issues while both are up. Once one goes down, no new connections are possible.
Vernemq was compiled from source last week.

I then installed vernemq from packages in an OpenVZ container running Ubuntu. With the same configuration it seems to run fine both with two nodes up and with only one up (accepting new connections and data).

Now I'm wondering if it's a recent issue or a bug caused by the environment.

@ioolkos
Contributor

ioolkos commented Aug 18, 2016

OK, that is interesting.
Can you verify both vernemq.conf files on the Raspis? Is there a listener set up on both of them? Do both allow multiple sessions?

@vladbabii

The head of the config on both of them is below.

They joined the cluster, and I can publish on the first node and subscribe on the second, and it works fine.
But once I stop one vernemq instance / shut down a Pi, no new connections are allowed.

## Allow anonymous users to connect, default is 'off'
##
## Default: off
##
## Acceptable values:
##   - on or off
allow_anonymous = on

## Allow operation even when a VerneMQ cluster is inconsistent,
## by removing consistency checks while registering new
## clients or subscribing/unsubscribing from topics
##
## Default: off
##
## Acceptable values:
##   - on or off
trade_consistency = on

## Allows a client to logon multiple times using the same client id
## (non-standard behaviour!).
##
## Default: off
##
## Acceptable values:
##   - on or off
allow_multiple_sessions = on

@ioolkos
Contributor

ioolkos commented Aug 18, 2016

I'll try to find out in the afternoon if we can reproduce this on our Raspberry-Cluster.

@vladbabii

Can I help with anything? Maybe by recompiling vernemq from head?

@vladbabii

Environment is:
Raspberry pi v3, raspbian lite os,
Erlang/OTP 18 [erts-7.0] [source] [smp:4:4] [async-threads:10] [kernel-poll:false]

vernemq version returns a new line (no text)

@ioolkos
Contributor

ioolkos commented Aug 18, 2016

One more thing: have you restarted the nodes after the config changes to their vernemq.conf?

@jshahs
Author

jshahs commented Aug 18, 2016

Hi @vladbabii, I tested this scenario with an old release. It was working fine (also accepting new connections) if we use 'allow_multiple_sessions = on'.

Hi @dergraf,

Thank you for your response.

I am looking for a way to ensure that if one node in the cluster goes down, the other nodes are not affected. Based on the vernemq implementation, the only dependency on the down node is the offline message store. If I move the offline message store to an external database that is available to all the nodes in the cluster, then I think it would remove this dependency on the down node.

Could you please let me know your comments on this?

Thank you in advance.

@vladbabii

Thank you for taking the time @jshahs
I did

  • vernemq stop
  • edited the vernemq.conf files
  • rm -rf /folder/with/temporary/generated/config
  • vernemq start

Tried multiple times with reboots between.

Do you have a guide on how to compile vernemq on the Pi? Maybe I did something wrong, since it's the first time I've done anything with Erlang.

@ioolkos
Contributor

ioolkos commented Aug 18, 2016

Here is a way to compile on Raspberry 2. What did you actually do on the Raspi3? Is this 32bit or 64bit?
#90 (comment)

We just decided to build a Raspi profile to make this easier (something we wanted to do for a while) -> @larshesel

Note: no need to rm the generated configs before restart. Depending on previous state, you might want to delete the data folders though.

@vladbabii

vladbabii commented Aug 18, 2016

Followed steps from
#175 (comment)
bougueil/eleveldb@bca0888#commitcomment-18574965

to compile on both 2 and 3

@vladbabii

I think it's 32bit since it fails at
if (8*1024*1024*1024L < gCurrentTotalMemory)

@vladbabii

@ioolkos - there does not seem to be a vernemq branch in eleveldb - https://github.com/erlio/eleveldb/branches/all?utf8=%E2%9C%93&query=vernemq

@larshesel
Contributor

Hi @vladbabii - we've just pushed some changes to the master branch which should make it easier to build on raspberry pi as it should be able to build out of the box.

All you have to do is to check out the latest master and do make rpi32 in the vernemq folder and it should create a release in the _build/rpi32/rel folder and you'd be able to start vernemq using _build/rpi32/rel/vernemq/bin/vernemq start.

Any feedback is appreciated.

@frantill

Hi!

I followed the steps described by @larshesel in the previous comment. I guess it worked fine: it compiled and I got the new _build/rpi32/rel folder, but somehow I cannot start vernemq. I added the /bin to PATH, but when I type vernemq start I get the following error:

vm.args needs to have a -name parameter. -sname is not supported.

I am working on 2 raspberry pi 3, got the same error from both.
I have erl version 17 and java 1.8.

Any clue what this error is related to?

Thank you in advance for your help!

@vladbabii

I'll try compiling it now and see if it works.

@vladbabii

I did this:

cd <some empty dir>
git clone git://github.com/erlio/vernemq.git
cd vernemq
make rpi32
cd _build/rpi32/rel/vernemq/bin
chmod +x vernemq
./vernemq start

checked it started with

ps aux | grep vernemq

also with

./vernemq ping
pong

it seems to start

@vladbabii

to configure it, modify

vernemq/_build/rpi32/rel/vernemq/etc/vernemq.conf

I modified port to 11883 and trade consistency and anonymous to on

Individual nodes work just fine.

@vladbabii

Also clustering works

./vmq-admin cluster status
+--------------------+-------+
|        Node        |Running|
+--------------------+-------+
|VerneMQ@a.b.c.1| true  |
|VerneMQ@a.b.c.2| true  |
+--------------------+-------+

@vladbabii

Also on the page at https://vernemq.com/docs/clustering/netsplits.html there's a typo (it should be _sessions not _session)
+ allow_multiple_session=on: Partitioned VerneMQ cluster accepts all request (non MQTT-standard behaviour)

Tested clustering and it works fine also.

git show
commit 0adc3753d3ec8d66c9b8ec02595e9bfc228ca4a6
Author: ioolkos <afa@erl.io>
Date:   Thu Oct 6 15:48:42 2016 +0200

erl --version
Erlang/OTP 18 [erts-7.0] [source] [smp:4:4] [async-threads:10] [kernel-poll:false]
Eshell V7.0  (abort with ^G)

on Raspberry Pi V3 / jessie lite
uname -a
Linux 4.4.21-v7+ #911 SMP Thu Sep 15 14:22:38 BST 2016 armv7l GNU/Linux

@vladbabii

Also works with

Erlang/OTP 17 [erts-6.2] [source] [smp:4:4] [async-threads:10] [kernel-poll:false]

If you do first apt-get upgrade && apt-get install erlang libssl-dev

@frantill

Ok, Thank you very much @vladbabii for all the information. Now it is much better, but still cannot start it because:

vernemq failed to start within 15 seconds,
see the output of 'vernemq console' for more information.
If you want to wait longer, set the environment variable
WAIT_FOR_ERLANG to the number of seconds to wait.

Will keep working on it tomorrow and will report here my results/solutions.
Thank you again!

@vladbabii

If you update stuff you need to recompile.

@vladbabii

Also make sure ports are not used already

@vladbabii

Installing erlang 18 on pi

apt-get install wget libssl-dev ncurses-dev m4 unixodbc-dev erlang-dev
wget http://www.erlang.org/download/otp_src_18.0.tar.gz
tar -xzvf otp_src_18.0.tar.gz
cd otp_src_18.0
./configure
make
make install

@frantill

Hi @vladbabii,

So, now vernemq is working, meaning I get a pong back when I ping it!
While installing erlang 18 on the second Pi i got this:

[screenshot 2016-10-13 12 35 47]

I will keep working on it. I want to send messages to the broker now.

Also, is the vmq_mzbench tool supported/working on the Pi? I couldn't get that to work either.
Thanks

@vladbabii

I would not worry about the missing documentation if you don't need it. As far as I know (and I'm not very good with Erlang), missing docs should not impede vernemq usage in any way.

@frantill

Ehm...I don't really know much about erlang either. I am looking for a working mqtt broker for raspberry (different from mosquitto)...and I was looking for the benchmark tool.
Will post again if things evolve
Thanks for your help

@ioolkos
Contributor

ioolkos commented Oct 13, 2016

@frantill great to see you're making progress!
A note on vmq_mzbench: this is used to load test brokers, so in general it shouldn't be operated on the same systems as your broker. If VerneMQ is on a RPi, you can install MZBench on your laptop, and then loadtest from your laptop -> RPi.

@frantill

I see... I have still a long way to go:)
Thanks for your tip. Will keep reading and studying about it.

@frantill

frantill commented Oct 17, 2016

Hi @ioolkos, I am now in the setup you suggested: mzbench on the Mac, vernemq on the Raspberry Pi.
I am trying to follow https://vernemq.com/blog/2016/08/26/loadtesting-mqtt-brokers.html but couldn't get it to work.
I tried to change the "connect" parameters:
connect([t(host, "127.0.0.1")
I changed both host and 127.0.0.1 (I tried different combinations in order to direct the test from my laptop to the RPi, sorry for my brute-force method), but I still see 127.0.0.1 appearing in the syst.log error when checking mzbench at localhost:4800.
Thank you for helping!

@ioolkos
Contributor

ioolkos commented Oct 17, 2016

@frantill apologies that I couldn't get to this earlier.
I don't quite understand what you are doing from your description. Setting host to 127.0.0.1 will always try to connect to localhost, i.e. your laptop. How have you connected the RPi? (Under what address is it reachable in your local network?)
Can you ping it? Can you telnet to the 1883 MQTT port on the RPi from your laptop?
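
If telnet is not at hand, the same reachability check can be sketched in a few lines of Python. This is only a TCP-level probe under the assumption that the broker listens on plain TCP; the function name `port_is_open` is made up for this example.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

From the laptop you would call something like `port_is_open("<address of the RPi>", 1883)`. A result of True only shows that a listener accepted the TCP connection, not that MQTT traffic works.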

@frantill

@ioolkos thanks,
what I can tell you now:

laptop_IP: 192.168.1.220
RPi_IP: 192.168.1.204

running netstat -an | grep 1883 gives back
tcp 0 0 127.0.0.1:1883 0.0.0.0:* LISTEN

The only change in vernemq.conf is allow_anonymous = on

telnet 127.0.0.1 1883 says:
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
Connection closed by foreign host.

./vernemq ping gives back pong

@ioolkos
Contributor

ioolkos commented Oct 17, 2016

You need to start the listener in vernemq.conf with the actual external IP, not 127.0.0.1, if you want to reach it from anywhere other than localhost itself.
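
As a rough illustration of the point above, this sketch checks whether a local address taken from the netstat output is bound to loopback only. It handles IPv4 `host:port` strings only, and the function name `bound_beyond_loopback` is hypothetical.

```python
import ipaddress


def bound_beyond_loopback(listen_addr: str) -> bool:
    """True if an IPv4 'host:port' listen address is not loopback-only."""
    # Split off the port; only the host part matters for this check.
    host, _, _port = listen_addr.rpartition(":")
    return not ipaddress.ip_address(host).is_loopback
```

For the `127.0.0.1:1883` line shown in the netstat output this returns False, while a listener on `0.0.0.0:1883` or on the external IP returns True. It says nothing about firewalls or routing.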

@frantill

I did change the listener IP address before, when I was trying to run the load test scenario, but had no success. Will try again, thank you.

@frantill

frantill commented Oct 20, 2016

Hi, my problems come from installing/executing vmq_mzbench.
I am working on a Mac, MZBench is running (I can access http://localhost:4800 ) but I get the following error when compiling vmq_mzbench:

sudo ./rebar compile
==> goldrush (compile)
Compiled src/glc.erl
Compiled src/glc_lib.erl
Compiled src/glc_code.erl
Compiled src/gr_app.erl
Compiled src/gr_context.erl
Compiled src/glc_ops.erl
Compiled src/gr_counter_sup.erl
Compiled src/gr_manager_sup.erl
Compiled src/gr_manager.erl
Compiled src/gr_param_sup.erl
Compiled src/gr_counter.erl
Compiled src/gr_sup.erl
Compiled src/gr_param.erl
Compiled src/gre.erl
==> lager (compile)
Compiled src/lager_util.erl
Compiled src/lager_transform.erl
Compiled src/lager_app.erl
Compiled src/lager_backend_throttle.erl
Compiled src/lager.erl
Compiled src/lager_common_test_backend.erl
Compiled src/lager_config.erl
Compiled src/error_logger_lager_h.erl
Compiled src/lager_console_backend.erl
Compiled src/lager_crash_log.erl
Compiled src/lager_default_formatter.erl
Compiled src/lager_handler_watcher.erl
Compiled src/lager_handler_watcher_sup.erl
Compiled src/lager_msg.erl
Compiled src/lager_format.erl
Compiled src/lager_file_backend.erl
Compiled src/lager_sup.erl
Compiled src/lager_stdlib.erl
Compiled src/lager_trunc_io.erl
==> vmq_commons (compile)
/Users/fs/mzb/vmq_mzbench/deps/vmq_commons/src/auth_on_publish_hook.erl:none: error in parse transform 'lager_transform': {function_clause,
[{lager_transform,
'-walk_ast/2-fun-0-',
[{typed_record_field,
{record_field,34,
{atom,34,proto_ver}},
{user_type,34,proto_version,
[]}}],
[{file,
"src/lager_transform.erl"},
{line,60}]},
{lists,map,2,
[{file,"lists.erl"},
{line,1239}]},
{lager_transform,walk_ast,2,
[{file,
"src/lager_transform.erl"},
{line,60}]},
{compile,
'-foldl_transform/2-anonymous-2-',
2,
[{file,"compile.erl"},
{line,964}]},
{compile,foldl_transform,2,
[{file,"compile.erl"},
{line,966}]},
{compile,
'-internal_comp/4-anonymous-1-',
2,
[{file,"compile.erl"},
{line,321}]},
{compile,fold_comp,3,
[{file,"compile.erl"},
{line,347}]},
{compile,internal_comp,4,
[{file,"compile.erl"},
{line,331}]}]}
ERROR: compile failed while processing /Users/fs/mzb/vmq_mzbench/deps/vmq_commons: rebar_abort

erl -V gives me back
Erlang/OTP 19 [erts-8.1] [source] [64-bit] [smp:4:4] [async-threads:10] [hipe] [kernel-poll:false] [dtrace]

Any known advice for this situation? thank you

@larshesel
Contributor

Can you try the latest master of mzbench? It may be ready for erl19. If that doesn't work, then you'd probably need to use erl18 for compiling and using mzbench.

@frantill

Update:
Yes, vmq_mzbench can be compiled and run with erl18.
I managed to have different erl versions on my laptop with this tool: https://github.com/kerl/kerl
Thank you.

As I said earlier, vernemq on my raspberry seems to accept connections only for 4-5 seconds, then it closes them.

laptop --> rpi:
$ telnet 192.168.1.204 1883
Trying 192.168.1.204...
Connected to rpi3_b.
Escape character is '^]'.
Connection closed by foreign host.

on rpi itself:
$ telnet 127.0.0.1 1883
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
Connection closed by foreign host.

It also happens when starting a scenario with 100/1000/n clients: I run netstat -tulanp and I can see all the client connections ESTABLISHED for just a few seconds, then they go into TIME_WAIT.
It also happens when publishing/subscribing to the default topic bar.

In vernemq.conf:
allow_anonymous = on
trade_consistency = on
listener.tcp.all = 0.0.0.0:1883

Is there any parameter that can trigger this behaviour? or some configuration I am missing?

@ioolkos
Contributor

ioolkos commented Oct 22, 2016

OK, thanks.
Glad you found kerl, I use it too.

For telnet it's clear that the server closes the connection, as you don't send a proper CONNECT packet.
Use telnet to probe for a listener, that's fine.
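
For reference, here is a minimal sketch of what a proper first packet looks like at the byte level, assuming MQTT 3.1.1 (protocol level 4). The helper names are made up for this example, and it only builds and parses bytes; it does not open a connection.

```python
import struct


def encode_remaining_length(n: int) -> bytes:
    """Encode the MQTT variable-length 'remaining length' field."""
    out = bytearray()
    while True:
        byte = n % 128
        n //= 128
        if n > 0:
            byte |= 0x80  # continuation bit: more length bytes follow
        out.append(byte)
        if n == 0:
            return bytes(out)


def build_connect(client_id: str, keepalive: int = 60,
                  clean_session: bool = True) -> bytes:
    """Build an MQTT 3.1.1 CONNECT packet without username, password, or will."""
    flags = 0x02 if clean_session else 0x00
    variable_header = (
        struct.pack("!H", 4) + b"MQTT"    # length-prefixed protocol name
        + bytes([0x04])                   # protocol level 4 = MQTT 3.1.1
        + bytes([flags])                  # connect flags
        + struct.pack("!H", keepalive)    # keep-alive in seconds
    )
    cid = client_id.encode("utf-8")
    payload = struct.pack("!H", len(cid)) + cid
    body = variable_header + payload
    return bytes([0x10]) + encode_remaining_length(len(body)) + body


def parse_connack(data: bytes) -> int:
    """Return the CONNACK return code (0 = connection accepted)."""
    if len(data) != 4 or data[0] != 0x20 or data[1] != 0x02:
        raise ValueError("not a CONNACK packet")
    return data[3]
```

A telnet session sends none of these bytes, so the broker has nothing valid to answer with a CONNACK. In MQTT 3.1.1, return code 3 in the CONNACK means "server unavailable".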

Now let's find out why VerneMQ closes the mzbench connections. As your listener config looks OK, I think it might be the way you set up the connections. Can you show me the part of your script where you set up the connections?

@frantill

@ioolkos, thanks, I started from a simple case and I have been able to run some simple scenarios on the RPi! Thank you for helping.
Next step is probably to go to cloud with AWS.

@python225

Hi,

I am implementing a vernemq cluster with docker & kubernetes. For that I created a vernemq image and wrote a yaml file for the deployment. I deployed that yaml file, and then I have one vernemq cluster.

If I want to grow the vernemq cluster, I create one more image, and in that image I tell the next brokers in my cluster to go and join vernemq1. So I created a cluster relative to the 1st broker.

Issue: Now I want to scale up the cluster relative to vernemq1. But if for some reason my 1st vernemq broker gets deleted, how can I add vernemq brokers to the same cluster?

@ioolkos
Contributor

ioolkos commented Apr 5, 2017

Hi @python225 thanks for your report!
I'm not sure I understand your description, but maybe with a rephrasing or some more specifics we can tackle it together?
For example, how is the first vernemq-broker even "deleted"?
