zeromq v2.2: Assertion failed using epgm #19

Closed
jimenezrick opened this Issue · 16 comments

3 participants

@jimenezrick

Using the bindings for zeromq 2.2, I get the following error: Assertion failed: false (zmq.cpp:304)
It only happens when using epgm as the transport. Is this expected?

Here is the test code:

module Main where

import System.Exit
import System.Environment
import qualified System.ZMQ as ZMQ
import qualified Data.ByteString.Char8 as BS

type Endpoint = String

main :: IO ()
main = do
    args <- getArgs
    case args of
      {-["-p"] -> runZMQ 1 ZMQ.Pub $ pub "tcp://127.0.0.1:8000"-}
      {-["-s"] -> runZMQ 1 ZMQ.Sub $ sub "tcp://127.0.0.1:8000"-}
      ["-p"] -> runZMQ 1 ZMQ.Pub $ pub "epgm://eth0;239.192.1.1:5555"
      ["-s"] -> runZMQ 1 ZMQ.Sub $ sub "epgm://eth0;239.192.1.1:5555"
      _      -> usage

usage :: IO ()
usage = do
    prog <- getProgName
    putStrLn $ "Usage: " ++ prog ++ " -p | -s"
    exitFailure

runZMQ :: ZMQ.SType a => ZMQ.Size -> a -> (ZMQ.Socket a -> IO b) -> IO b
runZMQ s t f = ZMQ.withContext s (\c -> ZMQ.withSocket c t f)

pub :: Endpoint -> ZMQ.Socket a -> IO ()
pub e s = do
    ZMQ.connect s e
    ZMQ.send s (BS.pack "Hello") []

sub :: ZMQ.SubsType a => Endpoint -> ZMQ.Socket a -> IO ()
sub e s = do
    ZMQ.subscribe s ""
    ZMQ.bind s e
    m <- ZMQ.receive s []
    putStrLn $ BS.unpack m

Thanks in advance.

@jimenezrick

I ran the same test from C and it works correctly. I have found that the errors are triggered when trying to publish while no subscriber exists. So, running only the subscriber several times, you can get some random errors when trying to send something through the socket:

Warn: file socket.c: line 229 (pgm_close): should not be reached
Warn: file socket.c: line 1075 (pgm_setsockopt): should not be reached
Warn: file socket.c: line 230 (pgm_close): assertion `!sock->is_destroyed' failed
Assertion failed: rc == 0 (connect_session.cpp:84)
Aborted
Warn: file source.c: line 1813 (pgm_send): should not be reached
Assertion failed: status == PGM_IO_STATUS_RATE_LIMITED || status == PGM_IO_STATUS_WOULD_BLOCK (pgm_socket.cpp:473)
Aborted
Assertion failed: rc (pgm_socket.cpp:498)
Aborted

Could this be related to a bug in the finalization code of the context/socket?

@jimenezrick

I couldn't try with zeromq3-haskell, as the OpenPGM version bundled with it has some bugs that were only fixed in recent SVN revisions.

@twittner
Owner

Could you please run your program with +RTS -V0 -RTS and tell me if the problem persists?

@jimenezrick

No, with -V0 the issue disappears.

@twittner
Owner

Thanks. Could you please check out commit c537f77 and see if it passes your tests?

@jimenezrick

Yes, that fixes the issue under normal operation. The test program no longer shows failed assertions.
But the same assertions sometimes appear when the program is interrupted by Ctrl-C :-/

Thanks for your effort :-)

@twittner
Owner

Thanks. The underlying problem is with openpgm's pgm_shutdown function, which is not reentrant. If zmq_term is interrupted, it is restarted, so pgm_shutdown may be invoked twice and fails the second time. I will report this as a bug to 0MQ. According to http://api.zeromq.org/2-2:zmq-term, restarting zmq_term is safe, but that does not hold when it is used with openpgm. In the meantime, I block all signals during zmq_term. Could you please retest b20ae7c?
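
For readers following along, the "block all signals during zmq_term" workaround can be sketched roughly as below. This is only an illustration in Haskell using System.Posix.Signals from the unix package; the actual commit may well do the equivalent in C, and the name withAllSignalsBlocked is made up for this sketch.

```haskell
import Control.Exception (bracket)
import System.Posix.Signals

-- | Hypothetical helper (not part of the binding): run an action with
-- every blockable signal masked, then restore the previous mask, even
-- if the action throws. SIGKILL and SIGSTOP cannot be blocked; the
-- kernel ignores the request for those two.
withAllSignalsBlocked :: IO a -> IO a
withAllSignalsBlocked action = bracket blockAll setSignalMask (const action)
  where
    -- Remember the current mask, then block everything.
    blockAll :: IO SignalSet
    blockAll = do
        old <- getSignalMask
        setSignalMask fullSignalSet
        return old
```

The idea is to wrap whatever call ends up invoking zmq_term in withAllSignalsBlocked, so that the call cannot be interrupted and restarted. Note that a signal mask applies to the calling OS thread, so this sketch assumes the wrapped call runs on that same thread.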

@jimenezrick

With b20ae7c it works perfectly! @twittner thanks again for your work :-)

@jimenezrick

Don't forget to report this bug to the ZMQ people; it seems like something that needs to be fixed.

@jimenezrick

I just opened an issue on the ZMQ bug tracker so that the ZMQ team has it on record: https://zeromq.jira.com/browse/LIBZMQ-404
If you agree, we should leave this issue open here until the ZMQ team fixes the matter.

@jimenezrick

The fix is already included in the 3.x trunk, and the pull request for 2.x is pending merge. With ZMQ 2.x I can confirm that the binding now works correctly without the cbits.c hack.

I'm still trying to test the binding with 3.x, as I'm having some issues with that version.

@jimenezrick

I can't test with ZMQ 3.x because of some unrelated issues with PGM on that release, but it should work fine. I'm going to paste the test adapted for ZMQ3 here, to keep it somewhere for when those issues are resolved:

module Main where

import System.Exit
import System.Environment
import qualified System.ZMQ3 as ZMQ 
import qualified Data.ByteString.Char8 as BS

type Endpoint = String

main :: IO ()
main = do
    args <- getArgs
    case args of
      {-["-p"] -> runZMQ 1 ZMQ.Pub $ pub "tcp://127.0.0.1:8000"-}
      {-["-s"] -> runZMQ 1 ZMQ.Sub $ sub "tcp://127.0.0.1:8000"-}
      ["-p"] -> runZMQ 1 ZMQ.Pub $ pub "epgm://eth0;239.192.1.1:5555"
      ["-s"] -> runZMQ 1 ZMQ.Sub $ sub "epgm://eth0;239.192.1.1:5555"
      _      -> usage

usage :: IO ()
usage = do
    prog <- getProgName
    putStrLn $ "Usage: " ++ prog ++ " -p | -s"
    exitFailure

runZMQ :: ZMQ.SocketType a => ZMQ.Size -> a -> (ZMQ.Socket a -> IO b) -> IO b
runZMQ s t f = ZMQ.withContext s (\c -> ZMQ.withSocket c t f)

pub :: ZMQ.Sender a => Endpoint -> ZMQ.Socket a -> IO ()
pub e s = do
    ZMQ.connect s e 
    ZMQ.send s [] (BS.pack "Hello")

sub :: (ZMQ.Receiver a, ZMQ.Subscriber a) => Endpoint -> ZMQ.Socket a -> IO ()
sub e s = do
    ZMQ.subscribe s ""
    ZMQ.bind s e 
    m <- ZMQ.receive s
    putStrLn $ BS.unpack m

@joell

I was also having this problem with the 0.8.4 Hackage zeromq-haskell package and discovered this thread. (We're using ZeroMQ 2.2.) The existing solution in the zeromq-2-1 branch partially resolved my issues, though I then had trouble with sending a second SIGTERM to a ZeroMQ daemon lingering on sockets during zmq_term. I have sent a pull request for a modification that does not mask certain termination and extreme-scenario signals which would ideally interrupt the zmq_term call.

Also, when we get this resolved it would be great to have a new Hackage release of the zeromq-haskell package including these changes.
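
As a rough illustration of the variant described above (block signals during zmq_term, but leave a few unmasked so they can still interrupt it), here is a minimal Haskell sketch using System.Posix.Signals from the unix package. The helper names and the example list of signals are assumptions for illustration only; they are not taken from the pull request.

```haskell
import Control.Exception (bracket)
import System.Posix.Signals

-- | Hypothetical example list of signals to leave deliverable; the
-- signals chosen in the real pull request may differ.
exampleUnmasked :: [Signal]
exampleUnmasked = [sigTERM, sigINT, sigQUIT]

-- | Every signal except the given ones.
allSignalsExcept :: [Signal] -> SignalSet
allSignalsExcept = foldr deleteSignal fullSignalSet

-- | Hypothetical helper: run an action with all signals blocked except
-- those in 'keep', restoring the previous mask afterwards.
withSignalsBlockedExcept :: [Signal] -> IO a -> IO a
withSignalsBlockedExcept keep action =
    bracket install setSignalMask (const action)
  where
    -- Remember the current mask, then install the restricted one.
    install :: IO SignalSet
    install = do
        old <- getSignalMask
        setSignalMask (allSignalsExcept keep)
        return old
```

With something like withSignalsBlockedExcept exampleUnmasked around the terminating call, a second SIGTERM would still reach the process while other signals stay blocked. The mask is per OS thread, so the same caveat about which thread runs the call applies.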

@twittner
Owner

I would prefer to roll back the temporary fixes, because 0MQ has already fixed the bug in their codebase thanks to @jimenezrick's efforts. Could you try git://github.com/zeromq/zeromq2-x.git and see if it works for you?

@joell
@twittner
Owner

Closing this issue as 0MQ has fixed the underlying bug.

@twittner twittner closed this