
An "option" is received twice #6

Closed
AlexeyRaga opened this issue Oct 19, 2015 · 10 comments
@AlexeyRaga

I am running a distributed pi example on 3 physical nodes where I connect nodes one-by-one to the same "main" node.

What I see is that the 3rd node prints the option message twice. Does it mean that every message is received twice?

Here are the connect process logs:

Node 1 (main, 192.168.1.6)

connecting to: ("192.168.1.6",PortNumber 1111)
Connected to modes: [("192.168.1.6",PortNumber 1111)]
Enter  "start"  to: Start the calculation
Connected to modes: [("192.168.1.153",PortNumber 1111),("192.168.1.6",PortNumber 1111)]
Connected to modes: [("192.168.1.153",PortNumber 1111),("192.168.1.6",PortNumber 1111)]
Connected to modes: [("192.168.1.151",PortNumber 1111),("192.168.1.153",PortNumber 1111),("192.168.1.6",PortNumber 1111)]

Node 2 (192.168.1.153)

Press end to exit
connecting to: ("192.168.1.6",PortNumber 1111)
Enter  "start"  to: Start the calculation

Node 3 (192.168.1.151)

Press end to exit
connecting to: ("192.168.1.6",PortNumber 1111)
Enter  "start"  to: Start the calculation
Enter  "start"  to: Start the calculation
@AlexeyRaga
Author

Just ran the thing on 3, 4 and 5 machines and was able to confirm that:

  • the "connected" message on the "main" node appears more times than expected:
Main connected --> 
connecting to: ("192.168.1.6",PortNumber 1111)
Connected to modes: [("192.168.1.6",PortNumber 1111)]
Enter  "start"  to: Start the calculation

Node 1 connected --> 
Connected to modes: [("192.168.1.153",PortNumber 1111),("192.168.1.6",PortNumber 1111)]

Node 2 connected --> 
Connected to modes: [("192.168.1.151",PortNumber 1111),("192.168.1.153",PortNumber 1111),("192.168.1.6",PortNumber 1111)]
Connected to modes: [("192.168.1.151",PortNumber 1111),("192.168.1.153",PortNumber 1111),("192.168.1.6",PortNumber 1111)]

Node 3 connected --> 
Connected to modes: [("192.168.1.151",PortNumber 1111),("192.168.1.153",PortNumber 1111),("192.168.1.6",PortNumber 1111)]
Connected to modes: [("192.168.1.151",PortNumber 1111),("192.168.1.153",PortNumber 1111),("192.168.1.6",PortNumber 1111)]
Connected to modes: [("192.168.1.152",PortNumber 1111),("192.168.1.151",PortNumber 1111),("192.168.1.153",PortNumber 1111),("192.168.1.6",PortNumber 1111)]

So instead of one "connected" message per node, each node causes (node count) messages.

Maybe because of that, each newly connected node reports the "start" prompt as many times as the total number of nodes connected.

I am not sure about the implications: does it mean that there are now N listeners per node waiting for the same event?

@AlexeyRaga
Author

Actually, there may be (node count) listeners for the same event on each node.

I can confirm visually that Node 1 runs faster than Node 2, which in turn runs faster than Node 3; I can see it from the speed at which the nodes print to the screen.

@agocorona
Collaborator

Thanks, Alexey. Looking at it.


Alberto.

@agocorona
Collaborator

Alexey: can you post the code here? There are different examples of distributed pi in the examples folder and in the School of Haskell tutorial...


Alberto.

@AlexeyRaga
Author

Sure, it was based on the one in the examples folder. I slightly changed it so that I can pass host:port pairs as parameters, where the first one in the list is considered to be the local node:

module Main where

import           Control.Monad
import           Control.Monad.IO.Class
import           Data.IORef
import           GHC.Conc
import           System.Environment
import           System.IO
import           System.Random
import           Transient.Base
import           Transient.Indeterminism
import           Transient.Logged
import           Transient.Move
import           Transient.Stream.Resource

main = do
  args  <- getArgs
  let (mainNode : nodes) = node <$> args
      numCalcsNode = 5000


  rresults <- liftIO $ newIORef (0,0)

  keep $ do
    mapM_ (connect mainNode) nodes
    logged $ option  "start"  "Start the calculation"

    r <- clustered $ do
      r <- group numCalcsNode $ do
        n <- liftIO  getNumCapabilities
        threads n $ spawn $ do
          x <- randomIO :: IO Double
          y <- randomIO
          return $ if x * x + y * y < 1 then 1 else (0 :: Int)
      return $ sum r

    (n,c) <- liftIO $ atomicModifyIORef' rresults $ \(num, count) ->
      let num' = num + r
          count'= count + numCalcsNode
      in ((num', count'),(num',count'))

    when ( c `rem` 100000 == 0) $ liftIO $ do
      th <- myThreadId
      putStrLn $ "Samples: " ++ show c ++ " -> " ++
        show( 4.0 * fromIntegral n / fromIntegral c) ++ "\t" ++ show th


node addr = let (h:p:_) = splitOn ':' addr in createNode h (read p)

splitOn delimiter = foldr f [[]]
  where f c l@(x:xs)
          | c == delimiter = []:l
          | otherwise = (c:x):xs

@agocorona
Collaborator

The problem is that connect connects to the whole cluster, not to an individual node.

Try

connect myNode mainNode

instead of

mapM_ (connect mainNode) nodes

and get the host and port of both nodes from the command line for each instance.

I'm away now, but I will post the necessary changes here as soon as I can.

The nodes must be executed separately, with different node parameters.

You have to put on the command line the host and port of the local node and the host and port of a node already connected to the cluster. The first node can connect to itself.

Then, substitute

mapM_ (connect mainNode) nodes

with

connect node mainNode

and run the program as many times as the number of nodes you want to have.

I have created a Transient.Move.Services module that I'm testing now to hot-load new nodes on machines, but they need a bootstrap service that must be executed one by one on each node. There is no way to spawn services on remote nodes without first distributing the core infrastructure by hand...

@AlexeyRaga
Author

I think it is essentially the same because I always start my nodes by giving them just two parameters. Sorry I wasn't clear about it.

It is like:

#main node
./transient-pi 192.168.1.151:1111 192.168.1.151:1111

#second node
./transient-pi 192.168.1.152:1111 192.168.1.151:1111

#third node
./transient-pi 192.168.1.153:1111 192.168.1.151:1111

So you're right, mapM_ is not really needed here, but with two parameters it is essentially the same because nodes is just a one-element list.

Anyway, I've tested it with just connect and it had the same problem.

Below are the changed bits of code. I only use two parameters: the first is the "self" node and the second is the "cluster master" node, which is always the same:

main = do
  args  <- getArgs
  let (self : master : _) = node <$> args
      numCalcsNode = 5000

  rresults <- liftIO $ newIORef (0,0)

  keep $ do
    connect self master
    logged $ option  "start"  "Start the calculation"

Thanks for your help.

@agocorona
Collaborator

It is strange. Even with two nodes, when I "start" on the first node, the second node also displays results, and it should not. It is something related to connect, since when I give the list of nodes manually:

main = do
  args  <- getArgs
  let (local, remote) =
        if length args > 0
          then (createNode "localhost" 2000, createNode "localhost" 2001)
          else (createNode "localhost" 2001, createNode "localhost" 2000)
  addNodes [local, remote]

by switching the local and remote ports depending on whether the command line has an extra argument.

In this case the nodes work OK.
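
For example, the two instances of this workaround could be started like this (the binary name transient-pi is just a placeholder; what matters is that exactly one invocation passes an extra argument):

#first instance: no extra argument, so local = localhost:2001 and remote = localhost:2000
./transient-pi

#second instance: extra argument present, so local = localhost:2000 and remote = localhost:2001
./transient-pi extra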

It must be something related to connect, which should simply update the node list on each node.

@agocorona
Collaborator

I found it. connect uses clustered, but clustered has been redefined: now it returns a response for each node, so it forces the execution of the continuation (which contains the start option) as many times as there are nodes in the cluster.

It should use mclustered, which gathers all the responses and returns them as a single result, using a monoid.
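
Conceptually, the difference can be sketched with plain lists (this is only an analogy, not the actual Transient code: the list monad stands in for Transient's non-determinism, and Sum Int stands in for the per-node replies):

import Data.Monoid (Sum (..))

responses :: [Sum Int]              -- one reply per node in the cluster
responses = [Sum 1, Sum 1, Sum 1]

-- clustered-like behaviour: each reply is returned separately, so anything
-- sequenced after it runs once per node.
clusteredLike :: [Sum Int]
clusteredLike = do
  r <- responses
  return r                          -- 3 results -> continuation runs 3 times

-- mclustered-like behaviour: the replies are folded with the monoid into a
-- single value, so the continuation runs exactly once.
mclusteredLike :: [Sum Int]
mclusteredLike = return (mconcat responses)   -- 1 result -> continuation runs once

main :: IO ()
main = do
  print (length clusteredLike)   -- 3
  print (length mclusteredLike)  -- 1

clusteredLike yields three results, so the code after it (here, the "start" option) would run three times; mclusteredLike yields a single combined result, so the continuation runs once.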

But still, when a node starts, the other nodes also display results. That should not happen, but it seems to be a different issue.

@agocorona
Collaborator

I will open a new issue for the extra calculation messages in the "slave" nodes.
