Not working with multiple processes #40

Closed
YutingZhang opened this issue Mar 12, 2019 · 13 comments
Labels
bug Something isn't working

@YutingZhang

When calling MobulaOP in a subprocess, it gets stuck.

Environment: latest mxnet nightly build and Python 3.6.5

Example code, modified from dynamic_import_op.py, that reproduces the hang:

from concurrent import futures

import sys
import mxnet as mx

def foo():
    import mobula
    # Import Custom Operator Dynamically
    mobula.op.load('./AdditionOP')
    AdditionOP = mobula.op.AdditionOP

    a = mx.nd.array([1, 2, 3])
    b = mx.nd.array([4, 5, 6])

    a.attach_grad()
    b.attach_grad()

    with mx.autograd.record():
        c = AdditionOP(a, b)

    dc = mx.nd.array([7, 8, 9])
    c.backward(dc)

    assert ((a + b).asnumpy() == c.asnumpy()).all()
    assert (a.grad.asnumpy() == dc.asnumpy()).all()
    assert (b.grad.asnumpy() == dc.asnumpy()).all()

    print('Okay :-)')
    print('a + b = c \n {} + {} = {}'.format(a.asnumpy(), b.asnumpy(), c.asnumpy()))

def main():
    ex = futures.ProcessPoolExecutor(1)
    r = ex.submit(foo)
    r.result()

if __name__ == "__main__":
    main()
@YutingZhang YutingZhang changed the title Not working with multiple process Not working with multiple processes Mar 12, 2019
@wkcn
Owner

wkcn commented Mar 12, 2019

Thanks for your report!
I will check it.

@wkcn wkcn added bug Something isn't working threading-safety and removed threading-safety labels Mar 12, 2019
@YutingZhang
Author

YutingZhang commented Mar 12, 2019

Thanks!

FYI, if you move import mxnet as mx into foo(), the bug disappears. But this is generally not feasible, because mxnet is usually imported in the main process. It may be related to how mxnet works with subprocesses.
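
A minimal sketch of the pattern described above: the heavy import happens inside the worker, and the pool uses the "spawn" start method so the child starts from a fresh interpreter instead of inheriting the parent's state through fork. The names `worker`, `make_executor`, and `run_demo` are hypothetical, `statistics` stands in for mxnet so the sketch runs anywhere, and the `mp_context` argument requires Python 3.7 or later. Whether this avoids the hang reported here has not been verified.

```python
import multiprocessing as mp
from concurrent import futures


def worker(values):
    # Import inside the worker so the child process initializes the library
    # itself. `statistics` is a stand-in for mxnet (assumption).
    import statistics
    return statistics.mean(values)


def make_executor(max_workers=1):
    # "spawn" launches a fresh interpreter for each worker process.
    # The mp_context parameter exists in Python 3.7+.
    ctx = mp.get_context("spawn")
    return futures.ProcessPoolExecutor(max_workers, mp_context=ctx)


def run_demo():
    # Submit one task to a spawned worker and wait for its result.
    with make_executor() as ex:
        return ex.submit(worker, [1, 2, 3]).result()
```

Call `run_demo()` from under an `if __name__ == "__main__":` guard in a script file, since "spawn" re-imports the main module in the child.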

@wkcn
Owner

wkcn commented Mar 12, 2019

Moving import mobula and mobula.op.load('./AdditionOP') outside foo() may work, since MobulaOP registers the operator with MXNet when mobula.op.load('./AdditionOP') is called.
I will add a check to avoid duplicate registration.

@YutingZhang
Author

I tried that, but it does not work. Example code:

from concurrent import futures

import sys
import mxnet as mx

import mobula
# Import Custom Operator Dynamically
mobula.op.load('./AdditionOP')

def foo():

    AdditionOP = mobula.op.AdditionOP

    a = mx.nd.array([1, 2, 3])
    b = mx.nd.array([4, 5, 6])

    a.attach_grad()
    b.attach_grad()

    with mx.autograd.record():
        c = AdditionOP(a, b)

    dc = mx.nd.array([7, 8, 9])
    c.backward(dc)

    assert ((a + b).asnumpy() == c.asnumpy()).all()
    assert (a.grad.asnumpy() == dc.asnumpy()).all()
    assert (b.grad.asnumpy() == dc.asnumpy()).all()

    print('Okay :-)')
    print('a + b = c \n {} + {} = {}'.format(a.asnumpy(), b.asnumpy(), c.asnumpy()))

def main():
    ex = futures.ProcessPoolExecutor(1)
    r = ex.submit(foo)
    r.result()

if __name__ == "__main__":
    main()

@wkcn
Owner

wkcn commented Mar 12, 2019

@YutingZhang
Hi! I found that the bug is not related to MobulaOP.
It seems that MXNet itself triggers the bug.

from concurrent import futures

import sys
sys.path.append('../../')  # Add MobulaOP path before importing mobula

import mxnet as mx
from mobula.testing import assert_almost_equal

class AdditionOP(mx.operator.CustomOp):
    def __init__(self):
        super(AdditionOP, self).__init__()
    def forward(self, is_train, req, in_data, out_data, aux):
        out_data[0][:] = in_data[0] + in_data[1]
    def backward(self, req, out_grad, in_data, out_data, in_grad, aux):
        in_grad[0][:] = out_grad[0]
        in_grad[1][:] = out_grad[0]

@mx.operator.register("AdditionOP")
class AdditionOPProp(mx.operator.CustomOpProp):
    def __init__(self):
        super(AdditionOPProp, self).__init__()
    def list_arguments(self):
        return ['a', 'b']
    def list_outputs(self):
        return ['output']
    def infer_shape(self, in_shape):
        return in_shape, [in_shape[0]]
    def create_operator(self, ctx, shapes, dtypes):
        return AdditionOP()

def foo():
    a = mx.nd.array([1, 2, 3])
    b = mx.nd.array([4, 5, 6])

    a.attach_grad()
    b.attach_grad()

    print("REC")
    with mx.autograd.record():
        c = mx.nd.Custom(a, b, op_type='AdditionOP')

    dc = mx.nd.array([7, 8, 9])
    c.backward(dc)

    assert_almost_equal(a + b, c)
    assert_almost_equal(a.grad, dc)
    assert_almost_equal(b.grad, dc)

    print('Okay :-)')
    print('a + b = c \n {} + {} = {}'.format(a.asnumpy(), b.asnumpy(), c.asnumpy()))

def main():
    ex = futures.ProcessPoolExecutor(1)
    r = ex.submit(foo)
    r.result()

if __name__ == '__main__':
    main()

@YutingZhang
Author

So mx.nd.Custom is the actual problem ... MXNet just has lots of bugs when running in a subprocess ...

@wkcn
Owner

wkcn commented Mar 12, 2019

Yes.

@YutingZhang
Author

@wkcn I sent an email to your live.cn address :)

@wkcn
Owner

wkcn commented Mar 13, 2019

Mail received. Thank you! : )

@wkcn
Owner

wkcn commented Aug 12, 2019

Hi @YutingZhang, the two test cases you gave now pass with the latest MXNet and MobulaOP : )

@YutingZhang
Author

@wkcn Thanks a lot! Did you work around the problem in MobulaOP, or is it due to MXNet's update to CustomOp (which you also contributed to)?

@wkcn
Owner

wkcn commented Aug 13, 2019

@YutingZhang It is due to MXNet’s update, and other contributors fixed it.

@wkcn
Owner

wkcn commented Sep 25, 2019

Closing this since the problem has been addressed. : )

@wkcn wkcn closed this as completed Sep 25, 2019