
Text2PixelArt - radix_sort bug #2

Open
potat-dev opened this issue Aug 18, 2021 · 5 comments

@potat-dev

I have successfully installed all the required components, but when I run the generation, I get an error:

runtime error:  radix_sort: failed on 1st step: cudaErrorInvalidDeviceFunction: invalid device function

I tried to fix the error by downgrading the PyTorch version (1.9 -> 1.8, then 1.7), but it didn't help.

Here is the full error log:

Oops: runtime error:  radix_sort: failed on 1st step: cudaErrorInvalidDeviceFunction: invalid device function
Try reducing --num-cuts to save memory
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-1-54661c00350f> in <module>()
     56 clipit.do_init(settings)
     57 clear_output()
---> 58 clipit.do_run(settings)

/content/clipit/clipit.py in do_run(args)
    900                         print("Oops: runtime error: ", e)
    901                         print("Try reducing --num-cuts to save memory")
--> 902                         raise e
    903         except KeyboardInterrupt:
    904             pass

/content/clipit/clipit.py in do_run(args)
    892                 while True:
    893                     try:
--> 894                         train(args, cur_iteration)
    895                         if cur_iteration == args.iterations:
    896                             break

/content/clipit/clipit.py in train(args, cur_it)
    812 
    813     loss = sum(lossAll)
--> 814     loss.backward()
    815     for opt in opts:
    816         opt.step()

/usr/local/lib/python3.7/dist-packages/torch/_tensor.py in backward(self, gradient, retain_graph, create_graph, inputs)
    253                 create_graph=create_graph,
    254                 inputs=inputs)
--> 255         torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
    256 
    257     def register_hook(self, hook):

/usr/local/lib/python3.7/dist-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
    147     Variable._execution_engine.run_backward(
    148         tensors, grad_tensors_, retain_graph, create_graph, inputs,
--> 149         allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
    150 
    151 

/usr/local/lib/python3.7/dist-packages/torch/autograd/function.py in apply(self, *args)
     85     def apply(self, *args):
     86         # _forward_cls is defined by derived class
---> 87         return self._forward_cls.backward(self, *args)  # type: ignore[attr-defined]
     88 
     89 

/usr/local/lib/python3.7/dist-packages/diffvg-0.0.1-py3.7-linux-x86_64.egg/pydiffvg/render_pytorch.py in backward(ctx, grad_img)
    707                       use_prefiltering,
    708                       diffvg.float_ptr(eval_positions.data_ptr()),
--> 709                       eval_positions.shape[0])
    710         time_elapsed = time.time() - start
    711         global print_timing

RuntimeError: radix_sort: failed on 1st step: cudaErrorInvalidDeviceFunction: invalid device function
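
For reference, here is a quick way to confirm which PyTorch and CUDA build the runtime is actually using before trying another downgrade (a minimal sketch, assuming a standard Colab runtime with PyTorch already installed; it is not part of clipit):

# Confirm the PyTorch / CUDA versions in a Colab cell
import torch

print("PyTorch:", torch.__version__)
print("CUDA (torch build):", torch.version.cuda)
print("cuDNN:", torch.backends.cudnn.version())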
@dimithras

Hi @DenCoder618, same error here on today's run, though last night the notebook was running fine.

@dimithras

It seems to depend on which GPU Colab assigns. The K80 does not work.

@dimithras

dimithras commented Aug 21, 2021

@DenCoder618 I've tested it today; it is indeed a K80 problem. To check which GPU is in use, type !nvidia-smi -L in Colab (thanks @tg-bomze for the hint).

It worked for me on a T4, and on a P100 / V100 for @tg-bomze.
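
In a notebook cell the check looks like this (a minimal sketch; the PyTorch call is just an alternative way to get the same information, nothing the notebook requires):

# Show which GPU Colab assigned
!nvidia-smi -L

# Alternative via PyTorch, if you prefer a Python check
import torch
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
else:
    print("No GPU assigned")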

@DennisGaida

DennisGaida commented Aug 26, 2021

Is there any way to make sure I don't get a K80? !nvidia-smi -L returns GPU 0: Tesla K80 (UUID: GPU-adea173e-18bb-0a5a-ac56-ad8ee38d38e0). Is Colab Pro the only way?

@potat-dev
Author

Is Colab Pro the only way?

If you do not have a Colab Pro subscription, you can simply reset the runtime and, with some luck, a supported GPU will be assigned next time. This worked for me several times, though I sometimes had to wait about half an hour.
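
To avoid wasting a run on a bad GPU, a small guard cell can fail fast before generation starts. This is only a sketch of the check discussed above (the K80 reports compute capability 3.7), not part of clipit:

# Sketch: stop early if the assigned GPU is a K80, so you can reset the runtime
import torch

if not torch.cuda.is_available():
    raise RuntimeError("No GPU assigned; set the runtime type to GPU")

name = torch.cuda.get_device_name(0)
major, minor = torch.cuda.get_device_capability(0)

if "K80" in name:
    raise RuntimeError(f"Got {name} (sm_{major}{minor}); reset the runtime and try again")

print(f"OK: {name} (sm_{major}{minor})")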
