Can't import taichi (from 0.5.15) using Ubuntu 20.04 in VM #958

Closed
zhai-xiao opened this issue May 12, 2020 · 18 comments · Fixed by #1326 or #1508
Assignees
Labels
linux Linux platform opengl OpenGL backend potential bug Something that looks like a bug but not yet confirmed

Comments

@zhai-xiao
Contributor

Taichi fails to load on Ubuntu 20.04 in a VM since 0.5.15. I tried both VMware Player and VirtualBox. Versions 0.5.14 and earlier work fine.

image

image

@zhai-xiao zhai-xiao added the potential bug Something that looks like a bug but not yet confirmed label May 12, 2020
@archibate archibate self-assigned this May 12, 2020
@archibate
Collaborator

archibate commented May 12, 2020

Thanks for your patience! This issue is caused by the OpenGL backend recently added in v0.5.15.
VMs are reported to have poor OpenGL support. Could you run glxinfo | grep OpenGL to verify that? Also, do you have the same problem on a physical machine (not in a VM)?
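
For readers who want to automate this check, here is a minimal Python sketch that pulls the version out of glxinfo output and compares it with a required minimum. The helper names (parse_gl_version, meets_minimum) and the sample string are illustrative assumptions, not part of Taichi. The default minimum of 4.3 is taken from the requirement stated further down in this thread.

```python
import re


def parse_gl_version(glxinfo_output):
    """Return (major, minor) from an 'OpenGL version string' line, or None."""
    match = re.search(r"OpenGL version string:\s*(\d+)\.(\d+)", glxinfo_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def meets_minimum(glxinfo_output, minimum=(4, 3)):
    """True only if a version was found and it is at least `minimum`."""
    version = parse_gl_version(glxinfo_output)
    return version is not None and version >= minimum


# Hypothetical sample line, shaped like typical `glxinfo | grep OpenGL` output.
sample = "OpenGL version string: 3.3 (Compatibility Profile) Mesa 20.0.4"
print(parse_gl_version(sample))  # (3, 3)
print(meets_minimum(sample))     # False
```

Note that this only reads what glxinfo reports. It does not solve the problem of querying the version from inside a process without creating a context first.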

@archibate archibate added opengl OpenGL backend dependency linux Linux platform labels May 12, 2020
@zhai-xiao
Contributor Author

Sure, VMs are not famous for their OpenGL compatibility. I got 3.3 on VMware but only 2.1 in VirtualBox. Sadly, I don't have a proper environment on my physical machine. Could you please not assume a GL version, and make it work with CPU only, just like before? The glxinfo output from both VMs is attached.

image

image

@archibate
Collaborator

archibate commented May 12, 2020

> I got 3.3 on VMware but only 2.1 in VirtualBox.

Taichi, however, requires 4.3 to work.
We should detect the version before calling into glfwCreateWindow and return false in that situation:

bool is_opengl_api_available() {
  return initialize_opengl(true);
}

But there is a chicken-and-egg problem:
  • We need an OpenGL context to call glGetString(GL_VERSION).
  • We need glfwCreateWindow to get an OpenGL context.
  • We need glGetString(GL_VERSION) to determine whether to call glfwCreateWindow.
https://stackoverflow.com/questions/46510889/how-can-i-know-which-opengl-version-is-supported-by-my-system

> Could you please don't assume a gl version and make it work with CPU only, just like before?

Possible temporary solution: remove L455-L456 from ~/.local/lib/python3.8/site-packages/taichi/core/util.py:

    if ti_core.with_opengl():
        supported_archs.append('opengl')

A related issue: glfw/glfw#766

@zhai-xiao
Contributor Author

Thanks for the reply. I understand that the OpenGL backend probably needs compute shaders to really shine. However, to the best of my knowledge, OpenGL 4.3 is currently almost impossible to get in most VMs, so I'd like to fall back to x64 for now.
Removing L455-456 in util.py fixes 'import taichi as ti', but when I call 'ti.init(arch=ti.x64)', I still get pretty much the same error. In the call stack I can still see OpenGL being initialized. I'm not quite sure what's going on behind the scenes, but it seems that ti_core.with_opengl() is still true even if arch=ti.x64 is passed in.

image

@archibate
Collaborator

archibate commented May 12, 2020

Thanks for the information. I found another with_opengl call at L271-L272 of ~/.local/lib/python3.8/site-packages/taichi/lang/__init__.py:

    if ti_core.with_opengl():
        archs.append(opengl)

> but it seems that ti_core.with_opengl() is still true even if arch=ti.x64 is passed in.

Note that with_opengl is not expected to return false just because arch=ti.x64 is specified. It detects whether the OpenGL driver is available and returns false only when the driver is unavailable, regardless of the manually specified arch.
However, with_opengl itself crashes with a segmentation fault while detecting OpenGL availability...
It would be straightforward if we could catch that SIGSEGV and return false in that case.
Python-style pseudocode:

def with_opengl():
    try:
        return initialize_opengl()
    except SegmentFault:  # pseudocode only: Python has no such exception
        return False

void signal_handler(int signo) {
  // It seems that there's no way to pass exception to Python in signal
  // handlers?
  auto sig_name = signal_name(signo);
  logger.error(fmt::format("Received signal {} ({})", signo, sig_name), false);
  exit(-1);
}

https://docs.python.org/3/library/faulthandler.html#module-faulthandler

This also applies to the CUDA backend, which is commonly reported to crash on startup. @yuanming-hu, what do you think?
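
Since a SIGSEGV cannot be turned into a Python exception in the crashing process, one alternative is to run the risky probe in a child process and inspect its exit status. The sketch below illustrates that process-isolation idea only. It is not the approach discussed above and not Taichi's implementation. The name probe_in_subprocess and both snippets are hypothetical stand-ins for a real backend probe.

```python
import subprocess
import sys


def probe_in_subprocess(snippet, timeout=10.0):
    """Run `snippet` in a child interpreter; True only if it exits with code 0."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", snippet],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False
    # On POSIX, a child killed by a signal reports a negative return code
    # (for example -signal.SIGSEGV), so a crash is simply "not zero" here.
    return result.returncode == 0


# Hypothetical stand-ins for a backend probe: one succeeds, one crashes.
ok_snippet = "pass"
crash_snippet = "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"

print(probe_in_subprocess(ok_snippet))
print(probe_in_subprocess(crash_snippet))
```

The trade-off is the cost of starting an extra interpreter for each probe, so the result would likely need to be cached.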


Btw, you can run TI_LOG_LEVEL=trace python test.py to print more details about the internal process.

@k-ye
Member

k-ye commented May 12, 2020

BTW, is it possible to detect OpenGL version inside with_opengl(), something like this? Report true only if version >= 4.3?

@archibate
Collaborator

> BTW, is it possible to detect OpenGL version inside with_opengl(), something like this? Report true only if version >= 4.3?

Thanks for the suggestion. I hope so, but we can't call glGetIntegerv before glfwCreateWindow. I think we will stick with the catch-segmentation-fault approach, which also helps with CUDA.

> We need an OpenGL context to call glGetString(GL_VERSION).
> We need glfwCreateWindow to get an OpenGL context.
> We need glGetString(GL_VERSION) to determine whether to call glfwCreateWindow (which causes the segfault).

@yuanming-hu
Member

A probably easier solution: can we simply have a .taichiconfig to disable OpenGL manually in certain environments?

@archibate
Collaborator

Yes, we can. Do you mean that users without a suitable environment would manually add TI_WITH_OPENGL=0 to their .bashrc?

@yuanming-hu
Member

Oh, making use of environment variables does sound like a good solution for *nix users! Let's use something like TI_ENABLE_OPENGL?

We should also consider how to make taichi work out of the box without setting anything like an envvar. On the other hand, we don't want to set TI_ENABLE_OPENGL=0 by default. Do you have any ideas on how to achieve both?
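
As a rough illustration of the opt-out idea, the Python sketch below treats OpenGL as enabled unless TI_ENABLE_OPENGL is exactly "0", and checks that switch before running any driver probe. The function names and the exact parsing rule are assumptions made for this example. How Taichi actually reads the variable is not shown in this thread.

```python
import os


def opengl_enabled(environ=None):
    """Opt-out switch: OpenGL stays enabled unless the variable is '0'."""
    env = os.environ if environ is None else environ
    return env.get("TI_ENABLE_OPENGL", "1") != "0"


def candidate_archs(driver_probe, environ=None):
    """Build the arch list; `driver_probe` is a hypothetical callable."""
    archs = ["x64"]
    # Check the cheap switch first, so the risky probe never runs when disabled.
    if opengl_enabled(environ) and driver_probe():
        archs.append("opengl")
    return archs


print(candidate_archs(lambda: True, {"TI_ENABLE_OPENGL": "0"}))  # ['x64']
print(candidate_archs(lambda: True, {}))  # ['x64', 'opengl']
```

Because `and` short-circuits, a probe that would crash is never called when the user has opted out, which is what makes the variable usable as a workaround.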

@zhai-xiao
Contributor Author

Thanks for all the timely replies. You guys are amazing!

@archibate
Collaborator

archibate commented May 13, 2020

You're welcome, and thank you for reporting the bug and for the valuable information!

@yuanming-hu Can we release #962 with v0.6.4 tonight, so that @TroyZhai can try TI_ENABLE_OPENGL=0 and see if it works?
Also note that this is a temporary workaround, given that the root cause is hard to pin down. We need to find a proper long-term solution for this issue at some point.

@yuanming-hu
Member

Sure - I have meetings in the morning but I'll release v0.6.4 in a couple of hours.

@yuanming-hu
Member

@TroyZhai We just now released v0.6.4. When you get a chance, could you upgrade and run with TI_ENABLE_OPENGL=0? Please let us know if that works.

No rush on this at all. Thank you!

@zhai-xiao
Contributor Author

> @TroyZhai We just now released v0.6.4. When you get a chance, could you upgrade and run with TI_ENABLE_OPENGL=0? Please let us know if that works.
>
> No rush on this at all. Thank you!

Great news! I can confirm that it works as expected on my VMs when I set "export TI_ENABLE_OPENGL=0". Thanks all!
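
For anyone who prefers not to edit .bashrc, the same setting can be applied to a single command. This is a small Python sketch, assuming only that the variable is read from the process environment. The helper name run_with_opengl_disabled is made up for illustration, and the child command here just echoes the variable instead of importing taichi.

```python
import os
import subprocess
import sys


def run_with_opengl_disabled(args):
    """Run `args` with TI_ENABLE_OPENGL=0 added to a copy of the environment."""
    env = dict(os.environ)
    env["TI_ENABLE_OPENGL"] = "0"
    return subprocess.run(args, env=env, capture_output=True, text=True)


# Stand-in child command: print the variable as the child process sees it.
result = run_with_opengl_disabled(
    [sys.executable, "-c", "import os; print(os.environ['TI_ENABLE_OPENGL'])"]
)
print(result.stdout.strip())  # 0
```

In a shell, prefixing the command (TI_ENABLE_OPENGL=0 python test.py) has the same effect for that one invocation.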

@yuanming-hu
Member

Awesome!

I'm closing this thanks to the hard work by @archibate.

@archibate
Collaborator

Cool! But how about adding this usage to the docs? Perhaps a chapter called Troubleshooting that covers TI_ENABLE_OPENGL, TI_USE_UNIFIED_MEMORY, etc., so that it can solve more people's problems.

@yuanming-hu
Member

Sounds good! Should we move the following items from the README file there as well?

  • On Ubuntu 19.04+, please sudo apt install libtinfo5.
  • On Windows, please install Microsoft Visual C++ Redistributable if you haven't.

A chapter named Installation sounds good. We can address all compatibility issues there. Maybe we can put it before Hello world?

This text in Hello world should also be moved there:

First of all, let’s install Taichi via pip:

# Python 3.6+ needed
python3 -m pip install taichi

4 participants