Compile with Emscripten with Optimizations #1

Closed
zz85 opened this Issue Feb 20, 2015 · 14 comments

3 participants

@zz85
Owner
zz85 commented Feb 20, 2015

currently, asm.js builds using -O1, -O2, or -O3 with emcc create a binary which hangs in the browser.

need to investigate why this happens.

@zz85 zz85 referenced this issue in kripken/emscripten Feb 20, 2015
Closed

Run Closure with No Optimizations? #3204

@tschw
tschw commented Jul 12, 2015

I experienced hangs caused by the array splitting pass in a C/C++ debug build a few weeks ago. It may not be the only pass that reports false success, though. The release build crashed (GCC 4.8.3 / Linux) and I didn't really feel like digging into that code. Your online demo works surprisingly well compared to my executable - in fact, it looks very promising!

Here is the diff of my "hackfix". The edits further down were only needed to make it compile. Please note it's not a proper solution - just stuff that made it run and terminate again somehow.

diff --git a/src/glsl/glsl_optimizer.cpp b/src/glsl/glsl_optimizer.cpp
index 001e7e6..1bf8ce8 100644
--- a/src/glsl/glsl_optimizer.cpp
+++ b/src/glsl/glsl_optimizer.cpp
@@ -197,7 +197,7 @@ struct glslopt_shader

 static inline void debug_print_ir (const char* name, exec_list* ir, _mesa_glsl_parse_state* state, void* memctx)
 {
-   #if 0
+   #if 1
    printf("**** %s:\n", name);
 // _mesa_print_ir (ir, state);
    char* foobar = _mesa_print_ir_glsl(ir, state, ralloc_strdup(memctx, ""), kPrintGlslFragment);
@@ -477,7 +477,7 @@ static void do_optimization_passes(exec_list* ir, bool linked, _mesa_glsl_parse_
        progress2 = lower_vector_insert(ir, false); progress |= progress2; if (progress2) debug_print_ir ("After lower vector insert", ir, state, mem_ctx);
        progress2 = do_swizzle_swizzle(ir); progress |= progress2; if (progress2) debug_print_ir ("After swizzle swizzle", ir, state, mem_ctx);
        progress2 = do_noop_swizzle(ir); progress |= progress2; if (progress2) debug_print_ir ("After noop swizzle", ir, state, mem_ctx);
-       progress2 = optimize_split_arrays(ir, linked, state->metal_target && state->stage == MESA_SHADER_FRAGMENT); progress |= progress2; if (progress2) debug_print_ir ("After split arrays", ir, state, mem_ctx);
+       progress2 = optimize_split_arrays(ir, linked, state->metal_target && state->stage == MESA_SHADER_FRAGMENT); /*progress |= progress2; */ if (progress2) debug_print_ir ("After split arrays", ir, state, mem_ctx);
        progress2 = optimize_split_vectors(ir, linked, OPT_SPLIT_ONLY_UNUSED); progress |= progress2; if (progress2) debug_print_ir("After split unused vectors", ir, state, mem_ctx);
        progress2 = optimize_redundant_jumps(ir); progress |= progress2; if (progress2) debug_print_ir ("After redundant jumps", ir, state, mem_ctx);

diff --git a/src/glsl/main.cpp b/src/glsl/main.cpp
index feed100..be08d9a 100644
--- a/src/glsl/main.cpp
+++ b/src/glsl/main.cpp
@@ -40,11 +40,13 @@

 static int glsl_version = 330;

+/*
 extern "C" void
 _mesa_error_no_memory(const char *caller)
 {
    fprintf(stderr, "Mesa error: out of memory in %s", caller);
 }
+*/

 static void
 initialize_context(struct gl_context *ctx, gl_api api)
diff --git a/src/glsl/standalone_scaffolding.cpp b/src/glsl/standalone_scaffolding.cpp
index b338e92..beaa943 100644
--- a/src/glsl/standalone_scaffolding.cpp
+++ b/src/glsl/standalone_scaffolding.cpp
@@ -54,7 +54,6 @@ _mesa_error_no_memory(const char *caller)
 {
 }

-
 struct gl_shader *
 _mesa_new_shader(struct gl_context *ctx, GLuint name, GLenum type)
 {
diff --git a/tests/glsl_optimizer_tests.cpp b/tests/glsl_optimizer_tests.cpp
index 7dab3d1..a1e0da8 100644
--- a/tests/glsl_optimizer_tests.cpp
+++ b/tests/glsl_optimizer_tests.cpp
@@ -317,7 +317,7 @@ static bool CheckMetal (bool vertex, bool gles, const std::string& testName, con
 {
 #if !GOT_GFX
    return true; // just assume it's ok
-#endif
+#else

    FILE* f = fopen ("metalTemp.metal", "wb");
    fwrite (source.c_str(), source.size(), 1, f);
@@ -330,6 +330,7 @@ static bool CheckMetal (bool vertex, bool gles, const std::string& testName, con
        return false;
    }
    return true;
+#endif
 }
@zz85
Owner
zz85 commented Jul 12, 2015

@tschw interesting find!

to be honest, i haven't played around with the binary builds much myself. did you try compiling with @aras-p's branch at https://github.com/aras-p/glsl-optimizer ? did you try compiling without optimizations (as that's currently how it's working for my emscripten version, i get memory corruption issues with higher optimizations, something i haven't seen in my other emscripten projects...)

what's funny from your patch is that i don't see how that fixes any memory corruption issues. i'll give an attempt to compile this with -O2 and see though. if it is indeed a bug there, it would be beneficial to have this issue or a PR at the original branch instead and this emscripten version could merge upstream again to get fixes.

btw really interesting work in your glslprep library! :)

@tschw
tschw commented Jul 13, 2015

did you try compiling with @aras-p's branch at https://github.com/aras-p/glsl-optimizer

Yes, it was exactly the branch I was using.

did you try compiling without optimizations

Yes, the debug build was the one that was hanging.

In glsl_optimizer.cpp the optimization passes are run until all of them return false, indicating that no changes were made. If one of them lies, the program hangs: In my (most probably input-dependent) case, optimize_split_arrays returned true without actually transforming the AST and thus got the same input just to do the same, over and over again.

Applying that #if 1 line from the patch turns on trace output; then you can easily see which passes step out of line.
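As a rough illustration of the failure mode described above, here is a minimal sketch of a run-until-no-progress loop. It is not the code from glsl_optimizer.cpp: `Pass`, `FixedPointResult`, `run_to_fixed_point`, the `int` standing in for the IR, and the round cap are all hypothetical choices made up for this example. A pass that reports progress without changing anything keeps the loop alive until the cap trips, and the recorded names point at the culprit.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for an optimizer pass.
// run() returns true if it claims to have changed the IR (here just an int).
struct Pass {
    std::string name;
    std::function<bool(int&)> run;
};

struct FixedPointResult {
    bool converged;                          // false if the round cap was hit
    std::size_t iterations;                  // number of rounds executed
    std::vector<std::string> last_progress;  // passes claiming progress in the final round
};

// Runs every pass once per round and repeats until no pass reports
// progress, or until max_rounds rounds have been executed.
inline FixedPointResult run_to_fixed_point(const std::vector<Pass>& passes,
                                           int& ir,
                                           std::size_t max_rounds)
{
    assert(max_rounds > 0);  // a cap of zero would never run the passes at all

    FixedPointResult result{false, 0, {}};
    while (result.iterations < max_rounds) {
        ++result.iterations;
        result.last_progress.clear();
        for (const Pass& p : passes) {
            if (p.run(ir)) {
                result.last_progress.push_back(p.name);
            }
        }
        if (result.last_progress.empty()) {
            result.converged = true;  // a full round without changes: fixed point
            break;
        }
    }
    return result;
}
```

With honest passes the loop stops one round after the last real change. With a pass that always returns true, `converged` comes back false and `last_progress` names that pass, which is roughly what the trace output lets you spot by eye.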

what's funny from your patch is that i don't see how that fixes any memory corruption issues

The release build crashed and did not run at all for me. I did not bother to make a release build with debug info to see the stack.

We use two different compilers, GCC vs. Clang, that generate code differently...

i'll give an attempt to compile this with -O2

It won't fix memory issues. In fact, it won't really fix anything. The trace output may hint at what's hanging, though.

Does -Oz work? It runs pretty fast already, but it's a lot of code...

btw really interesting work in your glslprep library! :)

Thanks!

It seems ideal to tidy up the rather noisy output of the optimizer. I'd love to have both in the Three.js editor. It'd be really great to have a "preprocess only" mode in the optimizer for debugging: I can't really teach the minifier because its preprocessor is AST-based and a parse->generate cycle already minifies the code quite a bit, even without any transformations in effect.

i get memory corruption issues with higher optimizations, something i haven't seen in my other emscripten projects...)

OK, I couldn't resist the challenge. Now, after some debugging, I have a complete understanding of the disease but no cure yet. I got rid of the segfault on my end, but the underlying problem isn't solved and both the vector and array splitting passes report wrong success. It basically comes down to old-school C tricks meeting modern compilers. Will have to find a smooth way to tell the compiler not to break them...

@tschw
tschw commented Jul 13, 2015

Debugging was hard but the fix turned out to be rather simple:

I sent a patch upstream and filed a pull request for https://github.com/aras-p/glsl-optimizer . Need one too?

@tschw
tschw commented Jul 21, 2015

Upstream is too slow... Would you mind just running

$ git checkout -b resync
$ git pull git@github.com:tschw/glsl-optimizer.git master
$ # [ ... cmake, ninja, test and push ... ]

or so?

@zz85
Owner
zz85 commented Jul 21, 2015

Sounds awesome, sorry I missed these updates. Will try this out later!

On Monday, July 20, 2015, Ben Adams notifications@github.com wrote:

@tschw https://github.com/tschw I've combined the branches:
https://github.com/benaadams/glsl-optimizer



@zz85
Owner
zz85 commented Jul 22, 2015

@tschw not sure what black magic you did, but this seems to work really well! thanks... will be pushing updated builds soon.

@zz85 zz85 closed this in c630f8e Jul 22, 2015
@zz85
Owner
zz85 commented Jul 22, 2015

Awesome stuff! http://zz85.github.io/glsl-optimizer/ now uses only 555kb with -O3 builds!


@tschw
tschw commented Jul 22, 2015

not sure what black magic you did

It's actually explained in the ticket I sent upstream
https://bugs.freedesktop.org/show_bug.cgi?id=91320

Awesome stuff! http://zz85.github.io/glsl-optimizer/ now uses only 555kb with -O3 builds!

Had hoped for greater savings. Removing all code outside of list.h that is still directly accessing exec_list::head, ::tail and ::tail_pred and removing the containing struct may help. I'm tempted to try - but it could take a couple of days until I actually get around to it.

@zz85
Owner
zz85 commented Jul 23, 2015

It's actually explained in the ticket I sent upstream
https://bugs.freedesktop.org/show_bug.cgi?id=91320

yes, i've read a little of it and i think that's really some crazy stuff that I wouldn't be able to figure out by myself.

Had hoped for greater savings. Removing all code outside of list.h that is still directly accessing exec_list::head, ::tail and ::tail_pred and removing the containing struct may help. I'm tempted to try - but it could take a couple of days until I actually get around to it.

I think the emscripten -O2 and -O3 optimizations turn the C code into asm.js that runs faster, but don't necessarily produce a much smaller build apart from minifying. Perhaps another cross compiler like https://github.com/gasman/bonsai-c might do a better job, but I'm guessing its output would be closer to a hand port of C code to JS (with asm.js speed-ups).

@tschw
tschw commented Jul 23, 2015

I think the emscripten -O2 and -O3 optimizations turn the C code into asm.js that runs faster

True. Moving from volatile qualifiers (just telling the compiler to be careful accessing certain values) to typed data structures (telling the compiler what's really going on) removed several kilobytes from the x86_64 binary. I figured there'd be a certain factor for emscripten and less bloat from system libraries compared to x86_64 executables. Basically that theory proved right, but I had hoped for a greater factor.
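To make "typed data structures" a bit more concrete, here is a minimal sketch. It assumes the old-school trick under discussion is the one where the list's three pointers (head, tail, tail_pred) are read through casts as two overlapping sentinel nodes. The thread does not spell that out, so treat it as an assumption; the upstream ticket has the actual details. `Node`, `List`, `head_sentinel`, `tail_sentinel` and `unlink` are hypothetical names for illustration, not Mesa's declarations.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical miniature of an intrusive doubly linked list.
struct Node {
    Node* next = nullptr;
    Node* prev = nullptr;
};

// Instead of three raw pointers that are reinterpreted as two overlapping
// nodes, the list owns two real Node objects as sentinels. Every access
// then goes through a properly typed Node, with no casts involved.
struct List {
    Node head_sentinel;  // head_sentinel.next is the first element
    Node tail_sentinel;  // tail_sentinel.prev is the last element

    List() {
        head_sentinel.next = &tail_sentinel;
        tail_sentinel.prev = &head_sentinel;
    }

    // The sentinels point at each other, so copying would leave dangling links.
    List(const List&) = delete;
    List& operator=(const List&) = delete;

    bool empty() const { return head_sentinel.next == &tail_sentinel; }

    void push_back(Node* n) {
        n->prev = tail_sentinel.prev;
        n->next = &tail_sentinel;
        tail_sentinel.prev->next = n;
        tail_sentinel.prev = n;
    }

    std::size_t size() const {
        std::size_t count = 0;
        for (const Node* n = head_sentinel.next; n != &tail_sentinel; n = n->next) {
            ++count;
        }
        return count;
    }
};

// Removes a linked node without needing the list: both neighbours always
// exist because the sentinels are ordinary nodes.
inline void unlink(Node* n) {
    assert(n->prev != nullptr && n->next != nullptr);
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->next = nullptr;
    n->prev = nullptr;
}
```

The sketch spends one extra pointer per list compared to the three-pointer layout, in exchange for code the optimizer can reason about without volatile.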

Half a megabyte is not so bad, actually.

Perhaps another cross compiler like https://github.com/gasman/bonsai-c might do a better job

Interesting project. But I'm afraid an ANSI-C compiler won't do for a C/C++ mix like glsl-optimizer.

@zz85
Owner
zz85 commented Jul 23, 2015

sometime, when someone has the time, I'd be interested to know how Cheerp builds compare to emscripten :)

@tschw
tschw commented Aug 15, 2015

The cleaner the code, the happier whatever compiler :).

Still haven't gotten around to integrating it. The blocking limitation is described in PR 6963 @ Three.js.

@tschw tschw referenced this issue in aras-p/glsl-optimizer Aug 15, 2015
Open

Compiling on Gentoo #92
