Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add brotli compression #520

Closed
pkkummermo opened this issue Mar 25, 2019 · 34 comments
Closed

Add brotli compression #520

pkkummermo opened this issue Mar 25, 2019 · 34 comments

Comments

@pkkummermo
Copy link

pkkummermo commented Mar 25, 2019

Describe the feature
Add brotli compression to Javalin.

Additional context
We should probably also rename the current configuration (.disableDynamicGzip()) to something along the lines of

dynamicCompression: true
@tipsy
Copy link
Member

tipsy commented Mar 25, 2019

Last time I checked brotli was not suited for dynamic compression. There is a pretty recent comment on the jetty tracker about it: jetty/jetty.project#2553.

I don't think this is worth the implementation effort and additional dependency.

@pkkummermo
Copy link
Author

Might be too heavy as you say, but would be nice to experiment with it and see if it's worth it. It's not big gains,

14% smaller than gzip for JavaScript
21% smaller than gzip for HTML
17% smaller than gzip for CSS

but if we could stay "ahead of the curve" (or at least offer the ability to do so easily) it could serve some value :)

@pkkummermo
Copy link
Author

The link (https://blogs.akamai.com/2016/02/understanding-brotlis-potential.html) in the issue you mentioned actually talks a lot about the performance and some tweaking behind it. Still don't know if it's worth it though

@tipsy
Copy link
Member

tipsy commented Mar 25, 2019

@pkkummermo
Copy link
Author

@tipsy
Copy link
Member

tipsy commented Mar 25, 2019

We could do something like dynamicCompression = setOf(BROTLI, GZIP) then.

@pkkummermo
Copy link
Author

Does it really make sense to not choose gzip if you're choosing brotli tho?

@tipsy
Copy link
Member

tipsy commented Mar 25, 2019

What about the brotli purists?

@pkkummermo
Copy link
Author

Haha I guess they're out there as well ^^

@nixxcode
Copy link
Contributor

Hi there,

is this still open for consideration? I feel implementing it is within my capabilities.

If yes, are there any requirements in addition to implementation/testing? (such as further research)

@pkkummermo
Copy link
Author

It should be quite straight forward, we only need to make sure it's easily configurable and backwards compatible :)

@tipsy
Copy link
Member

tipsy commented Jun 24, 2019

@nixxcode Yeah. I guess the implementation will have to be something like config.dynamicBrotli = true, then check for this boolean before writing the response. You'll have to add an optional dependency on some brotli library, unless it's a very small dependency (in which you can add it directly).

As @pkkummermo just mentioned, you'll have to maintain backwards compatibility, so gzip will be the default (and fallback).

@nixxcode
Copy link
Contributor

@tipsy Ok, great! I would like to implement this.

I understand Gzip is to be the default, but how should I handle a situation where both Brotli and Gzip booleans have been set true by the user? Do we warn the user and disable one (probably Brotli), or do you have a better suggestion?

Thank you.

@tipsy
Copy link
Member

tipsy commented Jun 24, 2019

You don't have to warn the user. Even if the Server allows both compression types, it also depends on what the Client accepts. When both are enabled on the Server:

  • If the Client accepts Brotli and Gzip, use Brotli
  • If the Client accepts Brotli, use Brotli
  • If the Client accepts Gzip, use Gzip
  • If the Client accepts neither, use neither

@nixxcode
Copy link
Contributor

Okay, thank you.

That's all I need to get started :)

@nixxcode
Copy link
Contributor

nixxcode commented Jul 4, 2019

Okay, just wanted to chime in to let you know I'm still working on this! I've just run into more problems than I expected. (Like always :D)

JBrotli was a no go, since development has pretty much stopped and it's using a really old (and buggy) version of Brotli (0.5.0 as opposed to the current 1.0.7)

I got dynamic compression working through https://github.com/dominikhlbg/BrotliHaxe , but the Java code is auto-generated and is very inefficient.

I am getting response times that average ~20ms for the 10000 SillyObject test via Google Chrome on Localhost, in comparison to Gzip's 2-3ms average. Although the response size is cut down to to 2232 bytes compared to Gzip's 7740.

Memory use on repeated calls is where I feel the big problem lies. On the same 10k SillyObject test, using Gzip and rapidly refreshing the page 50+ times, Javalin memory usage stays under 200mB. If doing the same with Brotli on identical test set, memory usage climbs to 600mB, which is quite hefty.

Right now it's impossible for me to tell how much of this is caused by Brotli versus the inefficient Haxe code, although I do highly suspect the latter is the problem seeing as it literally creates a new array object for every single byte written.

So, I would like to try and use the Haxe output as a pseudo-code base and write my own library to utilize Brotli. I can't find any other existing Brotli libs, so it seems like the best way to go?

Also, could I have an upper limit as to how much memory usage and response time is still considered within tolerable limits? Just as a guideline for optimizing the code.

Thanks :)

EDIT: In case you want to follow my commits, here is my fork of Javalin: https://github.com/nixxcode/javalin/tree/nixx-javalin-brotli-520

@pkkummermo
Copy link
Author

@nixxcode Which level do you do the compression on? 11 is super slow, but gives amazing gains in terms of compression. I think the paper linked in #520 (comment) found out that 9 is the best general level as it balances compression and perf, but I believe you'd need to experiment a bit on what works best :)

@nixxcode
Copy link
Contributor

nixxcode commented Jul 4, 2019

@pkkummermo I was running level 4 as per suggestion by the following article: https://blogs.akamai.com/2016/02/understanding-brotlis-potential.html

Playing around a little more has yielded some odd results. I tried level 9 as you suggested, but the response size actually increased from 2234 to 2321 bytes? A higher compression level should always have a lower output size if the compressor is working correctly.

This is very strange, and happens with both the String and ByteArray methods of the BrotliHaxe compressor. Not sure if this means problems with the implementation or something else. Would need more testing to be certain.

That said, level 9 spiked the memory usage of the Javalin process to an outlandish 1.5gb while keeping average response times roughly the same as level 4.

Level 0 behaved as expected, yielding a larger output size, but not by much (2697 bytes compared to 2234 at level 4), but kept memory usage the same as level 4 (around 600mB max) and cut average response time from 20ms to 10ms.

These are some strange, strange results, specifically level 9 yielding a larger compressed result than level 4.

@nixxcode
Copy link
Contributor

nixxcode commented Jul 4, 2019

It's also important to note for the above results, that Google Chrome does correctly recognize the content as Brotli and decompresses it without errors.

That means the compression result yielded by the library is correct, yet the outlandish memory usage and result size anomalies suggest that I really should try implementing my own version of a Java based Brotli lib.

@tipsy
Copy link
Member

tipsy commented Jul 4, 2019

Very nice work @nixxcode ! The results you're reporting are far worse than I had hoped though. It would be a cool feature to have, but I guess there's a reason why Jetty is holding off on implementing it.

The outlandish memory usage and result size anomalies suggest that I really should try implementing my own version of a Java based Brotli lib.

I have no idea how much work that would be, but if you're up for the task it would be great. I guess you'd publish it a separate dependency and have Javalin pull it in?

@nixxcode
Copy link
Contributor

nixxcode commented Jul 4, 2019

I have no idea how much work that would be, but if you're up for the task it would be great. I guess you'd publish it a separate dependency and have Javalin pull it in?

@tipsy Yes, exactly. If I can make it work and perform decently, it would make sense to publish it and make it available to others who may need it as well. :)

Honestly, I have no idea if I am in over my head, but there's only one way to find out! I will give it a go.

@tipsy
Copy link
Member

tipsy commented Jul 4, 2019

Honestly, I have no idea if I am in over my head, but there's only one way to find out! I will give it a go.

Great! I'm not sure I can help much with the implementation, but feel free to use this issue for moral support 😄

@nixxcode
Copy link
Contributor

nixxcode commented Jul 5, 2019

@tipsy @pkkummermo After scrapping BrotliHaxe due to the horrendous memory issues, I went back to JBrotli, but this time I forked it and started playing around in the source, rather than just including it in Javalin and following usage examples. I made some interesting discoveries.

The original bug I had problems with, that was crashing the JVM? It's only an issue when using the BrotliStreamCompressor object from JBrotli. I've found that by converting our resultStream to a byte array and feeding that to the regular BrotliCompressor (non-stream version) object in a single chunk, everything works perfectly fine!

I am pretty confident this has fully resolved the problem, since I found the cause of the JVM crash here: google/brotli#346
It is caused by a bug in the old version of Brotli (0.5.0), where a faulty boolean flag makes the Brotli data writer believe that it has reached the end of the data stream after being called twice. Since the non-stream version of JBrotli's object compresses the entire byte array in a single pass rather than using a stream, it gets around the problem entirely.

So, I now have a functioning Brotli compressor built into Javalin, here: https://github.com/nixxcode/javalin/

Performance wise, it matches up to what I read in this article: https://blogs.akamai.com/2016/02/understanding-brotlis-potential.html

I have been testing with level 4 compression, result size on the 10k SillyObject test is 2235 bytes as opposed to Gzip's 7740, so roughly a 9.2% improvement when compared against the uncompressed length of 59679 bytes.

Response times and memory usage are identical to those of Gzip, so performance is no longer an issue.

There are still some considerations, however:

  1. JBrotli's compression is not a pure Java implementation. It uses JNI bindings to execute the compression code via native libraries and is thus platform dependent. That means it may not work on every possible platform, though I don't think this is a huge issue, since we can still offer Gzip as a fallback.
  2. While being functional and having excellent performance, JBrotli is still using a Brotli library version that is 3 years out of date. Since everything seems to work great, I don't necessarily think this is a problem, but I still wanted to mention it.

If you want to have a play with it, the branch I linked to earlier should build and run without any errors when cloned. I have created a test class named TestServer.kt with a runnable main() function, which keeps the Javalin server running until manually stopped. I use this to play around and test via web browser requests. Additionally, all the original/updated tests within TestCompression.kt are passing.

If you are happy to move forward with this, I will open a pull request. Alternatively, please let me know what else needs to be done. :)

Thanks!

@pkkummermo
Copy link
Author

pkkummermo commented Jul 5, 2019

Having an initial brotli implementation (even though it's reliant on JNI bindings and is using a 3yo library) is a huge win as I see it, especially as it sounds it's stable "enough". It can be iterated upon and improved over time, and having it in the API of Javalin is a great start :) We could mark it as experimental in the first upcoming minor to ensure that people understand it's a WIP.

Only input is that we should perhaps allow for setting the brotli compression level as a part of the API, with sane defaults (4)?

Edit: Great work btw!

@nixxcode
Copy link
Contributor

nixxcode commented Jul 5, 2019

@pkkummermo Absolutely. What I linked to is still a testing playground. Before the actual PR happens, I will do more tests to figure out sensible compression levels (in terms of time/memory usage), so it can be documented.

And yes, I am happy to change the wrapper so it allows adjustment of compression level, if desired! The only reason I haven't already done that, was to keep it in line with Jetty's Gzip (which also doesn't allow for level changes without a workaround)

Edit: I still want to look into updating the available Brotli offerings for Java. The only difference is that now, I want to try and figure out how to update JBrotli, rather than implementing everything from scratch. Despite being out of date, JBrotli has a very solid code base and excellent performance, so contributing there makes more sense than building my own.

@pkkummermo
Copy link
Author

@nixxcode Great stuff! We could look into having a level setting exposed for gzip as well (non-breaking of course), with sane defaults. Right now I'm guessing it's at 6 as it is Jettys default.

I still want to look into updating the available Brotli offerings for Java

Sounds like a plan! Poke us if you want any input and again thanks for contributing :)

@pkkummermo
Copy link
Author

@tipsy just poked me and said Javalin is using the JDK implementation, but I see we can expose a level by extending GZIPOutputStream ourself. Seems straightforward enough:

https://stackoverflow.com/questions/19138179/gzipoutputstream-increase-compression-level

@tipsy
Copy link
Member

tipsy commented Jul 6, 2019

Very nice work @nixxcode !

JBrotli's compression is not a pure Java implementation. It uses JNI bindings to execute the compression code via native libraries and is thus platform dependent. That means it may not work on every possible platform, though I don't think this is a huge issue, since we can still offer Gzip as a fallback.

I don't see that as an issue either.

While being functional and having excellent performance, JBrotli is still using a Brotli library version that is 3 years out of date. Since everything seems to work great, I don't necessarily think this is a problem, but I still wanted to mention it.

Is there a newer version? I'm looking at https://github.com/MeteoGroup/jbrotli, and it seems to be dependency free. Also, how big is this dependency? I'm a bit tempted to pull it in as a non-optional dependency, as I think there's virtually 0 chance of anyone having conflicts with it.

Before the actual PR happens, I will do more tests to figure out sensible compression levels (in terms of time/memory usage), so it can be documented.

I think what you're doing here is really interesting. Do you have a blog where you could do a write up of the process/results? If not, I'd be happy to host an article on javalin.io (if you want to write one)

And yes, I am happy to change the wrapper so it allows adjustment of compression level, if desired! The only reason I haven't already done that, was to keep it in line with Jetty's Gzip (which also doesn't allow for level changes without a workaround)

As @pkkummermo mentioned already, Javalin doesn't use Jetty for GZIP. The core HTTP part of Javalin (the JavalinServlet) can be run on other servers by calling Javalin#createStandalone and calling app.servlet().service(req, res), so I thought GZIP should be separate.

I guess we'll need something like a CompressionStrategy now, which can take a GzipConfig(compressionLevel) and/or a BrotliConfig(compressionLevel).

Javalin.create { config -> 
    config.dynamicCompressionStrategy = CompressionStrategy(Brotli(lvl), Gzip(lvl)) // default is (null, Gzip(lvl))
    config.disableDynamicGzip // CompressionStrategy(null, null)

I also suspect we should refactor GZIP and Brotli out of JavalinServlet, and into it's own class or object.

As @pkkummermo mentioned earlier, we should probably mark this as experimental, since there are a lot of unknowns.

If you are happy to move forward with this, I will open a pull request. Alternatively, please let me know what else needs to be done. :)

Please do! Thanks again for all the work you're putting into this :)

@nixxcode
Copy link
Contributor

nixxcode commented Jul 6, 2019

@tipsy

Is there a newer version? I'm looking at https://github.com/MeteoGroup/jbrotli, and it seems to be dependency free.

Yes, the only "dependency" i am able to see is the very first folder in the repository is named "brotli @ 66c1451", From what I can tell, it's a direct import/copy of the source at https://github.com/google/brotli/tree/66c14517cf8afcc1a1649a7833ac789366eb0b51, which is an old branch of Brotli. But I imagine you are probably referring to dependencies in the form of libraries auto-included via plugins such as maven or gradle? If so, then you are correct, it has none of those, so there should be zero chance of conflict :)

JBrotli itself does use maven to download a native Brotli library depending on the platform the project is running on, but these native libs are pre-compiled and simply fetched from a maven repo as needed. They do not need to be built locally. JBrotli itself handles everything in the background once included via maven.

Also, how big is this dependency?

On Windows 64bit, the native Brotli lib it pulled is 321 KB, the JBrotli lib itself is 27 KB, so about 350 KB total. (I imagine size variance for other native libs will be negligible.)

I think what you're doing here is really interesting. Do you have a blog where you could do a write up of the process/results? If not, I'd be happy to host an article on javalin.io (if you want to write one)

Sure, I would love to. I do need to come up with a better testing strategy to get useful results out of it, though. So far, I have only been running the same "SillyObject" test over and over to eliminate memory/speed issues. Will need much more variety and ideally, real web content in order to get some useful comparison (and troubleshooting!) results.

Do you happen to know of any interesting web test suites that do a good job mimicking real-world HTTP data, rather than just generating random junk?

If not, would it make sense to just set up Javalin as a "middleman", having it request live web pages and forwarding the results to client? It would take some interesting workarounds, such as decoding existing compression, then re-encoding it ourselves and forwarding to client, but it should provide organic results as long as these workarounds are excluded from the benchmarks. Really, I am not sure if this is reaching too far, it's just an idea to get results from live web data. I am completely open to alternative suggestions!

I guess we'll need something like a CompressionStrategy now, which can take a GzipConfig(compressionLevel) and/or a BrotliConfig(compressionLevel).

I also suspect we should refactor GZIP and Brotli out of JavalinServlet, and into it's own class or object.

Agreed on both counts. I will get this done!

As @pkkummermo mentioned earlier, we should probably mark this as experimental, since there are a lot of unknowns.

Please do exactly that. Outside of the first contribution tutorial repo, this is literally my first Github contribution. Furthermore, my experience with version control and maintaining backwards compatibility is very limited. Because of this, it makes me a lot more comfortable having my code marked experimental, even if the situation hadn't strictly called for it.

Also, thank you for the kind comments and encouragement. It really means a lot to an open source newbie like me. :)

EDIT: Sorry, forgot you asked if there was a newer version. Did you mean of the original Brotli lib, or JBrotli? There are much newer versions of Brotli,

But for JBrotli, 0.5.0 (the version I am working with) is the latest JBrotli version that's available via Maven.

@nixxcode
Copy link
Contributor

nixxcode commented Jul 6, 2019

Also, in regards to the pull request. Should I do this right now, with my code as it is?

I understand they are meant to be used for discussion, revisions and collaboration before code merging actually happens, but I am completely new to Github, so I have no idea how to use them "properly". (specifically, what is easiest for you as the project owner)

So far, I have just been committing to my own fork of Javalin and writing commit comments, but I am not sure if this is the appropriate workflow.

I would be very grateful for some guidance here! :D

@tipsy
Copy link
Member

tipsy commented Jul 6, 2019

Also, in regards to the pull request. Should I do this right now, with my code as it is?
I understand they are meant to be used for discussion, revisions and collaboration before code merging actually happens, but I am completely new to Github, so I have no idea how to use them "properly". (specifically, what is easiest for you as the project owner)
So far, I have just been committing to my own fork of Javalin and writing commit comments, but I am not sure if this is the appropriate workflow.
I would be very grateful for some guidance here! :D

The earlier the better. A PR is a lot more visible than keeping your own fork, so it gives other members of the community a chance to get involved in the change. For me as a project owner it's easier too, since I can easily look through the changes against master.

Please do exactly that. Outside of the first contribution tutorial repo, this is literally my first Github contribution. Furthermore, my experience with version control and maintaining backwards compatibility is very limited. Because of this, it makes me a lot more comfortable having my code marked experimental, even if the situation hadn't strictly called for it.

You can add a normal deprecation: @Deprecated("Experimental API - May change soon") (or similar). Then we remove the deprecation when the API is stable.

Do you happen to know of any interesting web test suites th ... I am completely open to alternative suggestions!

Maybe you could manually download a few websites and API responses and save them as files, then read them and serve them from memory?

val smallWebsite = readFile("/smallWebsite.html")
val mediumWebsite = readFile("/mediumWebsite.html")
val largeWebsite = readFile("/largeWebsite.html")
val smallJsonResponse = readFile("/smallJsonResponse.json")
val mediumJsonResponse = readFile("/mediumJsonResponse.json")
val largeJsonResponse = readFile("/largeJsonResponse.json")

I guess more is better, but be considerate of your own time. JSON is more interesting than HTML.

On Windows 64bit, the native Brotli lib it pulled is 321 KB, the JBrotli lib itself is 27 KB, so about 350 KB total. (I imagine size variance for other native libs will be negligible.)

That sound acceptable as a direct dependency. Any opinion @pkkummermo ?

EDIT: Sorry, forgot you asked if there was a newer version. Did you mean of the original Brotli lib, or JBrotli? There are much newer versions of Brotli,

I meant Brotl itselfi. Would be interesting to see how much better the current version is.

Also, thank you for the kind comments and encouragement. It really means a lot to an open source newbie like me. :)

You're welcome. I've been running this and few other projects for years, and it's rare to get contributions of this quality.

@pkkummermo
Copy link
Author

350 kb is negligible in terms of "bundle size", so I'd say go ahead on a direct dependency :) As @tipsy said; the earlier the PR the better due to it being visible to people looking at the repo and them being able to give input.

@nixxcode nixxcode mentioned this issue Jul 7, 2019
15 tasks
@nixxcode
Copy link
Contributor

nixxcode commented Jul 7, 2019

PR has been opened! I apologize for the commit flood. Most of the early ones are part of my early experiments with the libraries and can be ignored, as I've already reversed them throughout later commits.

Thank you for the feedback, I will open a PR much sooner next time. My thinking was that I didn't want to bother anyone with PRs until I had a decent code contribution going, but I see now why that line of thinking was flawed. :)

You can add a normal deprecation: @deprecated("Experimental API - May change soon") (or similar). Then we remove the deprecation when the API is stable.

Good idea! I will do this once I have all the desired API changes implemented.

Maybe you could manually download a few websites and API responses and save them as files, then read them and serve them from memory?

Sure, I will give that a try.

I meant Brotl itselfi. Would be interesting to see how much better the current version is.

I plan on trying to make JBrotli use the newest Brotli version, but this will involve updating the c++ source code for the native libraries.

I am not saying no! I want to do it, but it's something that may take me a bit of time to figure out :)

You're welcome. I've been running this and few other projects for years, and it's rare to get contributions of this quality.

I am happy to know you think highly of my contributions! I feel it is only fair I give my best once I decide to get on board with an interesting project.

I am also thankful for the opportunity to contribute. Javalin ticks a lot of boxes in terms of skills I wish to develop. 👍

@tipsy
Copy link
Member

tipsy commented Jul 13, 2019

I'll close this one now, and open a new one for static files.

@tipsy tipsy closed this as completed Jul 13, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants