
Conditionally avoid Chrome-specific micro-optimisations on non-Chrome builds #702

Closed
AlexVallat opened this issue Feb 5, 2015 · 17 comments

Comments

@AlexVallat
Contributor

I noticed that there are a few places where micro-optimisations (micro in scope, not necessarily effect!) are used based on Chrome benchmarks in jsperf. In some cases, though, the same benchmarks under Firefox can give very different results.

For example, there is an optimisation to avoid RegExp matching and use indexOf when matching single-wildcard patterns. This is based on http://jsperf.com/regexp-vs-indexof-abp-miss/3 which shows that indexOf is slightly faster for Chrome. However, under Firefox the same benchmark shows RegExp to be over twice as fast.
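
For a concrete picture of the two approaches being benchmarked, here is a minimal sketch (the pattern and names are made up for illustration; this is not the actual µBlock code):

// A single-wildcard pattern "example.com/ads/*/banner", split into two segments.
var lhs = 'example.com/ads/';
var rhs = '/banner';

// RegExp approach: compile once, then test.
var re = /example\.com\/ads\/.*\/banner/;
var matchWithRegExp = function(url) {
    return re.test(url);
};

// indexOf approach: find the left segment, then the right segment after it.
var matchWithIndexOf = function(url) {
    var pos = url.indexOf(lhs);
    return pos !== -1 && url.indexOf(rhs, pos + lhs.length) !== -1;
};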

Similarly, there's an optimisation to avoid the use of Object[] by using indexOf, based on http://jsperf.com/string-indexof-vs-object, which again has the opposite result under Firefox, with Object[] being a lot faster.
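
Again purely for illustration (hypothetical names, not the benchmark's exact code), the two membership-test styles being compared are roughly:

// A set of known tokens, stored two ways.
var typesAsString = ' stylesheet image script xmlhttprequest ';
var typesAsObject = {
    stylesheet: true,
    image: true,
    script: true,
    xmlhttprequest: true
};

// String.indexOf lookup: search for the padded token in one big string.
var hasTypeIndexOf = function(type) {
    return typesAsString.indexOf(' ' + type + ' ') !== -1;
};

// Object[] lookup: a plain property access.
var hasTypeObject = function(type) {
    return typesAsObject[type] === true;
};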

I'm not sure whether this is because Chrome has a very fast indexOf, or Firefox has fast RegExp and Object[] lookups, but in the end it doesn't really matter. Either way, it would be nice if, where possible, the different performance optimisations could be applied conditionally, so that each build gets the optimisations appropriate to its browser.

@chrisaljoudi
Contributor

So something like build-time conditionals (à la C-style preprocessor #ifdef)?

I think that'd be a bit too painful to maintain from @gorhill's point of view. Given we have a Safari port, that would mean having specific optimizations for that, too — which requires OS X for testing.

@chrisaljoudi
Contributor

To explain, I'd be happy to do benchmarking for Safari — I've been doing so with the platform code for Safari. If this were to apply to core, there would need to be a very specific workflow to have various tests for various platforms.

@AlexVallat
Contributor Author

I was thinking that you would have a basic 'most obvious solution' in the core code, but with browser-specific overrides where benchmarking shows that there are gains to be had for a specific browser by doing something non-obvious. That way anyone interested in tweaking for better performance would only have to test against the one browser they were working with, without fear that it would break or degrade performance in the others.

If no-one cares to optimise for Safari under OS-X then it just remains at the base level.

I agree that it would be more work to set up such a system to start with, but once set up I'd suggest that it would actually be less maintenance work than trying to maintain a single set of optimisations across all browsers.

Of course, you could always go with the argument that this is a Chrome extension and will be optimised for Chrome, and all other ports are secondary and will be fixed only so far as to be functional. Certainly a valid position, but one which would make me a sad bunny.

@chrisaljoudi
Contributor

@AlexVallat

If no-one cares to optimise for Safari under OS-X

Not at all! I'm paying as much attention and putting as much effort as I can into making µBlock for Safari as efficient and effective as it can be at this point. If it turns out that a practice of optimizing core stuff for each browser does emerge, I'm on it.

And your idea is a very appealing one, but I'm not sure how you would go about implementing that efficiently.

The obvious way would be to have all of these types of things in functions that the platform code can override. Say,

vAPI.utils = {
    indexOf: function(a, b) {
        return a.indexOf(b);
    }
};
// ...
// Use it:
var indexOfMatch = vAPI.utils.indexOf(s, pattern);

And then platform-specific code can override that, say for Firefox:

// In Firefox-specific JS
vAPI.utils.indexOf = function(a, b) {
    return a.MozMagicalSuperFastIndexOf(b);
};

The problem with that is, adding the extra "layer" (the extra function call) is actually quite costly, and arguably might cancel out any benefits you'd get from using the more optimized method.

The only way this would make sense, I think, is if it's done at compile-time with some kind of pre-processing. Again, the details of how you'd implement that aren't clear to me.
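
One way to picture such a pre-processing step, purely as a sketch (the BUILD markers below are hypothetical; nothing like them exists in the current build scripts):

// Hypothetical platform markers that a build script would strip,
// keeping only the block for the target platform; after pre-processing,
// only one definition of matchOneWildcard remains in the shipped file.

/* BUILD:firefox */
var matchOneWildcard = function(url, re) {
    return re.test(url);                 // RegExp wins on Firefox
};
/* /BUILD */

/* BUILD:chromium */
var matchOneWildcard = function(url, lhs, rhs) {
    var pos = url.indexOf(lhs);          // indexOf is competitive on Chromium
    return pos !== -1 && url.indexOf(rhs, pos + lhs.length) !== -1;
};
/* /BUILD */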

@AlexVallat
Contributor Author

I agree, I don't think that simply overriding functions is sufficient. There would have to be different code, either in platform-specific files or in platform-specific regions within files. Neither of which would be a whole lot of fun in JavaScript.

Not impossible (after all, you are already running Python scripts as part of the build process), but non-trivial.

I don't have a ready-to-go system to suggest that you implement at this point, merely the idea that it would be something I think worth investigating. Unless you are against any such system in principle.

@gorhill
Contributor

gorhill commented Feb 5, 2015

@AlexVallat

you could always go with the argument that this is a Chrome extension and will be optimised for Chrome. Certainly a valid position, but one which would make me a sad bunny.

What makes me a sad bunny is to see that you would think I could make such a terrible argument.

I need to think this through more. Like @chrisaljoudi says, there is a cost to an extra layer of indirection. There are many things to consider, including the fact that regular expressions cost more memory than plain strings.

My usual thinking is to find the optimal compromise between avoiding platform-specific code, extra layers of indirection (they punish every platform in the end), and code complication. In the current case, my thinking is to use the best-performing approach for the worst-performing platform. There are so many things to consider, including whether the benchmark reflects reality. In the end, what swayed me away from using regexes is their memory cost, which is much higher than that of a plain string.

@AlexVallat
Contributor Author

I'm glad to hear that, it sounds like a sensible approach to me. I'm not sure how you go about benchmarking the difference in memory usage at that granular a level; I haven't found anything as handy as jsperf for that! I hadn't realised that was the case for the RegEx vs. indexOf approach when I was reading through, and assumed it was a decision based solely on the benchmark linked in the comment.

I appreciate that you are developing this plugin with such an emphasis on optimisation, particularly once it lands on Firefox mobile! So thank you.

@gorhill
Contributor

gorhill commented Feb 5, 2015

The jsperf benchmark was probably flawed due to the fact that the inner block was empty. I've learned to use jsperf better since then. Here, using a non-empty inner block:

http://jsperf.com/regexp-vs-indexof-abp-miss/5

Winner: substr on both platforms.

Never mind: the result is the same without an empty block.

@gorhill
Contributor

gorhill commented Feb 5, 2015

It would be nice to take advantage of what's best for Firefox; the question now is to figure out a way to do it that doesn't render the code overcomplicated and doesn't cause overhead for the slower platform.

@AlexVallat
Contributor Author

I was about to say, there was something fishy about the substr miss result there!

Using your version 5 instead, I don't notice any significant difference in results. I thought I'd give Chrome a try too, so, for a giant sample size of one (my dev machine), the results I get are:

test (ops/sec)   Firefox 36   Chrome 40
RegEx            8.6M         3.8M
indexOf          4M           2.8M
substr           8M           3.8M

Firefox rocks. (Sorry, personal bias creeping in there)

@gorhill
Contributor

gorhill commented Feb 5, 2015

One wildcard hit

Chromium:

  • regex: 1,180,877
  • indexOf: 1,271,023
  • substr: 1,286,460

Firefox:

  • regex: 1,519,168
  • indexOf: 2,120,623
  • substr: 1,653,910

One wildcard miss

Chromium:

  • regex: 2,678,663
  • indexOf: 2,707,108
  • substr: 2,851,667

Firefox:

  • regex: 5,082,442
  • indexOf: 4,316,430
  • substr: 3,603,777

How many filters with one wildcard in EasyList + EasyPrivacy?

EasyList: 2,432
EasyPrivacy: 1,038
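
For anyone curious how such a count could be reproduced, here is a naive sketch (Node.js, hypothetical file name; the real filter parser excludes far more forms, so the numbers won't match exactly):

var fs = require('fs');

// Count network filters containing exactly one '*' wildcard,
// skipping comments and cosmetic filters.
var lines = fs.readFileSync('easylist.txt', 'utf8').split('\n');
var count = 0;
lines.forEach(function(line) {
    if ( line === '' || line.charAt(0) === '!' ) { return; }  // empty/comment
    if ( line.indexOf('##') !== -1 ) { return; }              // cosmetic filter
    var first = line.indexOf('*');
    if ( first !== -1 && line.indexOf('*', first + 1) === -1 ) {
        count += 1;
    }
});
console.log(count);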

@AlexVallat
Contributor Author

Out of interest, do you have any good statistics on frequency of hits vs. misses? My gut feel would be that misses far outweigh hits, but I haven't actually collected any statistics to test this theory.

@gorhill
Contributor

gorhill commented Feb 5, 2015

@AlexVallat very difficult to tell without instrumenting the code. In general, any network request is likely not matching any filter -- most requests aren't blocked. But that's the bird's-eye view. In practice, there are many conditions to be met before a filter's match code is executed: any-1st-/3rd-party, any-/specific-request type, filter class, etc.

Given the number of such filters, given that such a filter is likely not reached that often because of the weeding-out process before its match code runs, and given that regexes do require more memory, I am not sure it's worth adding complexity to the code to use regexes for FF; I suspect the CPU gain would end up being marginal.

Profiling from the entry point of network processing would show whether it's worth it. I do profile this on Chromium regularly when I wonder whether I made things worse or not, and I also did for FF, though I didn't keep and check in the profiling code. I'd say that if there is a significant difference in the average speed at which each network request is handled, then it's worth looking into further. One challenge is to find an objective benchmark scenario for FF. On Chromium I use my own sessbench to automate all this, along with the "reference benchmark" (somewhere in the wiki).
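
As an illustration of the kind of measurement described above (a throwaway sketch, not the actual profiling code mentioned here):

// A timing wrapper that averages how long a handler takes per call,
// similar in spirit to the onBeforeRequest timings logged later in this thread.
var timed = function(label, handler) {
    var samples = 0, totalMs = 0;
    return function() {
        var t0 = performance.now();
        var result = handler.apply(this, arguments);
        totalMs += performance.now() - t0;
        samples += 1;
        if ( samples % 1000 === 0 ) {
            console.log(label + ': ' + (totalMs / samples).toFixed(3) +
                        ' ms (' + samples + ' samples)');
        }
        return result;
    };
};

// Hypothetical usage: wrap the real listener before registering it.
// chrome.webRequest.onBeforeRequest.addListener(timed('onBeforeRequest', realHandler), filter, extraInfo);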

@gorhill
Contributor

gorhill commented Mar 4, 2015

I installed a counter for each internal class of filter.

Using default filter lists, I get 218 such wildcard-based filters which are handled by the special substr code.

So replacing it with a plain regex would probably bring nothing performance-wise on Firefox. On the other hand, using a plain regex similarly won't bring a cost performance-wise on Chromium. But not using regexes brings higher code complexity. Because of this, I will convert all those special one-wildcard filter classes into a generic one, to be used as a fallback for whatever filter it's not worth creating a special optimization for. This will reduce the number of internal filter classes, and thus mean less code to maintain.

But in retrospect, I went overboard in trying to keep optimizing for filters which are far less numerous, like the wildcard-based ones, especially given that on Firefox the optimization doesn't seem to be beneficial anyway.
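
For illustration of what such a generic fallback could look like, a minimal sketch with made-up names (not the actual change referenced below):

// Generic fallback filter: the filter text is compiled to a RegExp once,
// instead of having a dedicated class per special pattern shape.
var FilterGeneric = function(s) {
    // Escape regex metacharacters, then turn ABP '*' wildcards into '.*'.
    var reStr = s.replace(/[.+?${}()|[\]\\^]/g, '\\$&').replace(/\*/g, '.*');
    this.re = new RegExp(reStr);
};

FilterGeneric.prototype.match = function(url) {
    return this.re.test(url);
};

// Usage:
var f = new FilterGeneric('example.com/ads/*/banner');
f.match('http://example.com/ads/300x250/banner');  // true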

@gorhill gorhill closed this as completed in 5f65b17 Mar 5, 2015
@TheRyuu

TheRyuu commented Mar 7, 2015

I hate to comment on this after it's been closed, but I got different results than what was posted above for the wildcard miss benchmark[1], and I'm not sure whether it's relevant or not.

On Chrome 42.0.2311.22 (64-bit dev branch) substr was significantly faster than the other two.
RegExp: 1,168,394
indexOf: 2,755,091
substr: 4,032,620

This would indicate that it would bring a cost performance-wise on Chromium, although in practice I'm not sure it would be an issue; I just thought I'd share it considering the difference (I'm not sure what the cause of this difference is, 64-bit Chrome on Windows perhaps?).

Performance on Firefox for RegExp and substr was almost the same (> 8,000,000 in both cases). Since the "special optimizations" were already written, would it not make sense to just keep them, despite the added complexity, if the code has already been tested and is known to work?

Edit: Firefox numbers on same machine.
RegExp: 8,685,279
indexOf: 3,968,924
substr: 8,317,695

[1] http://jsperf.com/regexp-vs-indexof-abp-miss/5

@gorhill
Contributor

gorhill commented Mar 7, 2015

After changes (Chromium 40):

µBlock> onBeforeRequest: 0.129 ms (8082 samples)
µBlock> onBeforeRequest: 0.129 ms (8197 samples)
µBlock> onBeforeRequest: 0.129 ms (8287 samples)
µBlock> onBeforeRequest: 0.129 ms (8356 samples)
µBlock> onBeforeRequest: 0.129 ms (8431 samples)
µBlock> onBeforeRequest: 0.129 ms (8501 samples)
µBlock> onBeforeRequest: 0.129 ms (8548 samples)
µBlock> onBeforeRequest: 0.128 ms (8708 samples)
µBlock> onBeforeRequest: 0.128 ms (8722 samples)
µBlock> onBeforeRequest: 0.128 ms (8859 samples)

And these were results with 0.8.7.0:

µBlock> onBeforeRequest: 0.131 ms (8664 samples)
µBlock> onBeforeRequest: 0.131 ms (8763 samples)
µBlock> onBeforeRequest: 0.131 ms (8839 samples)
µBlock> onBeforeRequest: 0.130 ms (8914 samples)
µBlock> onBeforeRequest: 0.131 ms (8988 samples)
µBlock> onBeforeRequest: 0.131 ms (9033 samples)
µBlock> onBeforeRequest: 0.130 ms (9192 samples)
µBlock> onBeforeRequest: 0.130 ms (9206 samples)
µBlock> onBeforeRequest: 0.129 ms (9324 samples)
µBlock> onBeforeRequest: 0.129 ms (9329 samples)

So we are good.

@gorhill
Contributor

gorhill commented Mar 7, 2015

@TheRyuu I am not too worried about the drop in performance for RegExp in Chrome 42. So far I will assume it's just a development version quirk, not uncommon with dev builds.

Edit: Hmm, actually I read it again, and what your benchmark shows is not a performance decrease for RegExp, but rather a sizeable performance increase for String.substr. Given that for Chromium 40, RegExp and String.substr were not that different, the above measured timings will still stand with Chromium 42 -- they might get slightly better because of String.substr. So long as there is no severe drop in performance, we shouldn't worry.
