Remove use of __builtin_expect when building with GCC
__builtin_expect is something that should only be used if you're very sure it will result in
an optimization (see http://blog.man7.org/2012/10/how-much-do-builtinexpect-likely-and.html).

This code was added back in 2008. It seems it was part of an effort to get the preprocessor
functioning faster. It looks like mordante played some role in the decision of where to use
the LIKELY and UNLIKELY macros, so it's possible it had some use at the time (he's a rather
experienced programmer).

However, it's better to let the compiler automatically optimize code itself. I haven't done
any profiling on the preprocessor speed with or without __builtin_expect, but it doesn't seem
worth keeping this around and having to test every so often whether it's still useful or has
become a performance hit instead.

It's also worth noting that I myself noticed what seemed like a perceptible performance improvement
in the game (again, not backed by profiling) when I switched from TDM GCC to MSVC 2017. IIRC,
I'm using at least quick LTO for my builds. Apples and oranges, yes, but it also suggests there
are various compiler options (such as LTO) which can improve performance on their own, and do so
better than trying to point the program down branch paths ourselves.
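
[Editor's note: for reference, a minimal sketch of the pattern being removed. The project's actual LIKELY/UNLIKELY macros appear in the src/utils/general.hpp diff below; the function here is purely illustrative.]

#include <cstdio>

// __builtin_expect(expr, expected) evaluates to expr while telling GCC
// which value to expect, so the hinted "hot" path can be laid out as the
// fall-through branch.
#ifdef __GNUC__
#define UNLIKELY(a) __builtin_expect((a), 0)
#else
#define UNLIKELY(a) (a)
#endif

int checked_get(std::FILE* f)
{
	int c = std::fgetc(f);
	if(UNLIKELY(c == EOF)) { // hint: hitting EOF is the rare case
		return EOF;
	}
	return c;
}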
Vultraz committed Jan 21, 2018
1 parent ba5ddc4 commit eb52503
Showing 3 changed files with 9 additions and 17 deletions.
8 changes: 4 additions & 4 deletions src/buffered_istream.hpp
@@ -62,7 +62,7 @@ class buffered_istream
 	{
 		fill_buffer();
 
-		if(UNLIKELY(eof_)) {
+		if(eof_) {
 			return EOF;
 		} else {
 			/*
@@ -89,7 +89,7 @@ class buffered_istream
 	{
 		fill_buffer();
 
-		if(UNLIKELY(eof_)) {
+		if(eof_) {
 			return EOF;
 		} else {
 			/* See get() */
@@ -155,14 +155,14 @@ class buffered_istream
 	 */
 	void fill_buffer()
 	{
-		if(UNLIKELY(buffer_offset_ >= buffer_size_)) {
+		if(buffer_offset_ >= buffer_size_) {
 			/*
 			 * This does not only test for the EOF, but also makes sure the
 			 * data is available in the buffer. Without it readsome will read
 			 * nothing, after its first call, even if the EOF has not been
 			 * reached.
 			 */
-			if(UNLIKELY(stream_.rdbuf()->sgetc() == EOF)) {
+			if(stream_.rdbuf()->sgetc() == EOF) {
 				eof_ = true;
 			} else {
 				buffer_offset_ = 0;
10 changes: 5 additions & 5 deletions src/serialization/tokenizer.hpp
@@ -95,7 +95,7 @@ class tokenizer
 
 	void next_char()
 	{
-		if (UNLIKELY(current_ == '\n'))
+		if (current_ == '\n')
 			++lineno_;
 		next_char_fast();
 	}
@@ -104,15 +104,15 @@
 	{
 		do {
 			current_ = in_.get();
-		} while (UNLIKELY(current_ == '\r'));
+		} while (current_ == '\r');
 #if 0
 		/// @todo disabled untill campaign server is fixed
-		if(LIKELY(in_.good())) {
+		if(in_.good()) {
 			current_ = in_.get();
-			if (UNLIKELY(current_ == '\r'))
+			if (current_ == '\r')
 			{
 				// we assume that there is only one '\r'
-				if(LIKELY(in_.good())) {
+				if(in_.good()) {
 					current_ = in_.get();
 				} else {
 					current_ = EOF;
8 changes: 0 additions & 8 deletions src/utils/general.hpp
@@ -18,14 +18,6 @@
 #include <algorithm>
 #include <cctype>
 
-#ifdef __GNUC__
-#define LIKELY(a) __builtin_expect((a),1) // Tells GCC to optimize code so that if is likely to happen
-#define UNLIKELY(a) __builtin_expect((a),0) // Tells GCC to optimize code so that if is unlikely to happen
-#else
-#define LIKELY(a) a
-#define UNLIKELY(a) a
-#endif
-
 inline bool chars_equal_insensitive(char a, char b) { return tolower(a) == tolower(b); }
 inline bool chars_less_insensitive(char a, char b) { return tolower(a) < tolower(b); }
 

8 comments on commit eb52503

@soliton-
Member

So the reasoning here is that an optimization an experienced developer added is not useful because an article says such an optimization needs to be done carefully and one would have to confirm it's actually an improvement?

Considering that this is actual performance critical code I strongly suggest at least taking the time and doing some measurements before removing optimisations.

The optimization here is done based on knowledge about the input the code will process, which has not changed and I don't see ever changing. No compiler can possibly guess that. So if you want to suggest we use profile guided optimization that might be a good idea. Simply removing an optimization without any measurement or replacement seems like a bad idea though.
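
[Editor's note: for concreteness, GCC's profile-guided optimization is roughly the two-step workflow below; the file and binary names are placeholders, not the project's actual build targets.]

g++ -O2 -fprofile-generate -o wesnoth-instr main.cpp   # instrumented build
./wesnoth-instr                                        # run a representative workload
g++ -O2 -fprofile-use -o wesnoth main.cpp              # rebuild using the recorded branch profile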

@Vultraz
Member Author

Another reason is that I think we should focus on global optimizations, not compiler-specific ones, unless of course the compiler-specific optimizations are done by the compilers themselves (i.e., via flags). Even if this is a valid optimization, it doesn't exist at all for Clang or MSVC (though yes, I do believe most release builds are compiled with GCC or an equivalent like TDM GCC).

If we want to use guided optimization, we should look into how to make it work on all three compilers and how much performance we can actually gain, and it should be well-documented.
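
[Editor's note: LTO is indeed a flag-level change on all three compilers; the invocations below are illustrative, not the project's actual build lines.]

g++ -O2 -flto foo.cpp bar.cpp -o wesnoth       # GCC
clang++ -O2 -flto foo.cpp bar.cpp -o wesnoth   # Clang
cl /O2 /GL foo.cpp bar.cpp /link /LTCG         # MSVC link-time code generation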

@CelticMinstrel
Member

I wouldn't be so sure about it not existing for Clang, as that's mostly compatible with GCC, and I generally agree that removing optimizations without profiling is kinda silly.

@GregoryLundberg
Contributor

First off, that blog post isn't some random comment. It's considered pretty much the authoritative look at the feature by someone who knows the Linux kernel well.

Second, the feature was added to gcc at the behest of the Linux kernel team, and is used widely in the kernel. But we're not writing kernel code here.

If you look at the results in the blog post, he rightly points out that the compiler generally does a better job optimizing than a programmer. And that was with GCC 4.6; I'm sure 7.2 is better still. Look at what happened when he let the compiler have at it without the forced optimization choice: it was able to better optimize the surrounding code and got better overall results.

Remember that he's only talking about changes of maybe a fraction of a second, net, over 1 billion tests. So his profiling showed that, maybe, there's a change of 1 nanosecond per iteration through the affected code.


Ignoring noise, there are basically two commits in question.

The buffered_istream commit came some time after the tokenizer commit.

The changes to the tokenizer were made by a university student. His GitHub page shows virtually NO activity since it was created in 2008, when he made this commit (before Wesnoth moved, so his work does not show). Googling his university account only finds his work on Wesnoth. I did find him on Stack Overflow, where he has recently become active and has answered a few questions. I believe I have also located his LinkedIn page, and it seems he's not professionally engaged in programming.

The changes to the tokenizer seem more likely to fall into the class where this sort of forced optimization actually works against the compiler's superior abilities. Profiling would prove it. But in the absence of profiling, it really does not matter whether the code has the forced optimization or not. If he had asked beforehand, the lack of profiling should have been sufficient to suggest Vultraz not make the change. But he has made it, and without profiling there is no reason to back it out.

The addition of buffered_istream, however, does include profiling information in the commit message, which indicates the addition as a whole was a net performance gain sufficient to be humanly perceptible during the startup phase.

That said, the profiling was for the entire file, not for the forced optimization. My read is that, here, the optimization is likely to produce a measurable improvement, but I'm not convinced it would be perceptible. The same arguments hold against Vultraz having removed it, and against putting it back in without specific profiling.

My thinking on buffered_istream, however, is that it could be further optimized by being eliminated. As written, it takes an already-buffered file, enclosed in an istream, and encloses that yet again in the new class, adding another buffer. It seems it would be more appropriate, and likely faster than any forced optimization, to eliminate this new class and the istream, wrapping a std::filebuf in a class which supports the get/peek/unget (etc.) functions needed by the tokenizer.

Furthermore, on code reading, the buffered_istream needs to take special steps to work around the fact that istream::readsome() is implementation-defined. We really should avoid implementation-defined features.

My recommendation, then, is to leave the changes as they are until and unless profiling can show a meaningful improvement from using __builtin_expect.

Finally, I would recommend someone taking a hard look at re-implementing buffered_istream to eliminate the intermediate istream and the speed and memory hit from double buffering, and to eliminate the use of implementation-defined behavior.
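
[Editor's note: a minimal sketch of what such a std::filebuf wrapper might look like, assuming the streambuf interface covers the tokenizer's needs. The class name and interface are illustrative, not a proposed patch.]

#include <fstream>

// Reads through a std::filebuf directly: one buffer layer, with no
// istream and no second hand-rolled buffer on top.
class filebuf_reader
{
public:
	explicit filebuf_reader(const char* path)
	{
		buf_.open(path, std::ios::in | std::ios::binary);
	}

	bool is_open() const { return buf_.is_open(); }

	// sbumpc/sgetc/sungetc already provide get/peek/unget semantics,
	// served from the filebuf's own internal buffer.
	int get()    { return buf_.sbumpc(); }
	int peek()   { return buf_.sgetc(); }
	void unget() { buf_.sungetc(); }

private:
	std::filebuf buf_;
};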

@jyrkive
Member

Personally, I agree with Vultraz's decision to remove __builtin_expect(). The performance improvement per check is known to be negligible, attempting to outsmart the compiler is rarely a good idea, and there are better ways to improve performance such as LTO.

@GregoryLundberg
Contributor

Yes. He and I talked about it before he did it. I wanted it profiled and suggested you because it seemed something you'd want to look at.

But, as I said, done's done, and it should not be reverted unless it can be shown to be worthwhile. (My guess is it won't be, or if it is, not by much.)

@jyrkive
Member

I still don't have WLAN working on GNU/Linux and I'm therefore not really able to profile the impact of __builtin_expect().

@soliton-
Member

I was not questioning the article. I happen to fully agree with it. I don't agree with the conclusions though. As mentioned, the tokenizer is likely a good place for these optimizations, so if someone took the time to implement and verify these I wouldn't just remove them without evidence.

This is a codepath that likely hasn't changed much, if at all, since the optimization was done, and the input it's based on hasn't either.

I'm questioning armchair-profiling-based commits in performance-critical code where every percent helps. Especially if you've got many add-ons installed.

Also, yes, this worked on clang as well.

Finally, I agree profiling needs to be done no matter what we decide. And sure, there may well be other, even better optimizations we can do, though that is fairly independent of removing existing ones.
