Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

True colour to xterm256 conversion: significant speed up #432

Closed
wants to merge 1 commit into from

Conversation

avih
Copy link

@avih avih commented May 30, 2016

Speed:
In a hot loop, where the old code has its ~2K pre-calculated data
already cached, this new code is x40-x50 faster (gcc/clang) with
arbitrary RGB values, and ~x3 faster with RGB values which have an
exact xterm256 colour representation (e.g. #5f87ff).

When used sparsely, or otherwise when the old code doesn't have the
data cached, the new code is likely to be at least another order of
magnitude faster, as it does not use pre-calculated data (technically,
it uses less than 20 constants).

Further 50% speed-up is possible with 256 bytes of pre-calculated date,
or 100% (x2 speed up) with 1K of pre-calculated data.

Compatibility:
This code returns identical values as the old code for the same inputs.

However, the old code is implicitly rounding down from the middle. For
instance, for RGB of (13, 13, 13) which is exactly between the
potential return values 232 (8, 8, 8) and 233 (18, 18, 18), it returns
232. It does the same with colour values at 16..231 .

While there's no right or wrong in this case between rounding up/down,
the new code also supports the more conventional paradigm of rounding
up from the middle. To enable it, set LC (legacy compatibility) to 0.

Speed:
In a hot loop, where the old code has its ~2K pre-calculated data
already cached, this new code is x40-x50 faster (gcc/clang) with
arbitrary RGB values, and ~x3 faster with RGB values which have an
exact xterm256 colour representation (e.g. #5f87ff).

When used sparsely, or otherwise when the old code doesn't have the
data cached, the new code is likely to be at least another order of
magnitude faster, as it does not use pre-calculated data (technically,
it uses less than 20 constants).

Further 50% speed-up is possible with 256 bytes of pre-calculated date,
or 100% (x2 speed up) with 1K of pre-calculated data.

Compatibility:
This code returns identical values as the old code for the same inputs.

However, the old code is implicitly rounding down from the middle. For
instance, for RGB of (13, 13, 13) which is exactly between the
potential return values 232 (8, 8, 8) and 233 (18, 18, 18), it returns
232. It does the same with colour values at 16..231 .

While there's no right or wrong in this case between rounding up/down,
the new code also supports the more conventional paradigm of rounding
up from the middle. To enable it, set `LC` (legacy compatibility) to 0.
@nicm
Copy link
Member

nicm commented May 30, 2016

Thanks. I will look at this properly later but I can tell you now that you will need to put this under ISC licence and add your copyright line at the top of the file with the others, this extra copyright statement in the middle of the file is not OK.

@avih
Copy link
Author

avih commented May 30, 2016

Thanks. I'll update the license related stuff after you're happy with the patch itself.

I'm fine with any OSS license but prefer to keep the rights as the original author. If that's somehow implied anyway, then I can take out the license part completely. If it is implied, I'd appreciate if you could point me to the thing which implies/states it.

@nicm
Copy link
Member

nicm commented May 30, 2016

For copyright, just add your copyright underneath mine at the top of the
file like this:

That means you are asserting that the changes are under the same
license. But, yes, you can do that later.

I only have one question about how it works, and I may just be revealing
my ignorance of colours:

Why are your grey ranges weighted so much towards white, but the 6x6x6
cube is the other way with more of the range for darker colours?

As far as the code itself goes, this is a short piece of code but it is
doing a lot and it is not easy to follow:

  • You need some comments explaining what it is doing, either a big
    comment at the beginning or some comments in the code. They don't need
    to be huge but there are a lot of calculations and magic numbers in
    here, it needs some more comments.
  • There is no particular need to mention potential speed ups or stuff
    you are not implementing.
  • There is no need for the LC define. Pick the best approach and do it,
    nobody will ever want to change this. I don't care if it is exactly
    the same as it is now.
  • v2gi(v) (v > 238 ...) <-- this could be 232 not 238?
  • v2gi is only used once, so I don't think it needs to be a macro.
  • Likewise I would make dist_sq() a function rather than a macro, or just
    do it twice in place.
  • In fact, I would try to get away from macros and ?: altogether, or at
    least try to avoid using them so much.

In general, I think you are erring too much on the side of
terseness. The code would be better longer but easier to read.

Note that this function is not particularly performance critical for
most users. It would be nice if it was fast, of course, but better if it
is also easy to understand.

Cheers

On Mon, May 30, 2016 at 12:30:22PM -0700, avih wrote:

Thanks. I'll update the license related stuff after you're happy with the
patch itself.

I'm fine with any OSS license but prefer to keep the copyright as the
original author. If that's somehow implied anyway, then I can take out the
license part completely. If it is implied, I'd appreciate if you could
point me to the thing which implies it.

--
You are receiving this because you commented.
Reply to this email directly, [1]view it on GitHub, or [2]mute the thread.

Reverse link: [3]unknown

References

Visible links

  1. True colour to xterm256 conversion: significant speed up #432 (comment)
  2. https://github.com/notifications/unsubscribe/AASkc16kuDNby2ALjkC2f4le176KLiZcks5qGzrOgaJpZM4Ip-kd
  3. True colour to xterm256 conversion: significant speed up #432 (comment)

@avih
Copy link
Author

avih commented May 31, 2016

For copyright, just add your copyright underneath mine

Ah, nice. Will do.

Why are your grey ranges weighted so much towards white, but the 6x6x6 cube is the other way with more of the range for darker colours?

Not sure I understand the statement or the question. The code doesn't apply some external knowledge or preference, other than what's 100% dictated by the definition of the values of xterm256. As a semi proof - it returns exactly the same values as the old code does.

The colors which are represented are all the R/G/B combination where each channel is one of 6 possible values: 0x0, 0x5f (95), 0x87 (135), 0xaf (175), 0xd7 (215) and 0xff (255).

You'll notice that except for the first "jump" which is 95 (decimal), all the other "jumps" are 40 (decimal). If your original RGB is (48,48,48), your only choices are either 0,0,0 or 95, 95, 95. It just has much lower resolution for darker colors.

But the grays are more evenly spread out. They're (decimals) 8, 18, 28, 38, 48, ... 238 - increments of 10 for overall 24 values. But if we applied a naive transformation, then values of 243 and above would end up trying to "round" to an index of 24 or 25 - which doesn't exist (index 23 is 0xff as xterm256 color).

If your question was about the spec itself, then I can only guess. I don't think there's a difference in perception between grayscale colors and actual colors in this regard, but I'm guessing that after using 6x6x6 for the colors (because there's enough room for it in a byte, but not enough for 7x7x7), and being left with 24 values "to fill up" - which is relatively roomy, they decided that even spread is good enough.

With the color values, however, because each channel has only 6 possible values, I think they tried to make them more useful. I'm guessing r/g/b values below 100 are much less frequently used than values above 100 (because sRGB is not perceptually uniform), and so it ended up non linear at the bottom.

Please rephrase the question if this doesn't answer your concern.

  • v2gi(v) (v > 238 ...) <-- this could be 232 not 238?

It can be any value between 232 - 242 (inclusive). The formula will calculate the correct value (index 23) for 233, 234, ... 242, but will calculate an incorrect value for 243 and above. Hence the v > 238 part just needs to make sure the formula isn't applied to values of 243 or higher. For values of 232-242 it will return the same value as the formula. I chose 238 since it's the actual r/g/b value of index 23 so hopefully easier to relate it to the grayscale values.

  • v2gi is only used once, so I don't think it needs to be a macro.

This is the current code:

#       define v2gi(v) (v > 238 ? 23 : (v - 3) / 10)
        gray_index = v2gi((r + g + b - LC) / 3);

Since v appears twice at the formula, expanding the macro inline would result in a cumbersome expression, which is less readable IMO:

gray_index = (r + g + b - LC) / 3 > 238 ? 23 : ((r + g + b - LC) / 3 - 3) / 10;

And also, the patch separates the formula of "value to gray index" from the value on which it's applied, thus hopefully making it easier to grasp/follow.

  • Likewise I would make dist_sq() a function rather than a macro, or just do it twice in place.

By using a name, I was trying to give semantics (distance) to the formula, but sure, in place is fine too.

  • In fact, I would try to get away from macros and ?: altogether, or at least try to avoid using them so much.

My goal was to put each formula near the place it's used by the code, such that when you see it applied, you don't have to scroll up to find out how is the formula working. I.e. to hopefully make the context clearer. But function for each formula is fine too if you prefer so.

Note that this function is not particularly performance critical for most users. It would be nice if it was fast, of course, but better if it is also easy to understand.

In general, I don't think the approach which the PR takes can be presented in a simpler way than the original (your) code, since it basically takes the 2K of data which the original code dumps at the source file, and models it into formulas.

This representation is a non trivial one because the model itself is not perfectly symmetric - it's non linear for the color cube (though luckily most of it is linear), and has an exception with the grayscale values, but the code follows the only possible model (assuming, as many others, that measuring distances in sRGB is good enough).

I'm absolutely fine if you prefer to keep the original code over this PR, with no offense taken. If you think its readability outweights the value of the PR, sure, keep it.

Just let me know what you decide, and regardless, thanks for maintaining tmux :)

@avih
Copy link
Author

avih commented May 31, 2016

Is this the kind of thing you prefer?

/*
v2q(v): find the index q which represents a value nearest to v

Index (q)          0                    1         2         3         4         5
Represents(hex)   0x00                 0x5f      0x87      0xaf      0xd7      0xff
Represents(dec)    0                    95       135       175       215       255
Thresholds of v    |          48        |   115   |   155   |   195   |   235   |
Return value (q):  |     0    |       1      |    2    |    3    |    4    |  5 |

Number of values
nearest to q:      |<---48--->|<-----67----->|<--40--->|<--40--->|<--40--->|<21>|

So the return value (0/1/2/3/4/5) as a function of v (0..255) can be written,
using integer arithmetics, as:

#define v2q(v) (v < 48 ? 0 : v < 115 ? 1 : (v - 35) / 40)

Where we could replace 115 with any value between 75..115 (inclusive), but we
prefer the highest since it would allow us to get the correct retuen value
earlier without having to execute the actual calculation.
*/

Where at the actual code I used 114 instead of 115 to avoid taking LC into account at this part.

@nicm
Copy link
Member

nicm commented Jun 6, 2016

Sorry for the delay, for some reason I didn't get emails with your replies.

Yes, this answers my questions and you explain it very well. So let's say some of this in the code, and make it a bit more obvious.

How about this? colour.c.diff.txt

I tried to keep the macros, but once you add the extra ()s around each parameter they become quite unwieldy. I also applied tmux's code style a bit more consistently.

Note that I haven't tested this yet, so it may well be wrong. I'll do that later today if I have time.

@avih
Copy link
Author

avih commented Jun 6, 2016

Thanks. Looks good. I know you said you don't mind much (neither do I), but keep in mind that this will produce different colors than the code it replaces for r/g/b values which are exactly at the middle between the values represented by two near 'q's.

Nits:

colour_dist_sq(int R, int G, int B, int r, int g, int b)

This is not specific to color or RGB. It's just the formula for (squared) distance in any Euclidean (i.e. normal) space, which we consider sRGB part of (i.e. I'd not use color at the name, and XYZ would be more fitting than RGB as values, though it's not wrong either).

6x6x6 colour cube (16 - 237) and 24 greys (238 - 255)

16-231 and 232-255

Greys are more evenly spread (8, 18, 28 ... up to 238 - 238 and above are black)

"Above 238" is neither black nor white. The brightest gray value is 238,238,238 - index 23 within the grayscale section or 0xff as xterm256 value (brightest - at the gray part of the xterm256 colors, since at the 6x6x6 part we also have the brighter (255,255,255)).

Overall, looks good to me (haven't tested it either).

@nicm
Copy link
Member

nicm commented Jun 6, 2016

This is not specific to color or RGB.

Right, but in the colour.c file we use the colour_ prefix on functions, and it is only used for RGB here so it is more easily understood if the arguments are name r,g,b,R,G,B.

I have fixed the other points, thanks

@avih
Copy link
Author

avih commented Jun 6, 2016

in the colour.c file we use the colour_ prefix on functions

Gotcha.

@nicm
Copy link
Member

nicm commented Jun 6, 2016

OK my code produces the same output as yours with LC=0 so I will apply this. Thanks!

@nicm nicm closed this Jun 6, 2016
@avih
Copy link
Author

avih commented Jun 6, 2016

You're welcome.

@lock
Copy link

lock bot commented Feb 14, 2020

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked and limited conversation to collaborators Feb 14, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants