Fix readInt and hlint warning fixes #34

Merged
merged 9 commits into from Jan 24, 2012

Conversation

Projects
None yet
4 participants
@erikd
Contributor

erikd commented Jan 23, 2012

Hi all,

The existing readInt generated garbage on bad input. My version will produce something a little saner. The same patch also replaces an ad-hoc readInt implementation with a call to readInt.

The other patch is just hlint warning fixes. Feel free to throw that one away if you don't believe in hlint :-).

Cheers,
Erik

@snoyberg

This comment has been minimized.

Show comment Hide comment
@snoyberg

snoyberg Jan 23, 2012

Owner

Did you benchmark before and after the commit? The functions you were touching are in the hotpath for Warp, and I don't believe readDec is nearly as fast as what we had. I wasn't too worried about producing garbage, since "garbage in, garbage out."

Owner

snoyberg commented Jan 23, 2012

Did you benchmark before and after the commit? The functions you were touching are in the hotpath for Warp, and I don't believe readDec is nearly as fast as what we had. I wasn't too worried about producing garbage, since "garbage in, garbage out."

@erikd

This comment has been minimized.

Show comment Hide comment
@erikd

erikd Jan 23, 2012

Contributor

How does one benchmark Warp? I haven't even managed to get "cabal test" to work, even without my commits.

Contributor

erikd commented Jan 23, 2012

How does one benchmark Warp? I haven't even managed to get "cabal test" to work, even without my commits.

@snoyberg

This comment has been minimized.

Show comment Hide comment
@snoyberg

snoyberg Jan 23, 2012

Owner

How about just writing a Criterion benchmark comparing the current
integer parsing code versus readDec?

Owner

snoyberg commented Jan 23, 2012

How about just writing a Criterion benchmark comparing the current
integer parsing code versus readDec?

@erikd

This comment has been minimized.

Show comment Hide comment
@erikd

erikd Jan 23, 2012

Contributor

I've tested this with this code:

https://gist.github.com/1662654

I compile and run this as:

ghc -O3 --make readInt && ./readInt

and I get the following results:

benchmarking readIntOrig
mean: 443.9359 ns, lb 443.3310 ns, ub 444.7307 ns, ci 0.950
std dev: 3.540660 ns, lb 2.835138 ns, ub 5.021723 ns, ci 0.950

benchmarking readInt
mean: 362.8828 ns, lb 362.3685 ns, ub 363.5877 ns, ci 0.950
std dev: 3.049993 ns, lb 2.408989 ns, ub 4.574745 ns, ci 0.950

benchmarking readDec
mean: 9.099625 us, lb 9.085278 us, ub 9.116265 us, ci 0.950
std dev: 78.83395 ns, lb 66.47067 ns, ub 97.98427 ns, ci 0.950

Yes, readDec was slow, but my readInt is faster than the readIntOrig which is what Warp is currently using.

I'm also a little concerned is that the original readInt is wrong on 32 bit systems for strings containing values >= 2^31.

Shall I submit an updated pull request that uses my readInt?

Cheers,
Erik

Contributor

erikd commented Jan 23, 2012

I've tested this with this code:

https://gist.github.com/1662654

I compile and run this as:

ghc -O3 --make readInt && ./readInt

and I get the following results:

benchmarking readIntOrig
mean: 443.9359 ns, lb 443.3310 ns, ub 444.7307 ns, ci 0.950
std dev: 3.540660 ns, lb 2.835138 ns, ub 5.021723 ns, ci 0.950

benchmarking readInt
mean: 362.8828 ns, lb 362.3685 ns, ub 363.5877 ns, ci 0.950
std dev: 3.049993 ns, lb 2.408989 ns, ub 4.574745 ns, ci 0.950

benchmarking readDec
mean: 9.099625 us, lb 9.085278 us, ub 9.116265 us, ci 0.950
std dev: 78.83395 ns, lb 66.47067 ns, ub 97.98427 ns, ci 0.950

Yes, readDec was slow, but my readInt is faster than the readIntOrig which is what Warp is currently using.

I'm also a little concerned is that the original readInt is wrong on 32 bit systems for strings containing values >= 2^31.

Shall I submit an updated pull request that uses my readInt?

Cheers,
Erik

@snoyberg

This comment has been minimized.

Show comment Hide comment
@snoyberg

snoyberg Jan 23, 2012

Owner

That looks perfect, I'd definitely pull in that code, thanks!

Owner

snoyberg commented Jan 23, 2012

That looks perfect, I'd definitely pull in that code, thanks!

erikd added some commits Jan 24, 2012

warp : Replace readInt function with something faster and more correct.
The old version of the function would produce garbage values for
strings containing non-digits. The new version will covert all
digits available into a number and drop trailing rubbish.

The new version is provably faster.
Benchmarking program : test/readInt.hs.
@vincenthz

This comment has been minimized.

Show comment Hide comment
@vincenthz

vincenthz Jan 24, 2012

Contributor

@erikd: it would probably go even faster to use a lookup table just like the one i put in base16-bytestring (or i've talked about in http://tab.snarc.org/posts/haskell/2011-11-15-lookup-tables.html) instead of digitToInt.

Contributor

vincenthz commented Jan 24, 2012

@erikd: it would probably go even faster to use a lookup table just like the one i put in base16-bytestring (or i've talked about in http://tab.snarc.org/posts/haskell/2011-11-15-lookup-tables.html) instead of digitToInt.

@erikd

This comment has been minimized.

Show comment Hide comment
@erikd

erikd Jan 24, 2012

Contributor

Michael,

I've updated this commit to include the test program I used to benchmark and QuickCheck the replacement for readInt as well as the new faster version of that function.

I actually posted the incorrect values for readIntOrig last time. For my last tests, readIntOrig returned Int, while the one in Warp actually returns Integer. My new generic readInt drops the first non-digit and everything after and according to Criterion is an order of magnitude faster than readIntOrig regardless of whether it returns Int, Int64 or Integer. I think this is something to do with readIntOrig having fromIntegral inside the foldl'.

The file test/readInt.hs is so that anyone else who wants to look at this function has a starting point.

Cheers,
Erik

Contributor

erikd commented Jan 24, 2012

Michael,

I've updated this commit to include the test program I used to benchmark and QuickCheck the replacement for readInt as well as the new faster version of that function.

I actually posted the incorrect values for readIntOrig last time. For my last tests, readIntOrig returned Int, while the one in Warp actually returns Integer. My new generic readInt drops the first non-digit and everything after and according to Criterion is an order of magnitude faster than readIntOrig regardless of whether it returns Int, Int64 or Integer. I think this is something to do with readIntOrig having fromIntegral inside the foldl'.

The file test/readInt.hs is so that anyone else who wants to look at this function has a starting point.

Cheers,
Erik

@erikd

This comment has been minimized.

Show comment Hide comment
@erikd

erikd Jan 24, 2012

Contributor

@vincenthz Thanks for the pointer.

Wow, I'm now playing with something based on MagicHash that seems to be faster again.

@snoyberg Let me update this patch again.

Contributor

erikd commented Jan 24, 2012

@vincenthz Thanks for the pointer.

Wow, I'm now playing with something based on MagicHash that seems to be faster again.

@snoyberg Let me update this patch again.

@erikd

This comment has been minimized.

Show comment Hide comment
@erikd

erikd Jan 24, 2012

Contributor

@snoyberg I'm getting really consistent results like this:

Original version in Warp two days ago.

benchmarking readIntOrig
mean: 4.170688 us, lb 4.163166 us, ub 4.183716 us, ci 0.950
std dev: 49.46493 ns, lb 33.48951 ns, ub 89.92102 ns, ci 0.950

New MagicHash version, first returning just an Int64, second returning Integer:

benchmarking readInt64MH
mean: 225.4487 ns, lb 225.0661 ns, ub 225.9103 ns, ci 0.950
std dev: 2.147711 ns, lb 1.786974 ns, ub 2.789700 ns, ci 0.950

benchmarking readIntegerMH
mean: 259.0235 ns, lb 258.5696 ns, ub 259.5617 ns, ci 0.950
std dev: 2.536399 ns, lb 2.128291 ns, ub 3.138379 ns, ci 0.950

I'd be really curious to see if this has improved Warp performance :-)

Erik

Contributor

erikd commented Jan 24, 2012

@snoyberg I'm getting really consistent results like this:

Original version in Warp two days ago.

benchmarking readIntOrig
mean: 4.170688 us, lb 4.163166 us, ub 4.183716 us, ci 0.950
std dev: 49.46493 ns, lb 33.48951 ns, ub 89.92102 ns, ci 0.950

New MagicHash version, first returning just an Int64, second returning Integer:

benchmarking readInt64MH
mean: 225.4487 ns, lb 225.0661 ns, ub 225.9103 ns, ci 0.950
std dev: 2.147711 ns, lb 1.786974 ns, ub 2.789700 ns, ci 0.950

benchmarking readIntegerMH
mean: 259.0235 ns, lb 258.5696 ns, ub 259.5617 ns, ci 0.950
std dev: 2.536399 ns, lb 2.128291 ns, ub 3.138379 ns, ci 0.950

I'd be really curious to see if this has improved Warp performance :-)

Erik

@snoyberg

This comment has been minimized.

Show comment Hide comment
@snoyberg

snoyberg Jan 24, 2012

Owner

Thanks for putting this together! I was just trying to run a Warp benchmark, when suddenly nothing worked anymore. After some investigation, it seems that a content-length of "5" is being interpreted as 48. Any idea why?

Owner

snoyberg commented Jan 24, 2012

Thanks for putting this together! I was just trying to run a Warp benchmark, when suddenly nothing worked anymore. After some investigation, it seems that a content-length of "5" is being interpreted as 48. Any idea why?

@erikd

This comment has been minimized.

Show comment Hide comment
@erikd

erikd Jan 24, 2012

Contributor

I'll check it out right now.

Contributor

erikd commented Jan 24, 2012

I'll check it out right now.

@dagit

This comment has been minimized.

Show comment Hide comment
@dagit

dagit Jan 24, 2012

Since you're already parsing from a ByteString, how does your implementation stack up against ByteString's implementation of readInt/readInteger?

http://hackage.haskell.org/packages/archive/bytestring/0.9.1.4/doc/html/Data-ByteString-Char8.html#v%3AreadInt

dagit commented Jan 24, 2012

Since you're already parsing from a ByteString, how does your implementation stack up against ByteString's implementation of readInt/readInteger?

http://hackage.haskell.org/packages/archive/bytestring/0.9.1.4/doc/html/Data-ByteString-Char8.html#v%3AreadInt

@erikd

This comment has been minimized.

Show comment Hide comment
@erikd

erikd Jan 24, 2012

Contributor

@dagit The ByteString version has a signature of:

readInteger :: ByteString -> Maybe (Integer, ByteString)

which means it needs to be used like:

bsreadInteger bs = fst $ fromMaybe (0, "") $ B.readInteger bs

which is 3 times slower than the MagicHash version (when that works correctly).

Contributor

erikd commented Jan 24, 2012

@dagit The ByteString version has a signature of:

readInteger :: ByteString -> Maybe (Integer, ByteString)

which means it needs to be used like:

bsreadInteger bs = fst $ fromMaybe (0, "") $ B.readInteger bs

which is 3 times slower than the MagicHash version (when that works correctly).

@dagit

This comment has been minimized.

Show comment Hide comment
@dagit

dagit Jan 24, 2012

@erikd Cool, so that means you should look at sending patches to bytestring when/if you get the magichash working.

dagit commented Jan 24, 2012

@erikd Cool, so that means you should look at sending patches to bytestring when/if you get the magichash working.

@erikd

This comment has been minimized.

Show comment Hide comment
@erikd

erikd Jan 24, 2012

Contributor

@snoyberg Found the problem.

There is some weird interaction between the CCP LANGUAGE pragma and the MagicHash stuff. Simply add the CPP pragma to tests/readInt.hs cause QC to fail on the first test.

I'll try breaking the readInt stuff out into its own file.

Contributor

erikd commented Jan 24, 2012

@snoyberg Found the problem.

There is some weird interaction between the CCP LANGUAGE pragma and the MagicHash stuff. Simply add the CPP pragma to tests/readInt.hs cause QC to fail on the first test.

I'll try breaking the readInt stuff out into its own file.

erikd added some commits Jan 24, 2012

warp : Move the MagicHash version of the readInt64 function to its ow…
…n file.

This was needed to prevent a nasty interaction (incorrect code being
generated) when the CPP and MagicHash pragmas were used in the same file.
@erikd

This comment has been minimized.

Show comment Hide comment
@erikd

erikd Jan 24, 2012

Contributor

@snoyberg This time for sure.

I broke readInt64 (which uses the MagicHash stuff) out into its own file, ReadInt.hs. That fixed the problem.

I also add a test for the ReadInt.readInt64 version of the function to test/readTest.hs. For some reason call it across boundaries makes it even faster!

Running the test/readInt.hs program, this is the MagicHash version in the same file as the tests:

benchmarking readIntegerMH
mean: 258.1115 ns, lb 257.8392 ns, ub 258.3972 ns, ci 0.950
std dev: 1.435718 ns, lb 1.228153 ns, ub 1.758441 ns, ci 0.950

and this is the one in ReadInt.hs:

benchmarking readIntWarp
mean: 214.7729 ns, lb 214.5187 ns, ub 215.0355 ns, ci 0.950
std dev: 1.321963 ns, lb 1.114365 ns, ub 1.632150 ns, ci 0.950

Crazy fast!

Contributor

erikd commented Jan 24, 2012

@snoyberg This time for sure.

I broke readInt64 (which uses the MagicHash stuff) out into its own file, ReadInt.hs. That fixed the problem.

I also add a test for the ReadInt.readInt64 version of the function to test/readTest.hs. For some reason call it across boundaries makes it even faster!

Running the test/readInt.hs program, this is the MagicHash version in the same file as the tests:

benchmarking readIntegerMH
mean: 258.1115 ns, lb 257.8392 ns, ub 258.3972 ns, ci 0.950
std dev: 1.435718 ns, lb 1.228153 ns, ub 1.758441 ns, ci 0.950

and this is the one in ReadInt.hs:

benchmarking readIntWarp
mean: 214.7729 ns, lb 214.5187 ns, ub 215.0355 ns, ci 0.950
std dev: 1.321963 ns, lb 1.114365 ns, ub 1.632150 ns, ci 0.950

Crazy fast!

@vincenthz

This comment has been minimized.

Show comment Hide comment
@vincenthz

vincenthz Jan 24, 2012

Contributor

warp speed ! :-)

Contributor

vincenthz commented Jan 24, 2012

warp speed ! :-)

@snoyberg snoyberg merged commit f4f26d2 into yesodweb:master Jan 24, 2012

@snoyberg

This comment has been minimized.

Show comment Hide comment
@snoyberg

snoyberg Jan 24, 2012

Owner

Thank you Erik!

Owner

snoyberg commented Jan 24, 2012

Thank you Erik!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment