UTCTime serialisation slow #51

teh · 2016-04-30T17:43:08Z

For testing I replaced my Binary instances with Serialise from this package and serialization performance was 10-20x slower. I narrowed it down to UTCTime. Replacing that with a dummy empty serialiser speeds up my code from 3 seconds to 150 milliseconds.

Instance in question:

https://github.com/well-typed/binary-serialise-cbor/blob/9528877a4d85642be787c05efee39d2b3e0e078e/Data/Binary/Serialise/CBOR/Class.hs#L402

The text was updated successfully, but these errors were encountered:

teh · 2016-04-30T19:04:21Z

The actual format is part of the standard: https://tools.ietf.org/html/rfc7049#section-2.4.1 so there's probably no wiggle room in changing that to the nicer UTCTime day/time format. I tried writing a faster encoder but time conversions are slow. Will probably have to newtype with custom serialise :/

arianvp · 2016-05-01T10:05:49Z

You could newtype it and use the same serialisation format as the binary package?

thoughtpolice · 2016-05-01T14:13:58Z

It'd be interesting to see where you got your Binary instances from - you took an Orphan I imagine (or newtyped it) something like this I assume?

instance Binary UTCTime where
  put (UTCTime a b) = put a >> put b
  get = UTCTime <$> get <*> get

instance Binary Day where
  put (ModifiedJulianDay d) = put d
  get = ModifiedJulianDay <$> get

instance Binary DiffTime where
  put = put . fromEnum
  get = toEnum <$> get

...

or whatever? I can definitely believe that the time formatting functions are simply slow no matter what, since they have to roundtrip through String and probably a trillion other things.

Second, if you can file a synthetic benchmark or something with our criterion setup, that'd be really nice! It'd be interesting to investigate this (it's a bit hard to say exactly what could be improved without looking at a profile).

If this is an insurmountable problem, it would probably be OK to provide another function like reallyFastUTCEncoding :: UTCTime -> Encoding that does something similar, and just mention it very prominently in the tutorial and documentation that any time a UTCTime is needed, you should carefully consider which one you need. This is a really horrible workaround, though, and gets very unusable if you ever want some function overloaded on Serialise (you can't just call encode, so you'd have to take an Encoding instead or something). FWIW: A literal translation of any binary instance should always be faster, basically without exception - but this is a case where one of our instances has to be different from upstream for the stated behavior.

That said, providing the canonical instance is, IMO correct (we don't want orphans for this). It sucks that it's so slow, though.

teh · 2016-05-01T21:54:09Z

@thoughtpolice - yea, that's pretty close: https://hackage.haskell.org/package/binary-orphans-0.1.4.0/docs/src/Data.Binary.Orphans.html

I'm trying to add a UTCTime benchmark but can't get the benchmark to compile. GHC (7.10.3) has been stuck for over an hour now compiling PkgAesonGeneric - have you seen this?

[16 of 28] Compiling Macro.PkgAesonGeneric ( bench/Macro/PkgAesonGeneric.hs, dist/build/vs-other-libs/vs-other-libs-tmp/Macro/PkgAesonGeneric.o )

23247 me       20   0 4878848 3.542g  22980 S 310.3 46.1  15:06.19 ghc

teh · 2016-05-01T22:10:45Z

Removed that benchmark for now while testing. See #52. I'm totally unclear on whether that benchmark actually does what I want (with force) ..

thoughtpolice · 2016-05-02T01:47:45Z

The benchmark compilation performance is a known problem - see #33.

I'll take a closer look at #52. Ugh, I need to get our 32 bit support online...

thoughtpolice · 2016-05-02T01:48:16Z

Oh yes, and Generic might be especially bad due to some GHC bugs. I'll test for a bit.

dcoutts · 2016-05-02T15:20:50Z

Sigh. It's unfortunate that it's so slow. So we have two choices:

use the standard CBOR representation for greater compatibility (e.g.cbor2json tools will know how to render them) and try and make the parser/printer faster, or
go for a different representation in terms of the day + time of day. Of course we only have to handle the one standard date/time format, unlike the time lib with its general format string interpretation.

adamgundry · 2016-05-03T08:47:17Z

I suspect this is a problem for our application as well, which serializes large datasets full of UTCTimes. Would it be worth working on a higher-performance RFC 3339 parser/printer (if such a thing doesn't exist already)? That might be a nice little project that could be useful in other contexts.

Alternatively, I notice that the CBOR standard permits POSIX-style timestamps as well (tag 1). Though those don't seem to be supported by the Serialise UTCTime instance...

adamgundry · 2016-05-03T09:01:16Z

Looks like aeson might have some useful prior art.

See issue #51. Closes #52. Signed-off-by: Austin Seipp <austin@well-typed.com>

thoughtpolice · 2016-05-11T23:43:25Z

Right, so Aeson uses ISO-8601 for encoding UTCTime, which is also exactly what CBOR uses. The writer bit should be pretty easy to steal, since it already uses a Builder to write out the encoded format. But the parser will be tricker as we'll have to eliminate the attoparsec dependency, I think. But the initial results of reusing that code seem promising at a glance.

There's no support for writing these (the default UTCTime instance uses high-precision String representation, which is slow but can likely be improved -- see #51), but reading them is a nice interop feature. Fixes #61. Closes #68. Signed-off-by: Austin Seipp <austin@well-typed.com>

bgamari · 2017-02-09T17:47:10Z

We are going to move to a new, more reasonable encoding for the UTCTime instance for the release. However, this can be done in a backwards compatible way as it is tagged.

One possible encoding is two integers: days since the 0 epoch and picoseconds.

thoughtpolice · 2017-04-06T17:36:00Z

It looks like this will go into 0.2 regardless at this rate, so I'm queuing this for the initial release.

bgamari · 2017-06-13T14:16:55Z

I have submitted an IANA request for the new tags but it will take some time for the process to take it course. I don't think we should hold the release for this.

thoughtpolice · 2017-06-15T00:32:38Z

Fixed by d093bb6.

thoughtpolice · 2017-06-15T00:36:32Z

Errr, actually fixed by cad4c72! (Ben accidentally force pushed over the last commit)

thoughtpolice added P-medium E-hard A-perf E-unknown and removed E-hard labels May 1, 2016

teh added a commit to teh/binary-serialise-cbor that referenced this issue May 1, 2016

Add a benchmark for UTCTime. See well-typed#51.

2b7f783

thoughtpolice pushed a commit that referenced this issue May 4, 2016

bench: add some instance benchmarks for UTCTime

54a07d8

See issue #51. Closes #52. Signed-off-by: Austin Seipp <austin@well-typed.com>

thoughtpolice mentioned this issue May 11, 2016

UTCTime tag 1 is not handled #61

Closed

dcoutts added the E-helpwanted label Jun 21, 2016

jberryman mentioned this issue Jun 29, 2016

Generic default instances appear to be not significantly faster than cereal #86

Closed

bgamari added the Release blocker label Feb 9, 2017

thoughtpolice removed the Release blocker label Apr 6, 2017

thoughtpolice added this to the 0.2.0.0 - Initial release milestone Apr 6, 2017

bgamari removed this from the 0.2.0.0 - Initial release milestone Jun 13, 2017

thoughtpolice closed this as completed Jun 15, 2017

thoughtpolice added this to the 0.2.0.0 - Initial release milestone Jun 15, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

UTCTime serialisation slow #51

UTCTime serialisation slow #51

teh commented Apr 30, 2016

teh commented Apr 30, 2016

arianvp commented May 1, 2016

thoughtpolice commented May 1, 2016 •

edited

teh commented May 1, 2016 •

edited

teh commented May 1, 2016

thoughtpolice commented May 2, 2016

thoughtpolice commented May 2, 2016

dcoutts commented May 2, 2016 •

edited

adamgundry commented May 3, 2016

adamgundry commented May 3, 2016

thoughtpolice commented May 11, 2016

bgamari commented Feb 9, 2017 •

edited

thoughtpolice commented Apr 6, 2017

bgamari commented Jun 13, 2017

thoughtpolice commented Jun 15, 2017

thoughtpolice commented Jun 15, 2017

UTCTime serialisation slow #51

UTCTime serialisation slow #51

Comments

teh commented Apr 30, 2016

teh commented Apr 30, 2016

arianvp commented May 1, 2016

thoughtpolice commented May 1, 2016 • edited

teh commented May 1, 2016 • edited

teh commented May 1, 2016

thoughtpolice commented May 2, 2016

thoughtpolice commented May 2, 2016

dcoutts commented May 2, 2016 • edited

adamgundry commented May 3, 2016

adamgundry commented May 3, 2016

thoughtpolice commented May 11, 2016

bgamari commented Feb 9, 2017 • edited

thoughtpolice commented Apr 6, 2017

bgamari commented Jun 13, 2017

thoughtpolice commented Jun 15, 2017

thoughtpolice commented Jun 15, 2017

thoughtpolice commented May 1, 2016 •

edited

teh commented May 1, 2016 •

edited

dcoutts commented May 2, 2016 •

edited

bgamari commented Feb 9, 2017 •

edited