Skip to content
This repository

session slowness #415

Closed
gregwebs opened this Issue September 01, 2012 · 18 comments

5 participants

Greg Weber Michael Snoyman Lorenzo Bolla Felipe Lessa Kazu Yamamoto
Greg Weber
Owner

Lorenzo Bolla ✆ lbolla@gmail.com
1:37 AM (5 hours ago)

Warp
10250 req/s

Yesod
hamlet: 848 req/s
no-hamlet: 940 req/s (getHomeR = return ∘ RepPlain ∘ toContent $ "Hello World!")
no-session: 8000 req/s (makeSessionBackend = const $ return Nothing)

Lorenzo Bolla

Code to reproduce the issue is here:
https://gist.github.com/3567006

See different revisions for the different configurations.

Felipe Lessa
Owner

Links to relevant code on master:

Felipe Lessa
Owner

It seems you forgot to compile with -threaded.

On my machine, here's what I get compiling with -threaded, running with +RTS -N4 (I have six cores) and --num-calls=1000:

vanilla no-session:   41,485 req/s (24 us/req)
vanilla with-session:  3,251 req/s (307 us/req)
noencrypt:             3,678 req/s (264 us/req)

vanilla is yesod-core from Hackage while noencrypt is yesod-core changing CS.encrypt to Base64.encode and CS.decrypt to Base64.decode.

If I didn't mess anything up, then this means that time spent on vanilla with-session is composed of

Time to answer the request:       24 us
Time to encode/decode session:   240 us +
Time to encrypt/decrypt session:  43 us
                                 --------
                                 307 us

So the performance hog may not be clientsession (which was already benchmarked and is plenty fast) but the serialization (perhaps serialization of UTCTime?). More tests are needed.

Felipe Lessa
Owner

Here's the profile for vanilla with-session: http://hpaste.org/74032. I've added SCCs to yesod-core as below:

encodeClientSession key iv expire rhost session' =
    ({-# SCC "encodeClientSession/encrypt" #-} CS.encrypt key iv $
    ({-# SCC "encodeClientSession/encode" #-} encode $ SessionCookie expire rhost session'))

First of all, decodeClientSession is never called AFAICS, so httperf probably isn't passing the cookies along.

Now, taking a look at the inherited columns, encodeClientSession/encrypt is taking 18.3% of all program time. By itself (individual columns), it's taking 4.7%. Its interesting children are Data.ByteString.Base64.encode, which is taking 5.4% (inherited), and encodeClientSession/encode, which is taking 7.6% (inherited) and is composed of just Data.Serialize.encode. Those three sum to 17.7%.

In other words, encodeClientSession seems to be the problem (we already knew that) but clientsession itself is just the smallest of the problems: we need to tackle Base64.encode and Data.Serialize.encode first. Besides, it's going to be very hard to extract more performance from clientsession (most of its time is spent on either AES or Skein).

Please, someone correct me if I'm wrong, I'm not a great profiler =).

Felipe Lessa
Owner

Here's another profile: http://hpaste.org/74041. This time, besides the SCCs from before, I've also added:

instance Serialize SessionCookie where
    put (SessionCookie a b c) =
      ({-# SCC "SessionCookie/put/1" #-} putTime a) >>
      ({-# SCC "SessionCookie/put/2" #-} put b) >>
      ({-# SCC "SessionCookie/put/3" #-} put (map (first unpack) c))

The inherited times I get for those three are:

SessionCookie/put/1: 2.8%
SessionCookie/put/2: 0.3%
SessionCookie/put/3: 1.3%

Note that putTime, which just serializes a simple UTCTime, is taking 2.8% of all program time, more than half of the time clientsession takes to encrypt and hash the data =). Looks like a nice optimization target.

The third put takes some time but it's going to be a lot harder to improve its performance, since it's basically a list of pairs ByteStrings.

Felipe Lessa
Owner

I'm going to take a break for now. If anyone is feeling adventurous, try optimizing the putTime/getTime pair.

Lorenzo Bolla

Nice job. putTime and getTime seem to convert UTC to JulianDates back and forth. They both rely on Data.Time.Clock.POSIX, so maybe we should store in the session cookie the seconds from Epoch rather than Julian dates or UTCtimes, and avoid expensive calls to diffUTCTime?

Felipe Lessa
Owner

@lbolla I'm going to try doing that right now. Setting up some criterion benchmarks...

Greg Weber
Owner

@kazu-yamamoto has worked on date optimization for the logger, but that was mostly around formatting

Felipe Lessa
Owner

Here's the criterion benchmark and the results from running it on my machine: http://hpaste.org/74052.

Using POSIXTime, we can get a 47% faster putTime and a 60% faster getTime (posix_int64).

Using hand written functions, we can get a 74% faster putTime and a 98% (!!) faster getTime (byhand_b). More specifically, putTime is going from 5.22 us to 1.35 us. I don't have any more ideas of optimizing putTime.

Also, on my test the original putTime_orig used 25 bytes, while putTime_byhand_b used only 8 bytes. That means that we are going to be able to save a colossal amount of 23 bytes on the base64 output! (putTime_posix_integer was even better, just 5 bytes, but putTime_byhand_b is faster and those are just 3 bytes).

Felipe Lessa
Owner

Using toModifiedJulianDay/ModifiedJulianDay instead of addDays/diffDays does shave some nanoseconds off those timings (http://hpaste.org/74055), but I guess that most of the time is spent truncating the time of the day (and that's why putTime is so slower than getTime. It's stored as a Data.Fixed.Fixed, and its truncate implementation goes through Rational.

I won't be able to commit these new implementations onto yesod-core right now, though.

Kazu Yamamoto
Collaborator

I don't understand the problem well.
But if you want fast formatter of time for HTTP Date (GMT only), use http-date.
If you want fast formatter of time for zoned date (e.g. JST), use unix-time.
If you run Yesod on Linux 2, gettimeofday() is a system call which is very slow.
In this case, you should consider to use date-cache.

Felipe Lessa
Owner

Implemented newer functions on #418. My benchmarks suggest only a 5% to 17% performance increase, though, see b2a9beb for details.

Felipe Lessa
Owner

We still need more profiling =).

Greg Weber
Owner

I saw some commits recently to improve session performance. What is the status of this ticket?

Michael Snoyman
Owner

I merged in @meteficha's code, though for the moment I had to comment out some of the bigger performance savings for backwards compatibility. Once we have a major version bump, I'll switch to the high-speed version.

I have no more work planned for this issue, I'm OK closing it if no one has anything else to add.

Felipe Lessa
Owner

The benchmarks should be redone now that clientsession uses cipher-aes directly, it should be better at least in theory =).

Michael Snoyman
Owner

@meteficha's code is now used exclusively on the yesod1.2 branch.

Michael Snoyman snoyberg closed this March 10, 2013
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.