Finished

tibbe · Nov 30, 2011 · d616ca8 · d616ca8
1 parent 29e7371
commit d616ca8
Showing 1 changed file with 137 additions and 69 deletions.
diff --git a/stanford-2011/performance.md b/stanford-2011/performance.md
@@ -4,7 +4,7 @@
 
 * Software engineer at Google
 
-* Started using Haskell 2005.
+* Started using Haskell in 2005.
 
 * Worked on (among other things) GHC's new scalable I/O
   implementation.
@@ -134,7 +134,6 @@ an `Uno` takes 2 words, and a `Due` takes 3.
 * Exception: a constructor with no fields (like `Nothing` or `True`)
   takes no space, as it's shared among all uses.
 
-
 # Memory layout
 
 Here's how GHC represents the list `[1,2]` in memory:
@@ -238,11 +237,11 @@ When the pragma applies, it offers the following benefits:
 * Reduced memory usage (4 words in the case of
   `IntPair`)
 
-* Fewer indirections
+* Removes indirection
 
-Caveat: There are cases where unpacking hurts performance e.g. if the
-fields are passed to a non-strict function, as they need to be
-reboxed.
+Caveat: There are (rare) cases where unpacking hurts performance
+e.g. if the fields are passed to a non-strict function, as they need
+to be reboxed.
 
 **Unpacking is one of the most important optimizations available to
  us.**
@@ -329,7 +328,8 @@ Yes!  We can make use of the following:
 data List k v = Nil | Cons k v (List k v)
 ~~~~
 
-is more memory efficient than `[(k, v)]`.
+is more memory efficient than `[(k, v)]`, as the pair constructor has
+been unpacked into the `Cons` constructor.
 
 
 # An improved HashMap data type
@@ -340,7 +340,7 @@ data HashMap k v
           !(HashMap k v)
           !(HashMap k v)
     | Tip {-# UNPACK #-} !Hash
-          {-# UNPACK #-} !(FullList k v)
+          {-# UNPACK #-} !(FullList k v)  -- now monomorphic
     | Nil
 
 type SuffixMask = Int
@@ -350,6 +350,11 @@ data FullList k v = FL !k v !(List k v)
 data List k v = Nil | Cons !k v !(List k v)
 ~~~~
 
+* The `FullList` type has only one constructor, so it can be unpacked.
+
+* In the common case, the tail of the `FullList` is empty and thus
+  points to a shared `Nil` constructor.
+
 # Improved HashMap data type memory layout
 
 <p>![](hashmap.png)</p>
@@ -359,56 +364,6 @@ data List k v = Nil | Cons !k v !(List k v)
 In general: $5N + 4(N-1)$ words + size of keys & values
 
 
-# Memory footprints of some common data types
-
-Write this down on an index card and keep around for
-back-of-the-envelope calculations.
-
-<table>
-  <tbody><tr>
-    <th>Data type</th>
-    <th>Memory footprint</th>
-  </tr>
-  <tr>
-    <td><code>Data.ByteString</code></td>
-    <td>9 words + N bytes</td>
-  </tr>
-  <tr>
-    <td><code>Data.Text</code></td>
-    <td>6 words + 2N bytes</td>
-  </tr>
-  <tr>
-    <td><code>String</code></td>
-    <td>5N words</td>
-  </tr>
-  <tr>
-    <td><code>Data.Map</code></td>
-    <td>6N words + size of keys &amp; values</td>
-  </tr>
-  <tr>
-    <td><code>Data.Set</code></td>
-    <td>5N words + size of elements</td>
-  </tr>
-  <tr>
-    <td><code>Data.IntMap</code></td>
-    <td>3N + 5(N-1) words + size of values</td>
-  </tr>
-  <tr>
-    <td><code>Data.IntSet</code></td>
-    <td>2N + 5(N-1) words</td>
-  </tr>
-  <tr>
-    <td><code>Data.HashMap</code></td>
-    <td>5N + 4(N-1) words + size of keys &amp; values</td>
-  </tr>
-  <tr>
-    <td><code>Data.HashSet</code></td>
-    <td>5N + 4(N-1) words + size of elements</td>
-  </tr>
-</tbody></table>
-
-(Some caveates apply.)
-
 # Reasoning about laziness
 
 A function application is only evaluated if its result is needed,
@@ -419,8 +374,9 @@ therefore:
 * Any expression whose value is required to decide which RHS to
   evaluate, must be evaluated.
 
-By using this "back-to-front" analysis we can figure which arguments a
-function is strict in.
+These two properties allows us to use "back-to-front" analysis (known
+as demand/strictness analysis) to figure which arguments a function is
+strict in.
 
 
 # Reasoning about laziness: example
@@ -445,11 +401,11 @@ max x y
 ~~~~ {.haskell}
 data Tree = Leaf | Node Int Tree Tree
 
-insert :: Int -> Tree -> BST
+insert :: Int -> Tree -> Tree
 insert x Leaf   = Node x Leaf Leaf
-insert x (Node x' l r)
-    | x < x'    = Node x' (insert x l) r
-    | x > x'    = Node x' l (insert x r)
+insert x (Node y l r)
+    | x < y     = Node y (insert x l) r
+    | x > y     = Node y l (insert x r)
     | otherwise = Node x l r
 ~~~~
 
@@ -482,23 +438,135 @@ insert (1 `div` 0) (Node 2 Leaf Leaf)
 does.
 
 
+# Strictness annotations in the real world
+
+~~~~ {.haskell}
+delete :: (Eq k, Hashable k) => k -> HashMap k v -> HashMap k v
+delete k0 = go h0 k0
+  where
+    h0 = hash k0
+    go h !k t@(Bin sm l r)
+      | nomatch h sm = t
+      | zero h sm    = bin sm (go h k l) r
+      | otherwise    = bin sm l (go h k r)
+    go h k t@(Tip h' l)
+      | h == h'      = case FL.delete k l of
+          Nothing -> Nil
+          Just l' -> Tip h' l'
+      | otherwise    = t
+    go _ _ Nil         = Nil
+{-# INLINABLE delete #-}
+~~~~
+
+* Without the bang pattern, `go` is not strict in the key `k` (why?).
+
+* Making `go` strict in the key argument allows GHC to unbox these
+  values in the loop (after `delete` has been inlined and the key type
+  is known).
+
+
 # Benchmark
 
-So, is `HashMap` faster than `Map`?
+So, is `HashMap` faster than `Map`?  Benchmark: $2^12$ random
+`ByteString` keys of length 8
+
+~~~~
+benchmarking Map/lookup/ByteString
+mean: 1.590200 ms, lb 1.584947 ms, ub 1.597055 ms, ci 0.950
+std dev: 30.69466 us, lb 24.94621 us, ub 39.41903 us, ci 0.950
+
+benchmarking Map/insert/ByteString
+mean: 2.957678 ms, lb 2.891839 ms, ub 3.086595 ms, ci 0.950
+std dev: 451.8105 us, lb 261.2515 us, ub 688.6132 us, ci 0.950
+
+benchmarking lookup/ByteString
+mean: 575.9371 us, lb 574.3850 us, ub 577.8249 us, ci 0.950
+std dev: 8.790398 us, lb 7.440493 us, ub 10.91274 us, ci 0.950
+
+benchmarking insert/ByteString
+mean: 1.506817 ms, lb 1.451558 ms, ub 1.569359 ms, ci 0.950
+std dev: 301.2400 us, lb 270.4358 us, ub 324.0515 us, ci 0.950
+~~~~
+
+Yes!
 
 
 # Memory usage
-Total: 96 MB, tree: 66MB ($2^{20}$ `Int` entries)
-\begin{center}
-\includegraphics[angle=90,scale=0.3]{patricia-mem.pdf}
-\end{center}
 
+Benchmark: $2^20$ key-value pairs of type `Int`, on a 64-bit machine
+
+Estimated: $8 * (5N + 4(N-1) + 4N)$ = 104 MB
+
+Real:
+
+~~~~
+   716,158,856 bytes allocated in the heap
+ 1,218,205,432 bytes copied during GC
+   106,570,936 bytes maximum residency (16 sample(s))
+     3,636,304 bytes maximum slop
+           269 MB total memory in use (0 MB lost due to fragmentation)
+~~~~
+
+Maximum residency is the number we care about.
 
 # Summary
 
 * When working on performance critical code, focus on memory layout
   first, micro optimzations second (just like in any other language).
 
+* Strictness annotations are mainly used on loop variables and in data
+  type definitions.
+
 * `Data.HashMap` is implemented in the unordered-containers package.
   You can get the source from
   [http://github.com/tibbe/unordered-containers](http://github.com/tibbe/unordered-containers)
+
+# Bonus: memory footprint of some common data types
+
+Write this down on an index card and keep around for
+back-of-the-envelope calculations.
+
+<table>
+  <tbody><tr>
+    <th>Data type</th>
+    <th>Memory footprint</th>
+  </tr>
+  <tr>
+    <td><code>Data.ByteString</code></td>
+    <td>9 words + N bytes</td>
+  </tr>
+  <tr>
+    <td><code>Data.Text</code></td>
+    <td>6 words + 2N bytes</td>
+  </tr>
+  <tr>
+    <td><code>String</code></td>
+    <td>5N words</td>
+  </tr>
+  <tr>
+    <td><code>Data.Map</code></td>
+    <td>6N words + size of keys &amp; values</td>
+  </tr>
+  <tr>
+    <td><code>Data.Set</code></td>
+    <td>5N words + size of elements</td>
+  </tr>
+  <tr>
+    <td><code>Data.IntMap</code></td>
+    <td>3N + 5(N-1) words + size of values</td>
+  </tr>
+  <tr>
+    <td><code>Data.IntSet</code></td>
+    <td>2N + 5(N-1) words</td>
+  </tr>
+  <tr>
+    <td><code>Data.HashMap</code></td>
+    <td>5N + 4(N-1) words + size of keys &amp; values</td>
+  </tr>
+  <tr>
+    <td><code>Data.HashSet</code></td>
+    <td>5N + 4(N-1) words + size of elements</td>
+  </tr>
+</tbody></table>
+
+(Some caveates apply.)