-
Notifications
You must be signed in to change notification settings - Fork 6
/
Copy path08_lisp.html
474 lines (381 loc) · 17.9 KB
/
08_lisp.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width">
<title>8. Lisp-like lists</title>
<link rel="stylesheet" href="template/style.css">
</head>
<body>
<span style="float: right">
[
<a href="07_min_range.html">previous</a>,
<a href="index.html">contents</a>,
<a href="09_iterators.html">next</a>
]
</span>
<a name="L8.-Lisp-2d-like-lists"></a>
<h1>8. Lisp-like lists</h1>
<a name="Lists-in-lisp-and-Scheme"></a>
<h2>Lists in lisp and Scheme</h2>
<p>A long time ago there was a programming language called <a href="https://en.wikipedia.org/wiki/Lisp_(programming_language)">Lisp</a>
or for you younger folks <a href="https://en.wikipedia.org/wiki/Scheme_(programming_language)">Scheme</a>.
Scheme might have been wrong, but it was great<sup id="fnref:1"><a href="#fn:1" rel="footnote">1</a></sup>.
The whole language centers around very simple <a href="https://en.wikipedia.org/wiki/Linked_list">linked lists</a>
which are based on three fundamental operations<sup id="fnref:2"><a href="#fn:2" rel="footnote">2</a></sup>:</p>
<ol>
<li><a href="http://www.lispworks.com/documentation/lw50/CLHS/Body/f_cons.htm"><code>cons</code></a>: create a pair.</li>
<li><a href="http://clhs.lisp.se/Body/f_car_c.htm#car"><code>car</code></a>: get first element of pair.</li>
<li><a href="http://clhs.lisp.se/Body/f_car_c.htm#cdr"><code>cdr</code></a>: get second element of pair.</li>
</ol>
<p>We’re not going to use these terms and
we’re going to extend our vocabulary from 3 to 4.
Lisp told us that there’s this wonderful thing, without which we cannot live, called
<a href="https://en.wikipedia.org/wiki/Garbage_collection_(computer_science)">garbage collection</a>.
We don’t want garbage collection for
all the algorithms we want to use.
So we are going to add a 4th operation:</p>
<p> 4. <code>free</code>: manually release/free a pair.</p>
<p>What we want to do is muck around with lists.
Meaning you can insert items in the middle, change pointers, connect this and that.
All of these operations are natural.
If you don’t want to muck around, just use vectors.</p>
<p>But, we’re going to build it so it’s blindingly fast.
How are we going to do that?
You want to avoid memory fragmentation.
If you have lists
with nodes spread all over memory, every time
you access one, it is a cache miss.
Modern computer caches do not really help if you do long jumps.
We have lots of nodes,
but we want them to live in a little buffer even if we keep generating them back and forth.
If they reside in a small space we will never get a cache miss.</p>
<a name="Why-is-malloc-so-slow-3f-"></a>
<h3>Why is malloc so slow?</h3>
<p>We’re also going to avoid <a href="https://man7.org/linux/man-pages/man3/malloc.3.html"><code>malloc</code></a>
because it is evil.
It used to be sort of alright, when I started working on STL in 1993.
But, even then I realized it was too slow to be used with node
based data structures.
So for any data structure of nodes, such as list, I would keep a pool of
nodes myself and manage them in a quick way.</p>
<p>A few people, such as <a href="https://en.wikipedia.org/wiki/P._J._Plauger">Bill Plauger</a> at Microsoft
and others at <a href="https://www.gnu.org/">GNU</a> who followed their example, said that if they have a common
pool and they just do pointer movement then if you have multiple threads you
could have problems<sup id="fnref:3"><a href="#fn:3" rel="footnote">3</a></sup>.
Instead of solving the problem for the multi-threaded case
they decided to solve it in general.
They said, “first we’re going to put locks on our malloc<sup id="fnref:4"><a href="#fn:4" rel="footnote">4</a></sup>.
Then we’re going to throw Alex’s pool management away
and we’re going to do full malloc.”
Now malloc is function call with a lock, so it’s a very heavy operation.</p>
<p>Because of this decision, all our lists are going to be thread safe<sup id="fnref:5"><a href="#fn:5" rel="footnote">5</a></sup>.
People like us, who do not use threads (you don’t use threads right?) pay for them.
They violated a fundamental
principle which Bjarne insists on.
<em>People should not pay for things they do not use</em>.
Everybody pays for the ability of multiple threads to do list allocations out of the
same pool. Which actually nobody does, but everybody pays.</p>
<a name="List-pool"></a>
<h2>List pool</h2>
<p>A list pool is an object
with many outstanding lists inside.
Internally we will use one vector to implement many, many, lists.
These lists are not containers.
A container guarantees that when the container is gone, the values are gone too.
For these lists there is no guarantee like that.
For example, you could split this list
into two by setting a <code>cdr</code>.
There is no ownership and this is why I recommend not viewing them as containers.
STL containers are wonderful when you want them, but that’s not the case here.</p>
<p>We’re trying to get as close to Lisp as we can without building garbage
collection<sup id="fnref:6"><a href="#fn:6" rel="footnote">6</a></sup>. If you want to build garbage collection you can extend
this thing and build garbage collection too, but garbage collection
is overrated.</p>
<p>We will implement <code>list_pool</code> as a class, with two types as template arguments.
<code>T</code> will be the values we want to store,
and <code>N</code> will be an index type.</p>
<pre><code>#include <vector>
template<typename T, typename N>
// T is semi-regular.
// N is integral
class list_pool {
typedef N list_type;
struct node_t {
T value;
N next;
};
std::vector<node_t> pool;
list_type free_list;
// ...
};
</code></pre>
<p>What should <code>N</code> be? Why not <code>size_t</code>?
Because it’s 64 bits.
For our application we could probably
use <code>uint16_t</code> so our whole node fits in 32 bits.
But, we should define a default.</p>
<pre><code>typename N = size_t;
</code></pre>
<p>Now we are going to implement <code>cons</code>, <code>car</code>, <code>cdr</code>, and <code>free</code> as member
functions of <code>list_pool</code>, but we need appropriate names for a younger generation.</p>
<a name="Value--28-car-29-"></a>
<h3>Value (car)</h3>
<p>We will rename <code>car</code> to <code>value</code>.
Note that because we can return <code>value</code> by reference,
it can be both read and modified.
So, it won’t just be <code>car</code>, it will
also act as <a href="http://clhs.lisp.se/Body/f_rplaca.htm"><code>rplaca</code></a><sup id="fnref:7"><a href="#fn:7" rel="footnote">7</a></sup>.</p>
<pre><code>T& value(list_type x) {
return node(x).value;
}
const T& value(list_type x) const {
return node(x).value;
}
</code></pre>
<a name="Next--28-cdr-29-"></a>
<h3>Next (cdr)</h3>
<p>Let’s rename <code>cdr</code> to <code>next</code>.
Because of read and write it also acts as <a href="http://clhs.lisp.se/Body/f_rplaca.htm"><code>rplacd</code></a>.</p>
<pre><code>list_type& next(list_type x) {
return node(x).next;
}
const list_type& next(list_type x) const {
return node(x).next;
}
</code></pre>
<a name="Free"></a>
<h3>Free</h3>
<p>Now let’s write <code>free</code>.
The pool maintains a list of nodes which are available for reuse.
This operation appends a node to the head of this list<sup id="fnref:8"><a href="#fn:8" rel="footnote">8</a></sup>.</p>
<pre><code>list_type free(list_type x) {
list_type cdr = next(x);
next(x) = free_list;
free_list = x;
return cdr;
}
</code></pre>
<p>We make it somewhat more useful by returning <code>next(x)</code> instead of void.
If it was not returned, the user would have to save it before freeing.</p>
<a name="Allocate--28-cons-29-"></a>
<h3>Allocate (cons)</h3>
<p>Now we will write <code>cons</code>, it takes two arguments.
Where do nodes come from?
The free list, if it has room,
otherwise we make a new node from the pool.</p>
<pre><code>list_type allocate(const T& val, list_type tail) {
list_type new_list = free_list;
if (is_empty(free_list)) {
new_list = new_node();
} else {
free_list = next(free_list);
}
// start with this part
value(new_list) = val;
next(new_list) = tail;
return new_list;
}
</code></pre>
<p>So we need to write the public function <code>is_empty</code>.</p>
<pre><code>bool is_empty(list_type x) const {
return x == empty();
}
</code></pre>
<p>Dual to this function, is one which gives you the <code>nil</code> or empty list.</p>
<pre><code>list_type empty() {
return list_type(0);
}
</code></pre>
<p>You might think, what about the <code>0</code>th item in the pool?
We will just index everything at <code>1</code>, so we don’t lose
the first item.
If you use <code>-1</code> then our index type must be signed.</p>
<pre><code>list_pool() {
free_list = empty();
}
</code></pre>
<p>Now we write a few private node functions including <code>new_node</code>.</p>
<pre><code>node_t& node(list_type x) {
return pool[x - 1];
}
const node_t& node(list_type x) const {
return pool[x - 1];
}
list_type new_node() {
pool.push_back(node_t());
return list_type(pool.size());
}
</code></pre>
<p>Our structure requires all lists in the pool to be <code>const</code>
or all of them to be non-<code>const</code>.
Typically <code>const</code> is just for handing someone something
to read.</p>
<a name="Free-list-helper"></a>
<h3>Free list helper</h3>
<p>There is a simple rule to distinguish when
you should write a method/member function
and when to just make an outside function (free function).
Implement the simplest possible thing.
If you can do it outside, do it.</p>
<p>Let’s implement a function for freeing an entire list,
not just a node.</p>
<pre><code>template <typename T, typename N>
void free_list(list_pool<T, N>& pool,
typename list_pool<T, N>::list_type x) {
while (!pool.is_empty(x)) x = pool.free(x);
}
</code></pre>
<p><strong>Exercise:</strong> Before moving on, get familiar with these operations.
Create a simple list inside a pool and print it by iterating through its contents
(solved in <code>test_list_pool.cpp</code> at the end of the chapter).</p>
<a name="List-queue"></a>
<h2>List queue</h2>
<p>We can use our list to implement a queue structure.
The queue will be defined by a list node
in the front, and one in the back.</p>
<pre><code>typedef std::pair<list_type, list_type> pair_type;
</code></pre>
<p>We often want to detect empty queues
and construct them:</p>
<pre><code>bool empty(const pair_type& p) { return is_end(p.first); }
pair_type empty_queue() { return pair_type(end(), end()); }
</code></pre>
<p>You can add an element to the front, or the back of the queue:</p>
<pre><code>pair_type push_front(const pair_type& p, const T& value) {
list_type new_node = allocate(value, p.first);
if (empty(p)) return pair_type(new_node, new_node);
return pair_type(new_node, p.second);
}
pair_type push_back(const pair_type& p, const T& value) {
list_type new_node = allocate(value, end());
if (empty(p)) return pair_type(new_node, new_node);
next(p.second) = new_node;
return pair_type(p.first, new_node);
}
</code></pre>
<p>You can remove an element from the front of the queue<sup id="fnref:9"><a href="#fn:9" rel="footnote">9</a></sup>:</p>
<pre><code>pair_type pop_front(const pair_type& p) {
if (empty(p)) return p;
return pair_type(next(p.first), p.second);
}
</code></pre>
<p>Now we can also free lists in constant time,
simply by attaching the end of our list to the free list.</p>
<pre><code>void free(const pair_type& p) { free(p.first, p.second); }
</code></pre>
<a name="Code"></a>
<h2>Code</h2>
<ul>
<li><a href="code/list_pool.h">list_pool.h</a></li>
<li><a href="code/list_pool_iterator.h">list_pool_iterator.h</a> (included by <code>list_pool.h</code>, but not discussed until Chapter 9)</li>
<li><a href="code/test_list_pool.cpp">test_list_pool.cpp</a></li>
</ul>
<div class="footnotes">
<hr/>
<ol>
<li id="fn:1">
Alex: I’m talking to an apparently non-existent Lisp community
because MIT is just a Python school now
(see <a href="http://lambda-the-ultimate.org/node/5335">“Programming by poking: why MIT stopped teaching SICP”</a>).<a href="#fnref:1" rev="footnote">↩</a></li>
<li id="fn:2">
<p>Alex calls these “lists” without much explanation.
In Lisp all lists are built out of pairs.
In each pair, the first element (called the <code>car</code>) is the value of the list at this point.
The second element (called the <code>cdr</code>) is a pointer to another pair, or <code>nil</code>.
<code>nil</code> terminates the list.</p>
<p>For example,
if we write a pair as <code>(car . cdr)</code> with <code>.</code> deliminating the <code>car</code> and <code>cdr</code>,
the list <code>1 2 3</code> can be constructed from three pairs:</p>
<pre><code>(1 . (2 . (3 . nil)))
</code></pre>
<p>See <a href="https://mitpress.mit.edu/sites/default/files/sicp/full-text/book/book-Z-H-15.html#%_sec_2.2">chapter 2.2</a> of “Structure and Interpretation of Computer Programs”
for a thorough introduction to Lisp lists.</p>
<p><code>car</code> and <code>cdr</code> are commonly called <code>head</code> and <code>tail</code> in other functional languages.
Their names are historical artifacts of the hardware that early Lisp implementations used
(see <a href="https://en.wikipedia.org/wiki/CAR_and_CDR">“CAR and CDR Wikipedia page”</a> or “Lisp 1.5 Programmer’s Manual”).</p><a href="#fnref:2" rev="footnote">↩</a></li>
<li id="fn:3">
All kinds of problems can arise from two threads modifying the same resource.
When code executes concurrently, it’s much more difficult to reason about control flow.
One line does not immediately follow the other,
so things can be overwritten or messed up in between statements.
Another problem is called a <a href="https://en.wikipedia.org/wiki/Race_condition">race condition</a>.
This is when a piece of code relies on one thread doing a task before another.<a href="#fnref:3" rev="footnote">↩</a></li>
<li id="fn:4">
<p><a href="https://en.wikipedia.org/wiki/Lock_(computer_science)">Locks</a> (often called mutexes in programming)
are a mechanism for controlling access to a shared resource.
To prevent multiple threads from running over each other,
a lock ensures that only one thread can access or modify
a shared resource at a time.
Designing such a mechanism well is actually fairly difficult.
(See “The Art of Multiprocesser Programming” by Herlihy and Shavit.)
Locks tend to be slow because they pause threads until they are safe to proceed.
In addition they usually communicate with the kernel.</p>
<p>Many programming frameworks in the late 90s and early 2000s (especially Java and C#)
decided that the way to support multithreaded programming was to protect
every resource with locks,
as if programs should share class instances
across threads haphazardly.
This trend is reflected in Alex’s story.</p>
<p>Since then, the error prone nature of concurrency and parallelism
has encouraged more disciplined design and tools.
One approach is to organize the program architecture around a few specific threads running
for the duration of the program, with carefully controlled communication
protocols.
Another is to spawn threads only to compute pure functions,
which do not have shared resource problems.</p>
<p>Based on Alex’s comments we can guess that
he would prefer processes to threads.
Processes offer memory protection by default, with all the danger
centralized in small shared portions.
(See chapter 7 of “The Art of UNIX Programming”)</p><a href="#fnref:4" rev="footnote">↩</a></li>
<li id="fn:5">
Although <code>malloc</code> may lock, according to
<a href="https://devblogs.microsoft.com/cppblog/concurrent-containers/">this article</a>,
STL containers on Microsoft platforms do not attempt to
ensure thread safety with locks.<a href="#fnref:5" rev="footnote">↩</a></li>
<li id="fn:6">
<p>A significant difference between Alex’s lists and those
in Lisp is that they are homogeneous,
they can only store one type of value.
In Lisp, heterogeneous lists are everywhere,
especially nested lists, which are what
allow code to be written in a list format.</p>
<p>For example the expression:</p>
<pre><code>(+ (* 1 2) 3)
</code></pre>
<p>Is a valid piece of code in Lisp.
It is also two lists nested together.
The inner list is the symbol <code>*</code> followed
by <code>1</code> and <code>2</code>.
The outer list starts with the symbol <code>+</code>, etc.</p>
<p>The complexity of allocating and managing memory
for such structures was one of the motivations
for inventing garbage collection.</p><a href="#fnref:6" rev="footnote">↩</a></li>
<li id="fn:7">
<code>rplaca</code> and <code>rplacd</code> are unfriendly abbrevations of
“replace car” and “replace cdr” (see “Lisp 1.5 Programmer’s Manual”).
They are low-level functions for manipulating pairs in lists.
In Scheme they correspond to <code>set-car!</code> and <code>set-cdr!</code>.
In Common Lisp one typically uses the higher-level macro <a href="http://www.lispworks.com/documentation/lw50/CLHS/Body/m_setf_.htm"><code>setf</code></a> for the same purpose.<a href="#fnref:7" rev="footnote">↩</a></li>
<li id="fn:8">
The assignment in this code is the same as <code>(setf (cdr x) free-list)</code> in Common Lisp,
or <code>(set-cdr! x free-list)</code> in Scheme.<a href="#fnref:8" rev="footnote">↩</a></li>
<li id="fn:9">
Since we implement <code>pop_front</code>, you might also expect <code>pop_back</code>.
A little thought will reveal there is no constant
time implementation for singly linked lists.
The queue has a reference to the last node in the queue,
but removing it would require modification
of the preceding node.<a href="#fnref:9" rev="footnote">↩</a></li>
</ol>
</div>
<span style="float: right">
[
<a href="07_min_range.html">previous</a>,
<a href="index.html">contents</a>,
<a href="09_iterators.html">next</a>
]
</span>
</body>
</html>