021_PyPy_The_JIT_Compiled_Python_Implementation.txt
00:00:00 Is your Python code running a little slow?
00:00:02 Did you know the PyPy runtime could make it run up to 10 times faster?
00:00:06 Seriously.
00:00:07 Maciej Fijalkowski is here to tell us all about it.
00:00:10 This is episode number 21, recorded Wednesday, July 8th, 2015.
00:00:16 Developers, developers, developers, developers.
00:00:19 I'm a developer in many senses of the word because I make these applications, but I also
00:00:25 use these verbs to make this music.
00:00:27 I construct it line by line, just like when I'm coding another software design.
00:00:31 In both cases, it's about design patterns.
00:00:34 Anyone can get the job done.
00:00:36 It's the execution that matters.
00:00:37 I have many interests.
00:00:39 Welcome to Talk Python to Me, a weekly podcast on Python, the language, the libraries, the
00:00:45 ecosystem, and the personalities.
00:00:47 This is your host, Michael Kennedy.
00:00:49 Follow me on Twitter, where I'm @mkennedy.
00:00:51 Keep up with the show and listen to past episodes at talkpython.fm.
00:00:56 And follow the show on Twitter via at talkpython.
00:00:59 This episode, we'll be talking with Maciej Fijalkowski about the amazing alternative Python implementation,
00:01:05 PyPy.
00:01:06 This episode is brought to you by Hired and Codeship.
00:01:10 Thank them for supporting the show via Twitter, where they're at hired underscore HQ,
00:01:15 and at codeship.
00:01:16 Before we get to Maciej, let me share a little news with you.
00:01:19 First off, Talk Python to Me has a new domain name, talkpython.fm.
00:01:24 I put the idea of a shorter .fm-based domain out on Twitter, and I'd say about 80% of the
00:01:30 listeners said they liked it better than the longer .com domain.
00:01:32 So here you go.
00:01:34 About a month ago, I moved all the MP3 file traffic out of Amazon S3 and into a dedicated
00:01:39 audio file cache server.
00:01:41 It's a lightweight Flask Python 3 app running through Nginx and uWSGI.
00:01:47 A few listeners expressed interest in seeing the code, so I did a little work to try to generalize
00:01:52 this a bit, and I open sourced it.
00:01:53 I'm calling the project Cache-tier.
00:01:55 And you can find a blog post as well as a link to the GitHub project on the show notes.
00:02:01 Next up, we have a new Python podcast.
00:02:03 I'm super happy to announce a Python podcast by Brian Okken called Python Test Podcast.
00:02:11 You can find it at pythontesting.net slash category slash podcast.
00:02:16 Now, let's get on to the show.
00:02:18 Maciej, welcome to the show.
00:02:19 Thanks for inviting me.
00:02:20 Yeah, I'm super excited to talk about our topic today, which is PyPy.
00:02:25 And I think what you guys are doing with PyPy is so incredibly cool to be taking some of
00:02:32 these JIT compilation GC sort of semi-compiled languages or concepts and applying them to
00:02:38 Python.
00:02:38 So really happy to talk about that.
00:02:40 The story of compiling dynamic languages is really sort of old and half-forgotten.
00:02:47 Like, we know these days that you can do this with JavaScript, but the original work on
00:02:54 Smalltalk dates back to at least the mid-90s, if not earlier, which is what we are all building on
00:03:02 top of anyway.
00:03:03 So it's nothing new.
00:03:05 The new part is just applying this to Python.
00:03:07 That's right.
00:03:09 That's right.
00:03:09 Well, I think it's great.
00:03:11 Maybe before we get into the details of what you guys are doing, maybe you could give the
00:03:16 listeners who are not familiar with PyPy a little history and introduction to it.
00:03:21 So PyPy is essentially a Python interpreter, which works very, very similarly to the normal
00:03:29 thing that you would call Python, that technically is called CPython.
00:03:34 It's a Python interpreter written in C.
00:03:35 And we have a different Python interpreter, which is implemented slightly differently.
00:03:40 And for the most part, glancing over all the details, it should run faster on most of the
00:03:48 examples because it can dynamically compile Python down all the way to the assembler level.
00:03:55 So it's like a normal Python interpreter, except sometimes faster, most times faster, in fact.
00:04:02 That's it.
00:04:03 It sounds very simple, but it's actually quite a big project.
00:04:07 It has been around more or less 10 years by now.
00:04:10 Wow.
00:04:11 It started 10 years ago.
00:04:12 And when did you get involved with it?
00:04:14 I got involved, I think, 2006 or 2007.
00:04:19 I was doing, I sort of got interested in Python static analysis, which PyPy, part of PyPy is doing
00:04:29 that, is taking a restricted subset of Python, which PyPy is implemented in and compiling it
00:04:35 down to the C level.
00:04:36 So I was interested in Python static analysis and I glanced over PyPy project and sort of
00:04:42 started getting involved.
00:04:44 And then I got a spot at Google Summer of Code to work on PyPy for the summer.
00:04:50 And that's essentially how it all started.
00:04:52 How many people work on PyPy or contribute to PyPy?
00:04:55 Depending how you count, it's anything between three and 30.
00:05:00 PyPy is a big umbrella project for a vast variety of anything from, as I said, a Python interpreter
00:05:09 to very researchy stuff that people at various universities try to experiment with.
00:05:15 Like there is a couple of people working on running Python and PHP in the same process.
00:05:22 So you run PHP code in the server, but you can still call Python functions in that process.
00:05:29 There are people working on software transactional memory.
00:05:33 So it's a big umbrella project that is a research vehicle for a lot of people, additionally to
00:05:39 being the Python interpreter.
00:05:40 Yeah, I can see how that would work for if you're doing some sort of academic research,
00:05:45 especially something with JIT and GC, then it makes a lot of sense.
00:05:50 I think one of the things that people either who are new to Python or have kind of dabbled
00:05:54 in it, but are not, you know, deeply working with it and thinking about the internals of
00:06:00 it every day, don't realize that there's actually a whole variety of different interpreters out
00:06:05 there.
00:06:05 There's a bunch.
00:06:07 They're all slightly different.
00:06:10 So let's glance over them because I think it's important to know there's like the CPython is
00:06:17 the normal Python interpreter that is probably used by 99% of people using Python.
00:06:22 Yeah.
00:06:23 If I open up Linux or my Mac and I type the word Python and enter that's CPython, right?
00:06:27 That's CPython.
00:06:28 So that's what most people would use.
00:06:30 CPython internals that you need to know is the fact that it's implemented in C.
00:06:36 And another internal detail that's important to know is that it exposes the C API, which
00:06:43 goes quite low.
00:06:45 So it's possible to write C extensions in C for Python.
00:06:49 So you write a bunch of C code, use a special API for accessing Python objects, and then it
00:06:54 can be called from Python code, your C functions.
00:06:59 Then we have Jiton, which is quite old, actually.
00:07:04 And it's a Python interpreter written in Java and a similar project called IronPython, which
00:07:11 is a Python interpreter written in C#.
00:07:13 And those two interpreters, they're quite widely used for people who write Java and want a better
00:07:22 language.
00:07:22 So they, so their main big advantage is integration with the underlying platform.
00:07:31 So Jython is very well integrated with Java and IronPython with C#.
00:07:35 So if you're writing C#, but you would really love to write some Python, you can do that these
00:07:40 days.
00:07:40 And then there's PyPy, which is another Python interpreter written slightly differently with
00:07:46 a just-in-time compiler.
00:07:48 So those are the four main interpreters.
00:07:50 And there are quite a few projects that try to enter this space, like Pyston, which
00:07:57 is another Python interpreter written by Dropbox people.
00:08:00 Yeah.
00:08:01 I wanted to ask you about Pyston because that seems to me to be somewhat similar to
00:08:07 what you guys are doing.
00:08:08 And, and it comes, the fact that it comes from Dropbox where Guido is and a lot, there's a
00:08:13 lot of sort of gravity for the Python world at Dropbox that made it more interesting to me.
00:08:17 Do you know anything about it or can you speak to how it compares or the goals or anything
00:08:21 like that?
00:08:23 So, well, I know that it's very, very similar to the project that once existed at Google
00:08:28 called Unladen Swallow.
00:08:29 So the main idea is that it's a Python interpreter that contains a just-in-time compiler that uses
00:08:38 LLVM as the underlying assembler platform.
00:08:41 Let's call it that way.
00:08:42 And this is the main goal.
00:08:44 The main goal is to run fast.
00:08:46 Now, the current status is that it doesn't run fast.
00:08:50 That's for sure.
00:08:52 It runs roughly at the same speed as CPython for stuff that I've seen on their website.
00:08:57 As for the future, I don't know.
00:09:01 I really think the future is really hard.
00:09:02 Especially when you don't have much visibility into it, right?
00:09:06 Yeah.
00:09:07 Like, I can tell you that PyPy has a bunch of different problems than Pyston.
00:09:15 So, for example, we consciously choose to not implement the C API at first because the
00:09:23 C API ties you a lot into the CPython model.
00:09:28 We choose not to implement it at first.
00:09:31 We implement it later as a compatibility layer.
00:09:34 So the first problem is that it's quite slow.
00:09:38 It's far, far slower than the one in CPython.
00:09:41 And as far as I know, right now, Dropbox uses the same C API, which gives you a lot of problems,
00:09:48 like a lot of constraints of your design.
00:09:51 But also, like, gives you a huge, huge benefit, which is being able to use the same C modules, which are a huge part of the Python ecosystem.
00:10:00 Yeah, especially some of the really powerful ones that people don't want to live without, things like NumPy and, to a lesser degree, SQLAlchemy, the things that have the C extensions that are really popular as well.
00:10:11 So you guys don't want to miss out on that, right?
00:10:14 Right.
00:10:15 So you brought two interesting examples.
00:10:18 So, for example, NumPy is so tied to the C API that it's very hard to avoid.
00:10:24 It's not just NumPy.
00:10:27 It's the entire ecosystem.
00:10:28 We, in PyPy, we re-implemented most of NumPy, but we are still missing out on the entire ecosystem.
00:10:37 And we have some stories how to approach that problem, but it's a hard problem to tackle, that we choose to make harder by not implementing the C API.
00:10:47 However, for example, the SQLAlchemy stuff.
00:10:50 SQLAlchemy is Python.
00:10:53 It's not C, but it uses the database drivers, which are implemented in C, like a lot of them.
00:11:01 So our answer to that is CFFI, which is a very, very simple way to call C from Python.
00:11:08 And CFFI took off like crazy.
00:11:12 Like, for most things, like database drivers, there's a CFFI-ready replacement that works as well and usually a lot better on PyPy that made it possible to use PyPy in places where you would normally not be able to do that.
00:11:31 And CFFI is like really, really popular.
00:11:35 It gets like over a million downloads a month, which is quite crazy.
00:11:39 And CFFI is not just a PyPy thing.
00:11:42 It also works in CPython, right?
00:11:44 Yeah, it works in CPython in between like 2.6 and 3.something, I think.
00:11:50 3.whatever is the latest.
00:11:51 And it works on both PyPy and PyPy3.
00:11:54 And since it's so simple, it will probably work one day in Jython too.
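For listeners who want a feel for CFFI, here is a minimal sketch of its ABI-level mode. It assumes the cffi package is installed (pip install cffi) and a POSIX system; strlen is just a convenient stand-in for any C function you might want to call.

```python
# Minimal CFFI (ABI mode) sketch: declare a C signature, open a library,
# then call the C function as if it were Python.
# Assumes `pip install cffi` and a POSIX libc; strlen is a stand-in.
from cffi import FFI

ffi = FFI()
ffi.cdef("size_t strlen(const char *s);")  # the C declaration, verbatim
C = ffi.dlopen(None)                       # None opens the C standard library
print(C.strlen(b"hello"))                  # prints 5
```

The same code runs unmodified on CPython and PyPy, which is why CFFI-based database drivers can stand in for C-extension ones.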
00:12:01 You said you have a plan for the NumPy story and these other heavy sort of C-based ones.
00:12:07 Currently, the way you support it, this is a question I don't know, is that you've kind of re-implemented a lot of it in Python?
00:12:15 So we, to be precise, we re-implemented a lot of it in our Python.
00:12:22 Our Python is the internal language that we use in PyPy.
00:12:26 Right, that's the restricted Python that you guys actually target, right?
00:12:29 Yes.
00:12:30 Yeah, but we don't, generally don't encourage anybody to use it.
00:12:35 Unless you're writing interpreters, then it's great.
00:12:38 But if you're not writing interpreters, it's an awful language.
00:12:41 But we, so the problem with NumPy is that NumPy ties so closely that we added special support in the JIT for parts of it and things like that, that we decided are important enough that you want to have them implement in the core of PyPy.
00:12:57 So we have, most of NumPy actually works on PyPy.
00:13:01 And this is sometimes not good enough because if you're using NumPy, chances are you're using SciPy, Scikit, Learn, Matplotlib, and all this stuff.
00:13:11 We have some story how to use it, which is to, the simplest thing is just to embed the CPython interpreter inside PyPy and call it using CFFI.
00:13:23 It's a great hack.
00:13:24 It works for us.
00:13:25 Really?
00:13:25 You can like fall back to regular CPython within your PyPy app?
00:13:30 Yeah, it's called PyMetabiosis.
00:13:33 That's awesome.
00:13:34 I'm pretty sure there's at least one video online with the author talking about it.
00:13:44 It works great for the numeric stack, which is its goal.
00:13:49 So this is our story.
00:13:51 We are still raising funds to finish implementing NumPy.
00:13:56 It has a very, very long tail of features.
00:13:58 And once we are done with NumPy, we'll try to improve the story of calling other numeric libraries on top of PyPy to be able to mostly seamlessly be able to use stuff like SciPy and Matplotlib.
00:14:13 It will still take a while.
00:14:15 I'm not even willing to give an estimate.
00:14:17 Sure.
00:14:19 But it's great.
00:14:20 And it does look like there's a lot of support there.
00:14:21 We'll talk about that stuff in a little bit because I definitely want to call attention to that and let people know how they can help out.
00:14:27 Before we get into those kind of details, though, can we talk just briefly about why would I use PyPy or when and why would I use PyPy over, say, CPython or Jython?
00:14:39 Like, what do you guys excel at?
00:14:41 When should a person out there is thinking, like, they've just realized, oh, my gosh, there's more than one interpreter?
00:14:46 How do I choose?
00:14:48 Like, can you help give some guidance around that?
00:14:49 So typically, if you just discovered, oh, there's more than one interpreter, you just want to use CPython.
00:14:55 That's like the simplest answer.
00:14:57 You want to use CPython, but if you're writing an open source library, you want to support PyPy at least, which is what most people are doing.
00:15:04 They're using CPython and the libraries support PyPy for the most part.
00:15:08 Our typical user, and this is a very terrible description, but this is our typical user.
00:15:14 This episode is brought to you by Hired.
00:15:27 Hired is a two-sided, curated marketplace that connects the world's knowledge workers to the best opportunities.
00:15:33 Each offer you receive has salary and equity presented right up front, and you can view the offers to accept or reject them before you even talk to the company.
00:15:43 Typically, candidates receive five or more offers in just the first week, and there are no obligations, ever.
00:15:49 Sounds pretty awesome, doesn't it?
00:15:51 Well, did I mention there's a signing bonus?
00:15:53 Everyone who accepts a job from Hired gets a $2,000 signing bonus, and as Talk Python listeners, it gets way sweeter.
00:16:01 Use the link Hired.com slash Talk Python To Me, and Hired will double the signing bonus to $4,000.
00:16:10 Opportunity's knocking.
00:16:11 Visit Hired.com slash Talk Python To Me and answer the call.
00:16:14 You have a large Python application that's spanning servers, serving millions of users,
00:16:30 and you're running into corners.
00:16:33 Like, you can't serve requests quickly enough.
00:16:37 You can't serve enough users from machine.
00:16:40 You're running into problems.
00:16:41 Now, your application is too big to, say, rewrite it in C or Go, or it's just, like, too scary for whatever reason.
00:16:50 So, you look, like, what it would take to run stuff in PyPy.
00:16:54 It usually takes, like, a bit of, your code should run, but it usually takes a bit of effort to, like, see what sort of libraries do you use.
00:17:06 Do you use any C extensions?
00:17:07 If the C extensions are, like, crucial, can you replace them with something?
00:17:11 So, yeah, this is our typical user.
00:17:13 And I have people, I run a consulting company that does that.
00:17:18 There are people coming and asking, like, okay, I have this set up.
00:17:22 It's impossible to do anything with it now.
00:17:25 Can I just, like, swap the interpreters, make it run faster, and make the problems go away?
00:17:30 This is our typical user.
00:17:33 I hear why you described it that way is maybe not the best way, but, you know, you're right.
00:17:38 If you have 100,000, half a million lines of Python, and really you just need to make it a little faster.
00:17:44 If switching to a different interpreter like PyPy will solve that, that's great.
00:17:49 So, speaking of faster, can you talk about the performance comparisons?
00:17:53 I have a little example I'll tell you, but I'll let you go first.
00:17:57 So, as usual, performance comparisons are usually very hard to do and flawed.
00:18:04 Everybody, yes, absolutely.
00:18:05 Everybody's thing they care about is not exactly what you're measuring, and so it might be totally misleading.
00:18:10 But give it a shot.
00:18:12 One good estimate is if you don't have benchmarks, you don't care about performance.
00:18:17 Like, if you never wrote benchmarks for your applications, then chances are you don't actually care all that much.
00:18:24 And you shouldn't really...
00:18:27 That's the first step.
00:18:28 Like, make sure you know how fast your applications run.
00:18:32 Once you know that, you can measure it on different interpreters.
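A minimal way to get such a number with the standard library is timeit, so the same script can be timed under CPython and then under PyPy. The busy_loop function here is a hypothetical CPU-bound stand-in, not a benchmark from the show.

```python
# Time a CPU-bound function with timeit; run the same file under
# CPython and under PyPy and compare the numbers.
# `busy_loop` is a made-up workload for illustration only.
import timeit

def busy_loop(n):
    total = 0
    for i in range(n):
        total += i * i
    return total

# The best of several repeats is the least noisy figure to compare.
best = min(timeit.repeat(lambda: busy_loop(100_000), number=1, repeat=5))
print(f"best of 5 runs: {best:.4f}s")
```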
00:18:34 But as far as expectations go, PyPy tends to run heavy computations a lot faster.
00:18:41 Like, a lot is anything between 10 and 100 times faster, depending on the workload.
00:18:49 For stuff that's more...
00:18:50 And again, what is a typical Python program?
00:18:54 Typical Python program is probably Hello World.
00:18:56 How fast Python runs Hello World.
00:18:58 Roughly at the same speed as CPython, you won't notice.
00:19:01 But for a typical web application, the speed up, if you're not heavily relying on C extensions, would be around 2x.
00:19:09 So, 2x faster for a lot of people makes a lot of difference.
00:19:13 Absolutely.
00:19:14 It also depends on where are you waiting.
00:19:16 Like you said, you should profile it and figure this out.
00:19:18 If your Python web app is slow because 80% of the time you're waiting on the database, well, it doesn't really matter how fast your Python code is.
00:19:26 Your database is a problem.
00:19:27 Or something like this, right?
00:19:29 Exactly.
00:19:30 Exactly.
00:19:30 And like, the thing is like, so let's narrow it down to, say, web applications.
00:19:36 Like, okay, let me first talk about other stuff and then let's go to web applications.
00:19:40 Like, where people found PyPy incredibly useful is things like high-frequency trading.
00:19:46 Like, not the very crazy high-frequency where you have to make decisions like multiple times per millisecond.
00:19:52 But like the sort of frequency where you want to make decisions within a few milliseconds.
00:19:58 And then those decisions are like tens of milliseconds.
00:20:02 Those decisions can, then you want to be able to modify your algorithms fast, which is a lot easier on Python than, say, on C++.
00:20:10 And you're running into less problems with how to shoot yourself in the foot and segfault all your trading.
00:20:16 So, that's when people tend to use PyPy because like, in this sort of scenario, it would be like 10 times faster.
00:20:24 So, super low latency stuff where 10 milliseconds makes a huge difference to you.
00:20:28 Something like that.
00:20:29 Yeah.
00:20:29 Okay.
00:20:30 Another example is there's, for example, a project called MyHDL, which is the hardware emulation layer.
00:20:40 And these tend to emit sort of low-level Python code that just do computations to emulate hardware.
00:20:48 And then again, on PyPy, it's like over 10 times faster.
00:20:51 So, those are the very good examples.
00:20:53 The very bad examples, as you said.
00:20:54 If your program, if your staff is waiting on the database, then you're out of luck.
00:21:00 Like, no matter how fast your interpreter responds.
00:21:02 But yeah.
00:21:05 On the typical web server load, even if there is such a thing, it would be around two times speed up.
00:21:11 Sometimes more, sometimes less.
00:21:13 Depending on the setup, really.
00:21:15 But as I said, you should really measure yourself.
00:21:18 The things where PyPy is not so great: if you spend most of the time in C extensions,
00:21:26 then it's either not helping or it can actually slow you down.
00:21:29 And the second time where it's not that great is when the program is short running.
00:21:36 So, because it's just-in-time compilation, it means that each time you run your program,
00:21:41 the interpreter has to look what's going on, pick things to compile to Assembler, compile them to Assembler,
00:21:49 and that all takes time.
00:21:50 Right.
00:21:50 There's a little more initial startup when that happens.
00:21:54 Yeah, the warm-up time is usually quite bad.
00:21:57 Well, I like to think that warm-up time of PyPy is quite bad.
00:22:01 And then I look at Java, when it's absolutely outrageous.
00:22:04 It's a relative statement.
00:22:07 It's a relative term.
00:22:08 Like, compared to CPython, PyPy's warm-up time is really terrible.
00:22:11 And compared to LuaJIT, it's, again, the warm-up time is terrible.
00:22:14 But compared to Java, it's not that bad.
00:22:17 So, yeah, it really depends on your setup.
00:22:20 And it's typically important for long-running applications.
00:22:23 Then again, this is a typical PyPy user.
00:22:26 When stuff like server-based applications where your programs run for a long time.
00:22:32 Right.
00:22:33 You start it up and it's going to serve a million requests an hour until it gets recycled or something, yeah?
00:22:38 Something like that.
00:22:40 I mean, these days, even JavaScript is a long-running app.
00:22:44 Like, how long do you keep your Gmail open?
00:22:46 For usually, for longer than a few seconds.
00:22:49 Yeah, that's for sure.
00:22:51 So, let's talk a little bit about the internals.
00:22:55 Could you describe just a little bit of...
00:23:00 So, if I take a Python script and it's got some classes and some functions and they're calling each other and so on.
00:23:05 What does it look like in terms of what's happening when that code runs?
00:23:09 Okay.
00:23:11 So, I'll maybe start from, like, how PyPy is built and then get back to your question directly.
00:23:16 Yeah, great.
00:23:17 So, PyPy is two things.
00:23:19 And it has been very confusing because we've been calling them PyPy and PyPy.
00:23:24 And calling two things which are related but not identical the same name is absolutely terrible.
00:23:30 We'll probably fix that at some point.
00:23:32 But, like, PyPy is mostly two things.
00:23:35 So, one thing is a Python interpreter.
00:23:38 And the other thing is a part that I would call RPython, which is a language for writing interpreters.
00:23:44 It tends to be similar to Python in a sense that it's a restricted subset of Python.
00:23:51 But this is largely irrelevant for the architectural question.
00:23:54 So, you have an interpreter written in RPython that can be PyPy.
00:24:01 We have a whole variety.
00:24:03 There's Hippie, which is a PHP interpreter.
00:24:05 There's a bunch of Scheme interpreters.
00:24:08 And there's even a Prolog interpreter and a whole bunch of other interpreters written in RPython.
00:24:14 And then...
00:24:14 Is RPython a compiled language?
00:24:17 Yes.
00:24:18 And the other part is essentially the translation toolchain or a compiler for RPython.
00:24:25 So, it contains various things like garbage collector implementation for RPython,
00:24:31 the data types like strings, unicodes, and all the things that RPython supports.
00:24:36 It also contains a just-in-time compiler for RPython and for interpreters written in RPython,
00:24:43 which is one level of indirection compared to what you usually do.
00:24:49 So, the just-in-time compiler would be sort of generated from your RPython interpreter and not implemented directly,
00:24:59 which is very, very important for us because Python, despite looking simple,
00:25:03 is actually an incredibly complicated language.
00:25:06 If you're trying to encode all the descriptor protocol or how actually functions and parameters are called,
00:25:11 chances are you'll make a mistake.
00:25:13 So, if you're implementing an interpreter and a just-in-time compiler, it's very, very hard to get all the details right.
00:25:19 So, we implement the Python semantics once in the Python interpreter, and then it gets either directly executed or compiled to assembly.
00:25:32 So, if you're coming back to your question, if you have a Python program,
00:25:37 first, what it does, it will compile to bytecode, and bytecode is quite high level.
00:25:42 There's a thing called the dis module, which you can just call dis.dis on any sort of Python object,
00:25:51 and it will display the bytecode.
00:25:54 And the basic idea, which is what CPython does, and which is what PyPy does too at first,
00:26:00 is to take bytecodes one by one, look what's it, and then execute it.
00:26:06 Yeah.
00:26:08 And is that like what's in the PyCache folders and things like that?
00:26:11 Like those PYC files?
00:26:13 Yeah.
00:26:13 The PYC files are essentially a serialized version of Python bytecode.
00:26:17 Okay.
00:26:18 It's just a cache to store to not have to parse Python files each time you import a giant project.
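The dis module mentioned a moment ago can be tried in any Python; a quick sketch with a throwaway function:

```python
# Disassemble a function into the bytecode the interpreter loop consumes.
import dis

def add(a, b):
    return a + b

# Prints instructions such as LOAD_FAST and BINARY_ADD (BINARY_OP on 3.11+).
dis.dis(add)
```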
00:26:25 Right.
00:26:25 Okay.
00:26:26 And so then CPython takes those instructions and executes them via an interpreter,
00:26:30 but that's not what happens on PyPy, right?
00:26:32 That's what happens on PyPy initially.
00:26:35 So, all your code will be like executed like CPython, except if you hit a magic number of like function calls
00:26:43 or loop iterations, I think it's 1037 for loop iterations, then you compile this particular loop,
00:26:53 in fact, this particular execution of a loop, into assembler code.
00:26:56 Then if you have a mix of interpreter code and assembler code, and if you,
00:27:04 the assembler code is a linear sequence of instructions that contains so-called guards.
00:27:11 So, the guards will be anything from if something in the Python source to is the type of this thing stays the same.
00:27:19 Then if you happen to fail those guards, then you, okay, I failed this guard,
00:27:25 I'm going to go and start compiling assembler again.
00:27:29 I mean, at first you jump back to the interpreter, but if you, again, hit a magic number,
00:27:34 you compile the assembler again from this guard.
00:27:36 And then you end up with like a tree of execution that resembles both your Python code
00:27:43 and the type structure that you're passing in a few other things that are automatically determined.
00:27:48 So, at the end of the day, you end up with a Python function or like multiple Python functions
00:27:54 that got compiled to assembler if you warm stuff up for long enough.
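A rough toy model of the hot-loop counting and guard mechanism just described, in pure Python. This is purely illustrative: PyPy's real tracing JIT records bytecode traces and emits machine code, and the names and threshold below are invented for the sketch.

```python
def make_specialized(fn):
    """Toy model: count calls, 'compile' a fast path once hot, guard on type."""
    state = {"seen_type": None, "calls": 0, "fast": None}
    THRESHOLD = 3  # stand-in for PyPy's much larger hot-loop threshold

    def wrapper(x):
        state["calls"] += 1
        if state["fast"] is not None:
            if type(x) is state["seen_type"]:   # the guard
                return state["fast"](x)         # guarded fast path
            state["fast"] = None                # guard failed: deoptimize
        if state["calls"] >= THRESHOLD:
            state["seen_type"] = type(x)        # specialize for observed type
            state["fast"] = fn                  # pretend we emitted assembler
        return fn(x)                            # generic "interpreter" path
    return wrapper

double = make_specialized(lambda x: x + x)
for _ in range(5):
    double(21)        # loop gets "hot", specialized for int
print(double(21))     # taken on the fast path
print(double("ab"))   # guard fails, falls back, respecializes for str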
00:27:57 Okay.
00:27:58 That's, that is super interesting.
00:27:59 I didn't expect that it would have this initial interpreted phase before compiling to assembler.
00:28:05 That's, that's very cool.
00:28:06 What was, do you know what the thinking around that was?
00:28:08 Is it just better performance?
00:28:09 So, there's a variety of things.
00:28:12 Like, one thing is that if you try to compile everything upfront,
00:28:17 it would take you forever.
00:28:19 But also, you can do some optimizations.
00:28:24 Like, a lot of optimizations done in PyPy are sort of optimistic.
00:28:28 Like, we're going to assume special things like sys.settrace or sys._getframe
00:28:35 just do not happen.
00:28:37 And as long as that doesn't happen, things can run nicely and smoothly.
00:28:41 But you're trying to figure out on the fly what's going on.
00:28:45 And then you compile pieces that you know about.
00:28:47 So, at the moment when you are compiling a Python loop or a function or something like that,
00:28:53 you tend to know more about the state of execution than what is just in the source.
00:28:58 Like, you tend to know the types, the precise shape of objects.
00:29:02 Like, is this an object that's class X and has two attributes A and B?
00:29:07 Or is it an object of class X that has three attributes A, B, and C?
00:29:11 And those decisions can lead to better performance, essentially.
00:29:15 So, on your website, you say that PyPy may be better in terms of memory usage as well.
00:29:22 How does that work?
00:29:23 It's a trade-off, right?
00:29:25 So, first of all, PyPy does consume extra memory for the compiled assembler
00:29:32 and the associated bookkeeping data.
00:29:34 That depends on how much code you actually run.
00:29:38 But, the object representation of Python objects is more compact
00:29:43 than CPython's.
00:29:43 So, the actual amount of memory consumed by your heap tends to be smaller.
00:29:50 Like, all PyPy objects are as memory-compact as CPython objects using
00:29:56 __slots__.
00:29:58 Right, okay.
00:29:58 So, it's the same optimization except it's transparent.
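The non-transparent version of that optimization can be seen on CPython by comparing a plain class with a `__slots__` class (exact sizes vary by interpreter version; the class names here are just examples):

```python
import sys

class Plain:
    def __init__(self):
        self.a, self.b = 1, 2

class Slotted:
    __slots__ = ("a", "b")
    def __init__(self):
        self.a, self.b = 1, 2

p, s = Plain(), Slotted()

# A plain instance drags along a per-instance __dict__; a slotted one
# stores its attributes at fixed offsets, which is roughly the layout
# PyPy gives every object automatically, without the class opting in.
plain_total = sys.getsizeof(p) + sys.getsizeof(p.__dict__)
slotted_total = sys.getsizeof(s)
print(plain_total, slotted_total)
```

The slotted instance has no `__dict__` at all, which is where the savings come from.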
00:30:01 Then, a list of only integers would not allocate the entire objects.
00:30:10 It would store only the small integers directly.
00:30:12 Then, the, the objects are smaller themselves because we use a different garbage collection
00:30:18 strategy.
00:30:18 It's not reference counting,
00:30:20 it's a tracing garbage collector.
00:30:21 Right, so, let's talk about the garbage collector just for a moment.
00:30:24 Is it a mark and sweep garbage collector?
00:30:27 This episode is brought to you by CodeShip.
00:30:43 CodeShip has launched organizations, create teams, set permissions for specific team members,
00:30:49 and improve collaboration in your continuous delivery workflow.
00:30:52 Maintain centralized control over your organization's projects and teams
00:30:56 with CodeShip's new organizations plan.
00:30:58 And, as Talk Python listeners, you can save 20% off any premium plan for the next three months.
00:31:03 Just use the code TALKPYTHON, all caps, no spaces.
00:31:07 Check them out at CodeShip.com and tell them thanks for supporting the show
00:31:11 on Twitter where they're at, CodeShip.
00:31:13 It's a very convoluted variant of mark and sweep.
00:31:21 Yeah.
00:31:21 It has two generations of objects, young objects and old objects, and old objects
00:31:27 are mark and sweep, and young objects are pointer bump allocations.
00:31:31 So, the net effect is that if you are having a lot of small objects that get allocated
00:31:39 all the time and forgotten really quickly, allocation takes, like, on average,
00:31:43 around one CPU instruction.
00:31:45 It's, on average, one, because it takes, like, slightly more, but then you have
00:31:51 pipelining, so sometimes it takes slightly less.
00:31:53 Okay, do you guys do compaction and things like that as well?
00:31:57 No, but we do copy old objects from the young generation to the old generation.
00:32:04 Then we don't compact the old generation, but it's usually more compact than your normal setup
00:32:10 where you have lots of objects that are scattered all over the place because you only
00:32:14 have to deal with objects that survive minor collection.
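A toy sketch of the two-generation scheme being described. This is only a model of the idea: the names and structure are invented, and PyPy's real nursery works on raw memory with a bump pointer, not Python lists.

```python
class Nursery:
    """Toy young generation: allocation is a store plus one pointer bump."""

    def __init__(self, capacity):
        self.slots = [None] * capacity
        self.top = 0                      # the bump pointer

    def allocate(self, obj):
        if self.top == len(self.slots):   # nursery full
            raise MemoryError("minor collection needed")
        self.slots[self.top] = obj        # store the new object...
        self.top += 1                     # ...and bump the pointer
        return obj

    def minor_collect(self, roots, old_gen):
        # Copy only the survivors (still reachable from roots) into the
        # old generation, then reuse the whole nursery by resetting top.
        for obj in self.slots[: self.top]:
            if obj in roots:
                old_gen.append(obj)
        self.top = 0

old_gen = []
nursery = Nursery(capacity=4)
live = nursery.allocate("keep me")
for i in range(3):
    nursery.allocate(f"temp-{i}")         # short-lived garbage
nursery.minor_collect(roots={live}, old_gen=old_gen)
print(old_gen)       # only the survivor was copied
print(nursery.top)   # nursery reset, ready for bump allocation again
```

The short-lived objects cost nothing to reclaim: only survivors are ever touched by the minor collection.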
00:32:17 Right, and the majority of objects that we interact with die right away.
00:32:22 Vast majority.
00:32:22 Yeah, absolutely.
00:32:23 For the most part.
00:32:25 Okay, yeah, that's very cool.
00:32:27 One of the things that is not super easy in regular Python is parallelism
00:32:33 and asynchronous programming and so on.
00:32:35 And you guys have this thing called stackless mode.
00:32:39 What's the story with that?
00:32:40 It's the same thing as Stackless Python.
00:32:44 It gives you an ability to have coroutines that can be swapped out without an explicit
00:32:50 yield keyword.
00:32:51 So it's not like Python 3 coroutines;
00:32:54 it's like normal coroutines where you can swap them at any point.
00:32:59 For example, I think gevent uses stackless mode for swapping
00:33:05 the coroutines.
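For contrast, here is the explicit-switch style from the standard library that stackless mode is being distinguished from: every point where control can move must be marked with an await (or yield), whereas stackless/gevent-style coroutines can switch inside ordinary function calls.

```python
import asyncio

async def worker(name, out):
    for i in range(2):
        out.append(f"{name}:{i}")
        await asyncio.sleep(0)  # the explicit switch point

async def main():
    out = []
    # The two coroutines interleave, but only at the awaits they declare.
    await asyncio.gather(worker("a", out), worker("b", out))
    return out

result = asyncio.run(main())
print(result)
```

Remove the `await` and each worker runs to completion with no interleaving at all; that marker is exactly what implicit coroutines do away with.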
00:33:06 Okay, so you said that you can get better concurrency.
00:33:10 Can you kind of speak to that any, or what are your thoughts there?
00:33:14 I personally don't use stackless all that much, but the net effect is that
00:33:20 you can write code like with Python 3 coroutines without the yield keyword.
00:33:27 So you just call function then you can swap the functions for other things.
00:33:31 It's a bit like implicit Twisted, where you don't get better concurrency than Twisted,
00:33:37 but you don't need to write your programs in the style that Twisted requires.
00:33:43 I was going to say it's just a little more automatic and you don't have to be so explicit
00:33:48 that you're doing threading.
00:33:49 Yeah, exactly.
00:33:51 Like, normal threads, especially in Python where you have the global interpreter lock,
00:33:57 they don't scale all that well, and the solution is usually Twisted, but Twisted requires
00:34:02 you to have all the libraries and everything written to be Twisted-aware, which stackless
00:34:08 does not generally require.
00:34:09 I don't have any particular feelings towards all of that to be honest.
00:34:15 Sure.
00:34:16 Does it also support Twisted running on PyPy?
00:34:18 Do you know?
00:34:19 Yeah, obviously.
00:34:20 Twisted is a Python program.
00:34:21 From the very early days, we had good contact with the Twisted people, and people who use Twisted
00:34:30 tend to be from the same category as people who use PyPy.
00:34:32 People who have large running code bases that are boring but have problems