WEBVTT
00:00:00.001 --> 00:00:02.220
Is your Python code running a little slow?
00:00:02.220 --> 00:00:06.540
Did you know the PyPy runtime could make it run up to 10 times faster?
00:00:06.540 --> 00:00:07.420
Seriously.
00:00:07.420 --> 00:00:10.660
Maciej Fijalkowski is here to tell us all about it.
00:00:10.660 --> 00:00:16.120
This is episode number 21, recorded Wednesday, July 8th, 2015.
00:00:16.120 --> 00:00:19.700
Developers, developers, developers, developers.
00:00:19.700 --> 00:00:25.000
I'm a developer in many senses of the word because I make these applications, but I also
00:00:25.000 --> 00:00:27.220
use these verbs to make this music.
00:00:27.220 --> 00:00:31.760
I construct it line by line, just like when I'm coding another software design.
00:00:31.760 --> 00:00:34.980
In both cases, it's about design patterns.
00:00:34.980 --> 00:00:36.480
Anyone can get the job done.
00:00:36.480 --> 00:00:37.980
It's the execution that matters.
00:00:37.980 --> 00:00:39.460
I have many interests.
00:00:39.460 --> 00:00:45.640
Welcome to Talk Python to Me, a weekly podcast on Python, the language, the libraries, the
00:00:45.640 --> 00:00:47.240
ecosystem, and the personalities.
00:00:47.240 --> 00:00:49.280
This is your host, Michael Kennedy.
00:00:49.280 --> 00:00:51.720
Follow me on Twitter, where I'm @mkennedy.
00:00:51.720 --> 00:00:56.360
Keep up with the show and listen to past episodes at talkpython.fm.
00:00:56.360 --> 00:00:59.580
And follow the show on Twitter via at talkpython.
00:00:59.580 --> 00:01:05.760
This episode, we'll be talking with Maciej Fijalkowski about the amazing alternative Python implementation,
00:01:05.760 --> 00:01:06.540
PyPy.
00:01:06.540 --> 00:01:10.900
This episode is brought to you by Hired and Codeship.
00:01:10.900 --> 00:01:15.060
Thank them for supporting the show via Twitter, where they're at hired underscore HQ,
00:01:15.160 --> 00:01:16.320
and at codeship.
00:01:16.320 --> 00:01:19.580
Before we get to Maciej, let me share a little news with you.
00:01:19.580 --> 00:01:24.720
First off, Talk Python to Me has a new domain name, talkpython.fm.
00:01:24.720 --> 00:01:30.020
I put the idea of a shorter .fm-based domain out on Twitter, and I'd say about 80% of the
00:01:30.020 --> 00:01:32.600
listeners said they liked it better than the longer .com domain.
00:01:32.600 --> 00:01:33.920
So here you go.
00:01:34.680 --> 00:01:39.900
About a month ago, I moved all the MP3 file traffic out of Amazon S3 and into a dedicated
00:01:39.900 --> 00:01:41.860
audio file cache server.
00:01:41.860 --> 00:01:46.740
It's a lightweight Flask Python 3 app running through Nginx and uWSGI.
00:01:47.240 --> 00:01:52.120
A few listeners expressed interest in seeing the code, so I did a little work to try to generalize
00:01:52.120 --> 00:01:53.800
this a bit, and I open sourced it.
00:01:53.800 --> 00:01:55.500
I'm calling the project Cache-Tier.
00:01:55.500 --> 00:02:00.160
And you can find a blog post as well as a link to the GitHub project on the show notes.
00:02:01.040 --> 00:02:03.640
Next up, we have a new Python podcast.
00:02:03.640 --> 00:02:11.000
I'm super happy to announce a Python podcast by Brian Okken called Python Test Podcast.
00:02:11.000 --> 00:02:16.000
You can find it at pythontesting.net slash category slash podcast.
00:02:16.000 --> 00:02:18.060
Now, let's get on to the show.
00:02:18.060 --> 00:02:19.340
Maciej, welcome to the show.
00:02:19.340 --> 00:02:20.800
Thanks for inviting me.
00:02:20.800 --> 00:02:24.960
Yeah, I'm super excited to talk about our topic today, which is PyPy.
00:02:25.640 --> 00:02:32.060
And I think what you guys are doing with PyPy is so incredibly cool to be taking some of
00:02:32.060 --> 00:02:38.440
these JIT compilation GC sort of semi-compiled languages or concepts and applying them to
00:02:38.440 --> 00:02:38.760
Python.
00:02:38.760 --> 00:02:40.880
So really happy to talk about that.
00:02:40.880 --> 00:02:47.560
The story of compiling dynamic languages is really sort of old and half-forgotten.
00:02:47.560 --> 00:02:54.520
Like, we know these days that you can do this with JavaScript, but the original work on Smalltalk
00:02:54.520 --> 00:03:02.440
dates back to at least the mid-90s, if not earlier, which is what we are all building on
00:03:02.440 --> 00:03:03.420
top of anyway.
00:03:03.420 --> 00:03:05.580
So it's nothing new.
00:03:05.580 --> 00:03:07.840
The new part is just applying this to Python.
00:03:07.840 --> 00:03:09.040
That's right.
00:03:09.040 --> 00:03:09.580
That's right.
00:03:09.580 --> 00:03:11.480
Well, I think it's great.
00:03:11.480 --> 00:03:16.020
Maybe before we get into the details of what you guys are doing, maybe you could give the
00:03:16.020 --> 00:03:20.360
listeners who are not familiar with PyPy a little history and introduction to it.
00:03:21.080 --> 00:03:29.420
So PyPy is essentially a Python interpreter, which works very, very similarly to the normal
00:03:29.420 --> 00:03:34.080
thing that you would call Python, that technically is called CPython.
00:03:34.080 --> 00:03:35.840
It's a Python interpreter written in C.
00:03:35.840 --> 00:03:40.560
And we have a different Python interpreter, which is implemented slightly differently.
00:03:40.940 --> 00:03:48.380
And for the most part, glancing over all the details, it should run faster on most of the
00:03:48.380 --> 00:03:55.040
examples because it can dynamically compile Python down all the way to the assembler level.
00:03:55.040 --> 00:04:02.320
So it's like a normal Python interpreter, except sometimes faster, most times faster, in fact.
00:04:02.320 --> 00:04:03.700
That's it.
00:04:03.780 --> 00:04:07.360
It sounds very simple, but it's actually quite a big project.
00:04:07.360 --> 00:04:10.780
It has been around more or less 10 years by now.
00:04:10.780 --> 00:04:11.440
Wow.
00:04:11.440 --> 00:04:12.540
It started 10 years ago.
00:04:12.540 --> 00:04:14.040
And when did you get involved with it?
00:04:14.040 --> 00:04:18.080
I got involved, I think, 2006 or 2007.
00:04:19.020 --> 00:04:29.660
I was doing, I sort of got interested in Python static analysis, which PyPy, part of PyPy is doing
00:04:29.660 --> 00:04:35.080
that, is taking a restricted subset of Python, which PyPy is implemented in and compiling it
00:04:35.080 --> 00:04:35.920
down to the C level.
00:04:36.120 --> 00:04:42.700
So I was interested in Python static analysis and I glanced over PyPy project and sort of
00:04:42.700 --> 00:04:44.640
started getting involved.
00:04:44.640 --> 00:04:50.240
And then I got a spot at Google Summer of Code to work on PyPy for the summer.
00:04:50.240 --> 00:04:52.160
And that's essentially how it all started.
00:04:52.160 --> 00:04:55.700
How many people work on PyPy or contribute to PyPy?
00:04:55.700 --> 00:05:00.140
Depending how you count, it's anything between three and 30.
00:05:00.760 --> 00:05:09.420
PyPy is a big umbrella project for a vast variety of anything from, as I said, a Python interpreter
00:05:09.420 --> 00:05:15.940
to very researchy stuff that people at various universities try to experiment with.
00:05:15.940 --> 00:05:22.600
Like there is a couple of people working on running Python and PHP in the same process.
00:05:22.600 --> 00:05:29.720
So you run PHP code in the server, but you can still call Python functions in that process.
00:05:29.720 --> 00:05:33.400
There are people working on software transactional memory.
00:05:33.400 --> 00:05:39.580
So it's a big umbrella project that is a research vehicle for a lot of people, additionally to
00:05:39.580 --> 00:05:40.500
being the Python interpreter.
00:05:40.500 --> 00:05:45.320
Yeah, I can see how that would work for if you're doing some sort of academic research,
00:05:45.320 --> 00:05:48.500
especially something with JIT and GC, then it makes a lot of sense.
00:05:50.000 --> 00:05:54.860
I think one of the things that people either who are new to Python or have kind of dabbled
00:05:54.860 --> 00:06:00.300
in it, but are not, you know, deeply working with it and thinking about the internals of
00:06:00.300 --> 00:06:05.240
it every day, don't realize that there's actually a whole variety of different interpreters out
00:06:05.240 --> 00:06:05.540
there.
00:06:05.540 --> 00:06:06.960
There's a bunch.
00:06:07.360 --> 00:06:10.440
They're all slightly different.
00:06:10.440 --> 00:06:17.340
So let's glance over them because I think it's important to know there's like the CPython is
00:06:17.340 --> 00:06:22.620
the normal Python interpreter that is probably used by 99% of people using Python.
00:06:22.620 --> 00:06:23.300
Yeah.
00:06:23.300 --> 00:06:27.560
If I open up Linux or my Mac and I type the word Python and enter that's CPython, right?
00:06:27.560 --> 00:06:28.540
That's CPython.
00:06:28.540 --> 00:06:30.620
So that's what most people would use.
00:06:30.760 --> 00:06:36.280
CPython internals that you need to know is the fact that it's implemented in C.
00:06:36.280 --> 00:06:43.820
And another internal detail that's important to know is that it exposes the C API, which
00:06:43.820 --> 00:06:45.080
goes quite low.
00:06:45.080 --> 00:06:49.320
So it's possible to write C extensions in C for Python.
00:06:49.320 --> 00:06:54.440
So you write a bunch of C code, use a special API for accessing Python objects, and then it
00:06:54.440 --> 00:06:57.420
can be called from Python code, your C functions.
00:06:59.000 --> 00:07:04.340
Then we have Jython, which is quite old, actually.
00:07:04.340 --> 00:07:11.900
And it's a Python interpreter written in Java, and a similar project called IronPython, which
00:07:11.900 --> 00:07:13.780
is a Python interpreter written in C#.
00:07:13.780 --> 00:07:22.380
And those two interpreters, they're quite widely used for people who write Java and want a better
00:07:22.380 --> 00:07:22.880
language.
00:07:22.880 --> 00:07:31.300
So they, so their main big advantage is integration with the underlying platform.
00:07:31.300 --> 00:07:35.600
So Jython is very well integrated with Java and IronPython with C#.
00:07:35.600 --> 00:07:40.440
So if you're writing C#, but you would really love to write some Python, you can do that these
00:07:40.440 --> 00:07:40.780
days.
00:07:40.780 --> 00:07:46.880
And then there's PyPy, which is another Python interpreter written slightly differently with
00:07:46.880 --> 00:07:47.960
a just-in-time compiler.
00:07:48.280 --> 00:07:50.760
So those are the four main interpreters.
00:07:50.760 --> 00:07:57.340
And there are quite a few projects that try to enter this space, like Pyston, which
00:07:57.340 --> 00:08:00.740
is another Python interpreter written by Dropbox people.
00:08:00.980 --> 00:08:01.400
Yeah.
00:08:01.400 --> 00:08:07.620
I wanted to ask you about Pyston because that seems to me to be somewhat similar to
00:08:07.620 --> 00:08:08.700
what you guys are doing.
00:08:08.700 --> 00:08:13.080
And the fact that it comes from Dropbox, where Guido is, and there's a
00:08:13.080 --> 00:08:17.960
lot of sort of gravity for the Python world at Dropbox that made it more interesting to me.
00:08:17.960 --> 00:08:21.580
Do you know anything about it or can you speak to how it compares or the goals or anything
00:08:21.580 --> 00:08:22.020
like that?
00:08:23.020 --> 00:08:28.480
So, well, I know that it's very, very similar to the project that once existed at Google
00:08:28.480 --> 00:08:29.680
called Unladen Swallow.
00:08:29.680 --> 00:08:38.200
So the main idea is that it's a Python interpreter that contains a just-in-time compiler that uses
00:08:38.200 --> 00:08:41.680
LLVM as the underlying assembler platform.
00:08:41.680 --> 00:08:42.860
Let's call it that way.
00:08:42.860 --> 00:08:44.800
And this is the main goal.
00:08:44.800 --> 00:08:46.280
The main goal is to run fast.
00:08:46.280 --> 00:08:50.960
Now, the current status is that it doesn't run fast.
00:08:50.960 --> 00:08:52.140
That's for sure.
00:08:52.140 --> 00:08:57.160
It runs roughly at the same speed as CPython for stuff that I've seen on their website.
00:08:57.160 --> 00:09:01.060
As for the future, I don't know.
00:09:01.060 --> 00:09:02.500
I really think the future is really hard.
00:09:02.500 --> 00:09:06.620
Especially when you don't have much visibility into it, right?
00:09:06.620 --> 00:09:07.660
Yeah.
00:09:07.660 --> 00:09:15.120
Like, I can tell you that PyPy has a bunch of different problems compared to Pyston.
00:09:15.120 --> 00:09:23.940
So, for example, we consciously chose not to implement the C API at first because the
00:09:23.940 --> 00:09:28.080
C API ties you a lot into the CPython model.
00:09:28.080 --> 00:09:31.040
We chose not to implement it at first.
00:09:31.040 --> 00:09:34.400
We implemented it later as a compatibility layer.
00:09:34.400 --> 00:09:38.580
So the first problem is that it's quite slow.
00:09:38.580 --> 00:09:41.820
It's far, far slower than the one in CPython.
00:09:41.820 --> 00:09:48.040
And as far as I know, right now, Dropbox uses the same C API, which gives you a lot of problems,
00:09:48.040 --> 00:09:49.920
like a lot of constraints on your design.
00:09:51.460 --> 00:09:59.480
But also, like, gives you a huge, huge benefit, which is being able to use the same C modules, which are a huge part of the Python ecosystem.
00:10:00.280 --> 00:10:11.660
Yeah, especially some of the really powerful ones that people don't want to live without, things like NumPy and, to a lesser degree, SQLAlchemy, the things that have the C extensions that are really popular as well.
00:10:11.660 --> 00:10:13.500
So you guys don't want to miss out on that, right?
00:10:14.800 --> 00:10:15.160
Right.
00:10:15.160 --> 00:10:18.340
So you brought two interesting examples.
00:10:18.340 --> 00:10:24.800
So, for example, NumPy is so tied to the C API that it's very hard to avoid.
00:10:24.800 --> 00:10:27.020
It's not just NumPy.
00:10:27.020 --> 00:10:28.700
It's the entire ecosystem.
00:10:28.700 --> 00:10:37.480
We, in PyPy, we re-implemented most of NumPy, but we are still missing out on the entire ecosystem.
00:10:37.480 --> 00:10:47.140
And we have some stories for how to approach that problem, but it's a hard problem to tackle, one that we chose to make harder by not implementing the C API.
00:10:47.140 --> 00:10:50.820
However, for example, the SQLAlchemy stuff.
00:10:50.820 --> 00:10:53.140
SQLAlchemy is Python.
00:10:53.140 --> 00:11:00.820
It's not C, but it uses the database drivers, which are implemented in C, like a lot of them.
00:11:01.980 --> 00:11:08.380
So our answer to that is CFFI, which is a very, very simple way to call C from Python.
00:11:08.380 --> 00:11:12.280
And CFFI took off like crazy.
00:11:12.280 --> 00:11:31.540
Like, for most things, like database drivers, there's a CFFI-ready replacement that works as well and usually a lot better on PyPy, which made it possible to use PyPy in places where you would normally not be able to do that.
00:11:31.540 --> 00:11:35.280
And CFFI is like really, really popular.
00:11:35.280 --> 00:11:39.200
It gets like over a million downloads a month, which is quite crazy.
00:11:39.200 --> 00:11:42.360
And CFFI is not just a PyPy thing.
00:11:42.360 --> 00:11:44.000
It also works in CPython, right?
00:11:44.000 --> 00:11:50.260
Yeah, it works in CPython in between like 2.6 and 3.something, I think.
00:11:50.260 --> 00:11:51.880
3.whatever is the latest.
00:11:51.880 --> 00:11:54.760
And it works on both PyPy and PyPy3.
00:11:54.760 --> 00:12:00.900
And since it's so simple, it will probably work one day in Jython too.
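For reference, here is a minimal CFFI sketch of what calling C from Python looks like. The add function, its signature, and the libexample.so path are hypothetical, purely for illustration; this uses CFFI's simplest (ABI) mode.

    from cffi import FFI

    ffi = FFI()
    # Declare the C signature we want to call (hypothetical function).
    ffi.cdef("double add(double a, double b);")
    # Open a shared library by path (hypothetical library, ABI mode).
    lib = ffi.dlopen("./libexample.so")

    # The returned object exposes the declared functions directly;
    # the same script runs unchanged on CPython and PyPy.
    print(lib.add(1.5, 2.0))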
00:12:01.560 --> 00:12:07.720
You said you have a plan for the NumPy story and these other heavy sort of C-based ones.
00:12:07.720 --> 00:12:15.340
Currently, the way you support it, this is a question I don't know, is that you've kind of re-implemented a lot of it in Python?
00:12:15.340 --> 00:12:22.840
So we, to be precise, we re-implemented a lot of it in RPython.
00:12:22.840 --> 00:12:25.960
RPython is the internal language that we use in PyPy.
00:12:26.220 --> 00:12:29.500
Right, that's the restricted Python that you guys actually target, right?
00:12:29.500 --> 00:12:30.020
Yes.
00:12:30.020 --> 00:12:35.140
Yeah, but we don't, generally don't encourage anybody to use it.
00:12:35.140 --> 00:12:38.180
Unless you're writing interpreters, then it's great.
00:12:38.180 --> 00:12:40.340
But if you're not writing interpreters, it's an awful language.
00:12:41.040 --> 00:12:56.020
But we, so the problem with NumPy is that NumPy ties so closely that we added special support in the JIT for parts of it and things like that, that we decided are important enough that you want to have them implement in the core of PyPy.
00:12:57.080 --> 00:13:01.540
So we have, most of NumPy actually works on PyPy.
00:13:01.540 --> 00:13:11.400
And this is sometimes not good enough because if you're using NumPy, chances are you're using SciPy, scikit-learn, Matplotlib, and all this stuff.
00:13:11.400 --> 00:13:23.140
We have some story for how to use it; the simplest thing is just to embed the CPython interpreter inside PyPy and call it using CFFI.
00:13:23.140 --> 00:13:24.260
It's a great hack.
00:13:24.260 --> 00:13:25.220
It works for us.
00:13:25.420 --> 00:13:25.900
Really?
00:13:25.900 --> 00:13:30.560
You can, like, fall back to regular CPython within your PyPy app?
00:13:30.560 --> 00:13:33.280
Yeah, it's called PyMetabiosis.
00:13:33.280 --> 00:13:34.380
That's awesome.
00:13:34.380 --> 00:13:44.460
I'm pretty sure there's at least one video online with the author talking about it.
00:13:44.460 --> 00:13:49.180
It works great for the numeric stack, which is its goal.
00:13:49.180 --> 00:13:51.060
So this is our story.
00:13:51.060 --> 00:13:56.240
We are still raising funds to finish implementing NumPy.
00:13:56.240 --> 00:13:58.920
It's a very, very long tail of features.
00:13:58.920 --> 00:14:13.400
And once we are done with NumPy, we'll try to improve the story of calling other numeric libraries on top of PyPy, to be able to mostly seamlessly use stuff like SciPy and Matplotlib.
00:14:13.400 --> 00:14:15.140
It will still take a while.
00:14:15.140 --> 00:14:17.940
I'm not even willing to give an estimate.
00:14:17.940 --> 00:14:19.520
Sure.
00:14:19.520 --> 00:14:20.380
But it's great.
00:14:20.380 --> 00:14:21.980
And it does look like there's a lot of support there.
00:14:21.980 --> 00:14:26.940
We'll talk about that stuff in a little bit because I definitely want to call attention to that and let people know how they can help out.
00:14:27.940 --> 00:14:39.460
Before we get into those kind of details, though, can we talk just briefly about why would I use PyPy or when and why would I use PyPy over, say, CPython or Jython?
00:14:39.460 --> 00:14:41.500
Like, what do you guys excel at?
00:14:41.500 --> 00:14:46.080
When should a person out there is thinking, like, they've just realized, oh, my gosh, there's more than one interpreter?
00:14:46.080 --> 00:14:48.080
How do I choose?
00:14:48.180 --> 00:14:49.940
Like, can you help give some guidance around that?
00:14:49.940 --> 00:14:55.940
So typically, if you just discovered, oh, there's more than one interpreter, you just want to use CPython.
00:14:55.940 --> 00:14:57.460
That's like the simplest answer.
00:14:57.460 --> 00:15:04.280
You want to use CPython, but if you're writing an open source library, you want to support PyPy at least, which is what most people are doing.
00:15:04.280 --> 00:15:08.100
They're using CPython and the libraries support PyPy for the most part.
00:15:08.100 --> 00:15:14.480
Our typical user, and this is a very terrible description, but this is our typical user.
00:15:14.480 --> 00:15:27.140
This episode is brought to you by Hired.
00:15:27.140 --> 00:15:33.600
Hired is a two-sided, curated marketplace that connects the world's knowledge workers to the best opportunities.
00:15:33.600 --> 00:15:42.760
Each offer you receive has salary and equity presented right up front, and you can view the offers to accept or reject them before you even talk to the company.
00:15:43.260 --> 00:15:49.140
Typically, candidates receive five or more offers in just the first week, and there are no obligations, ever.
00:15:49.140 --> 00:15:51.220
Sounds pretty awesome, doesn't it?
00:15:51.220 --> 00:15:53.260
Well, did I mention there's a signing bonus?
00:15:53.260 --> 00:16:01.720
Everyone who accepts a job from Hired gets a $2,000 signing bonus, and as Talk Python listeners, it gets way sweeter.
00:16:01.720 --> 00:16:09.280
Use the link Hired.com slash Talk Python To Me, and Hired will double the signing bonus to $4,000.
00:16:10.180 --> 00:16:11.000
Opportunity's knocking.
00:16:11.000 --> 00:16:14.600
Visit Hired.com slash Talk Python To Me and answer the call.
00:16:14.600 --> 00:16:30.800
You have a large Python application that's spanning servers, serving millions of users,
00:16:30.800 --> 00:16:33.460
and you're running into corners.
00:16:33.460 --> 00:16:37.860
Like, you can't serve requests quickly enough.
00:16:37.860 --> 00:16:40.100
You can't serve enough users from one machine.
00:16:40.100 --> 00:16:41.840
You're running into problems.
00:16:41.840 --> 00:16:48.920
Now, your application is too big to, say, rewrite it in C or Go, or it's just, like, too scary for whatever reason.
00:16:50.120 --> 00:16:54.900
So, you look at, like, what it would take to run stuff in PyPy.
00:16:54.900 --> 00:17:06.160
Your code should run, but it usually takes a bit of effort to, like, see what sort of libraries you use.
00:17:06.160 --> 00:17:07.600
Do you use any C extensions?
00:17:07.600 --> 00:17:11.000
If the C extensions are, like, crucial, can you replace them with something?
00:17:11.660 --> 00:17:13.480
So, yeah, this is our typical user.
00:17:13.480 --> 00:17:18.460
And I have people, I run a consulting company that does that.
00:17:18.460 --> 00:17:22.640
There are people coming and asking, like, okay, I have this set up.
00:17:22.640 --> 00:17:25.940
It's impossible to do anything with it now.
00:17:25.940 --> 00:17:30.480
Can I just, like, swap the interpreters, make it run faster, and make the problems go away?
00:17:30.480 --> 00:17:31.980
This is our typical user.
00:17:33.660 --> 00:17:38.680
I hear why you said describing it that way is maybe not the best way, but, you know, you're right.
00:17:38.680 --> 00:17:44.920
If you have 100,000, half a million lines of Python, and really you just need to make it a little faster.
00:17:44.920 --> 00:17:49.140
If switching to a different interpreter like PyPy will solve that, that's great.
00:17:49.140 --> 00:17:53.600
So, speaking of faster, can you talk about the performance comparisons?
00:17:53.600 --> 00:17:57.400
I have a little example I'll tell you, but I'll let you go first.
00:17:57.400 --> 00:18:03.520
So, as usual, performance comparisons are usually very hard to do and flawed.
00:18:04.200 --> 00:18:05.520
Everybody, yes, absolutely.
00:18:05.520 --> 00:18:10.580
Everybody's thing they care about is not exactly what you're measuring, and so it might be totally misleading.
00:18:10.580 --> 00:18:12.000
But give it a shot.
00:18:12.000 --> 00:18:17.900
One good estimate is if you don't have benchmarks, you don't care about performance.
00:18:17.900 --> 00:18:24.380
Like, if you never wrote benchmarks for your applications, then chances are you don't actually care all that much.
00:18:24.380 --> 00:18:27.540
And you shouldn't really...
00:18:27.540 --> 00:18:28.420
That's the first step.
00:18:28.420 --> 00:18:31.280
Like, make sure you know how fast your applications run.
00:18:32.620 --> 00:18:34.980
Once you know that, you can measure it on different interpreters.
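As a concrete starting point, here is a minimal benchmark sketch along those lines; the workload function is a hypothetical stand-in for your real code path, not anything from the show.

    import time

    def workload():
        # Hypothetical CPU-bound stand-in for the code you actually care about.
        total = 0
        for i in range(10_000_000):
            total += i * i
        return total

    start = time.perf_counter()
    workload()
    elapsed = time.perf_counter() - start
    print(f"workload took {elapsed:.3f}s")

Run the same script with python and with pypy and compare the numbers for your own workload rather than relying on generic benchmarks.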
00:18:34.980 --> 00:18:41.960
But as far as expectations go, PyPy tends to run heavy computations a lot faster.
00:18:41.960 --> 00:18:47.940
Like, a lot is anything between 10 and 100 times faster, depending on the workload.
00:18:49.300 --> 00:18:50.820
For stuff that's more...
00:18:50.820 --> 00:18:54.000
And again, what is a typical Python program?
00:18:54.000 --> 00:18:56.640
Typical Python program is probably Hello World.
00:18:56.640 --> 00:18:58.500
How fast Python runs Hello World.
00:18:58.500 --> 00:19:01.440
Roughly at the same speed as CPython, you won't notice.
00:19:01.440 --> 00:19:09.060
But for a typical web application, the speed up, if you're not heavily relying on C extensions, would be around 2x.
00:19:09.940 --> 00:19:13.260
So, 2x faster for a lot of people makes a lot of difference.
00:19:13.260 --> 00:19:14.060
Absolutely.
00:19:14.060 --> 00:19:16.500
It also depends on where you are waiting.
00:19:16.500 --> 00:19:18.560
Like you said, you should profile it and figure this out.
00:19:18.560 --> 00:19:26.880
If your Python web app is slow because 80% of the time you're waiting on the database, well, it doesn't really matter how fast your Python code is.
00:19:26.880 --> 00:19:27.860
Your database is a problem.
00:19:27.860 --> 00:19:28.840
Or something like this, right?
00:19:29.640 --> 00:19:30.040
Exactly.
00:19:30.040 --> 00:19:30.120
Exactly.
00:19:30.120 --> 00:19:36.640
And like, the thing is like, so let's narrow it down to, say, web applications.
00:19:36.640 --> 00:19:40.840
Like, okay, let me first talk about other stuff and then let's go to web applications.
00:19:40.840 --> 00:19:46.280
Like, where people found PyPy incredibly useful is things like high-frequency trading.
00:19:46.280 --> 00:19:52.460
Like, not the very crazy high-frequency where you have to make decisions like multiple times per millisecond.
00:19:52.460 --> 00:19:58.440
But like the sort of frequency where you want to make decisions within a few milliseconds.
00:19:58.440 --> 00:20:02.520
And then those decisions are like tens of milliseconds.
00:20:02.520 --> 00:20:10.060
Then you want to be able to modify your algorithms fast, which is a lot easier in Python than, say, in C++.
00:20:10.060 --> 00:20:16.040
And you're running into fewer problems with shooting yourself in the foot and segfaulting all your trading.
00:20:16.040 --> 00:20:24.260
So, that's when people tend to use PyPy because, like, in this sort of scenario, it would be like 10 times faster.
00:20:24.260 --> 00:20:28.500
So, super low latency stuff where 10 milliseconds makes a huge difference to you.
00:20:28.500 --> 00:20:29.220
Something like that.
00:20:29.220 --> 00:20:29.480
Yeah.
00:20:29.480 --> 00:20:30.080
Okay.
00:20:30.080 --> 00:20:39.820
Another example is there's, for example, a project called MyHDL, which is the hardware emulation layer.
00:20:40.600 --> 00:20:48.180
And these tend to emit sort of low-level Python code that just do computations to emulate hardware.
00:20:48.180 --> 00:20:51.720
And then again, on PyPy, it's like over 10 times faster.
00:20:51.720 --> 00:20:53.500
So, those are the very good examples.
00:20:53.500 --> 00:20:54.940
The very bad examples, as you said.
00:20:54.940 --> 00:21:00.220
If your program, if your stuff is waiting on the database, then you're out of luck.
00:21:00.360 --> 00:21:02.720
Like, no matter how fast your interpreter responds.
00:21:02.720 --> 00:21:05.100
But yeah.
00:21:05.100 --> 00:21:11.860
On the typical web server load, even if there is such a thing, it would be around two times speed up.
00:21:11.860 --> 00:21:13.440
Sometimes more, sometimes less.
00:21:13.440 --> 00:21:15.260
Depending on the setup, really.
00:21:15.260 --> 00:21:18.620
But as I said, you should really measure yourself.
00:21:18.620 --> 00:21:25.920
The cases where PyPy is not that great: if you spend most of the time in C extensions,
00:21:26.760 --> 00:21:29.800
then it's either not helping or actually preventing you from doing so.
00:21:29.800 --> 00:21:36.260
And the second case where it's not that great is when the program is short-running.
00:21:36.260 --> 00:21:41.340
So, because it's just-in-time compilation, it means that each time you run your program,
00:21:41.340 --> 00:21:49.020
the interpreter has to look at what's going on, pick things to compile to assembler, compile them to assembler,
00:21:49.020 --> 00:21:50.220
and that all takes time.
00:21:50.220 --> 00:21:50.840
Right.
00:21:50.840 --> 00:21:53.520
There's a little more initial startup when that happens.
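A small sketch of how that warm-up shows up in practice; hot_loop is a hypothetical toy workload, and the exact numbers depend entirely on your machine and interpreter.

    import time

    def hot_loop(n):
        total = 0
        for i in range(n):
            total += i % 7
        return total

    # Time the same call repeatedly. Under a tracing JIT such as PyPy's, the
    # first run or two tend to be slower while the hot loop is traced and
    # compiled to machine code; later runs are typically much faster.
    for run in range(5):
        start = time.perf_counter()
        hot_loop(5_000_000)
        print(f"run {run}: {time.perf_counter() - start:.3f}s")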
00:21:54.340 --> 00:21:57.040
Yeah, the warm-up time is usually quite bad.
00:21:57.040 --> 00:22:01.440
Well, I like to think that warm-up time of PyPy is quite bad.
00:22:01.440 --> 00:22:04.000
And then I look at Java, where it's absolutely outrageous.
00:22:04.000 --> 00:22:07.700
It's a relative statement.
00:22:07.700 --> 00:22:08.540
It's a relative term.
00:22:08.540 --> 00:22:11.120
Like, compared to CPython, PyPy's warm-up time is really terrible.
00:22:11.120 --> 00:22:14.720
And compared to LuaJIT, again, the warm-up time is terrible.
00:22:14.720 --> 00:22:16.560
But compared to Java, it's not that bad.
00:22:17.080 --> 00:22:20.080
So, yeah, it really depends on your setup.
00:22:20.080 --> 00:22:23.560
And it's typically important for long-running applications.
00:22:23.560 --> 00:22:26.060
Then again, this is a typical PyPy user.
00:22:26.060 --> 00:22:32.320
With stuff like server-based applications, where your programs run for a long time.
00:22:32.320 --> 00:22:33.600
Right.
00:22:33.600 --> 00:22:38.800
You start it up and it's going to serve a million requests an hour until it gets recycled or something, yeah?
00:22:38.800 --> 00:22:40.800
Something like that.
00:22:40.800 --> 00:22:44.680
I mean, these days, even JavaScript is a long-running app.
00:22:44.680 --> 00:22:46.900
Like, how long do you keep your Gmail open?
00:22:46.900 --> 00:22:49.460
Usually for longer than a few seconds.
00:22:49.460 --> 00:22:51.560
Yeah, that's for sure.
00:22:51.560 --> 00:22:55.600
So, let's talk a little bit about the internals.
00:22:55.600 --> 00:22:59.700
Could you describe just a little bit of...
00:23:00.320 --> 00:23:05.700
So, if I take a Python script and it's got some classes and some functions and they're calling each other and so on.
00:23:05.700 --> 00:23:09.900
What does it look like in terms of what's happening when that code runs?
00:23:09.900 --> 00:23:11.280
Okay.
00:23:11.280 --> 00:23:16.280
So, I'll maybe start from, like, how PyPy is built and then get back to your question directly.