WEBVTT
00:00:00.001 --> 00:00:04.380
This episode, you'll learn about a project that has the potential to unlock massive innovation
00:00:04.380 --> 00:00:07.160
around how CPython understands and executes code.
00:00:07.160 --> 00:00:10.620
And it's coming from what many of you may consider an unlikely source,
00:00:10.620 --> 00:00:15.120
Microsoft and the recently open-sourced cross-platform .NET Core Runtime.
00:00:15.120 --> 00:00:19.760
You'll meet Brett Cannon, who works on Microsoft's Azure Data Group.
00:00:19.760 --> 00:00:25.940
Along with Dino Viehland, he is working on a new initiative called Pyjion, P-Y-J-I-O-N,
00:00:25.940 --> 00:00:29.680
a JIT framework that can become part of CPython itself,
00:00:29.680 --> 00:00:33.600
paving the way for many new just-in-time compilation initiatives in the future.
00:00:33.600 --> 00:00:39.580
This is episode number 49 of Talk Python to Me, recorded February 4th, 2016.
00:00:51.880 --> 00:01:09.720
Welcome to Talk Python to Me, a weekly podcast on Python, the language, the libraries, the ecosystem, and the personalities.
00:01:09.720 --> 00:01:13.840
This is your host, Michael Kennedy. Follow me on Twitter, where I'm @mkennedy.
00:01:13.840 --> 00:01:17.740
Keep up with the show and listen to past episodes at talkpython.fm,
00:01:17.740 --> 00:01:20.320
and follow the show on Twitter via @talkpython.
00:01:20.320 --> 00:01:23.920
This episode is brought to you by Hired and SnapCI.
00:01:23.920 --> 00:01:30.640
Thank them for supporting the show on Twitter via @Hired_HQ and @Snap_CI.
00:01:31.400 --> 00:01:33.880
Hey, everyone. I think you're going to love this episode.
00:01:33.880 --> 00:01:39.660
Brett is doing some amazing work, and we talk about that in depth, but he's also a Python core developer,
00:01:39.660 --> 00:01:45.980
and we spend a decent amount of time on Python 3 and moving from Python 2 to Python 3 and that whole story there.
00:01:45.980 --> 00:01:49.740
I do have just one piece of news for you before we get to the interview.
00:01:49.740 --> 00:01:55.540
It's just T minus 10 days until my Kickstarter for Python Jumpstart by Building 10 Apps closes.
00:01:55.540 --> 00:02:00.020
The initial feedback from the early access students has been universally positive.
00:02:00.020 --> 00:02:05.520
If you have backed the Kickstarter with early access, be sure to create an account at training.talkpython.fm
00:02:05.520 --> 00:02:11.280
and send me a message via Kickstarter so I can get you the first six chapters, about three hours, of the course.
00:02:11.280 --> 00:02:16.860
If you're not sure what I'm talking about here, check out my online course at talkpython.fm/course.
00:02:16.860 --> 00:02:22.060
Now, let's hear about JIT innovation in CPython and more with Brett Cannon.
00:02:22.060 --> 00:02:23.540
Brett, welcome to the show.
00:02:23.540 --> 00:02:24.240
Thanks for having me, Michael.
00:02:24.240 --> 00:02:29.660
I'm super excited to talk to you about this new project that you guys have going on with Python and Microsoft.
00:02:29.660 --> 00:02:31.640
And yeah, we're going to dig into it. It'll be fun.
00:02:31.640 --> 00:02:32.700
Yeah, I'm looking forward to it.
00:02:32.700 --> 00:02:33.320
Absolutely.
00:02:33.320 --> 00:02:36.840
So before we get into that topic, though, what's your story?
00:02:36.840 --> 00:02:39.540
How did you get going in Python and programming and all that?
00:02:39.540 --> 00:02:40.760
They're slightly long stories.
00:02:40.760 --> 00:02:48.340
So getting into programming, probably my earliest experience with anything you could potentially call programming was Turtle back in third grade.
00:02:48.340 --> 00:02:52.100
I was lucky enough to be in a school that had a computer lab full of Apple IIes.
00:02:52.340 --> 00:02:59.900
And they'd bring us in and say, oh, look, you can do this little forward command and make this little turtle graphic draw a line and all this stuff.
00:02:59.900 --> 00:03:02.460
Was that on the monitor that was just like monochrome green?
00:03:02.660 --> 00:03:05.680
Yep. And that's why I think I used one of those, too.
00:03:05.680 --> 00:03:11.620
Yeah. I sometimes run my terminal with that old green and black style because it's just what I started with back in the day.
00:03:11.620 --> 00:03:12.320
Oh, that's awesome.
00:03:12.320 --> 00:03:14.660
So I did that, but I didn't realize what the heck programming was.
00:03:15.120 --> 00:03:21.640
But I always found computers kind of this fascinating black box that somehow you stick in these five and a fourth inch floppies, which dates me.
00:03:21.640 --> 00:03:24.740
And somehow Where in the World Is Carmen Sandiego? plays.
00:03:24.740 --> 00:03:25.860
I was like, wow, this is amazing.
00:03:26.400 --> 00:03:33.680
And then in junior high, I ended up taking a summer class on computers and it involved a little bit of Apple BASIC.
00:03:33.680 --> 00:03:35.860
And I really took to it.
00:03:35.860 --> 00:03:38.600
I actually lucked out and got so far ahead of the class.
00:03:38.600 --> 00:03:41.680
The teacher just said, yeah, you can stop coming to class if you want for the rest of the summer.
00:03:41.680 --> 00:03:43.680
So that was like halfway through.
00:03:44.120 --> 00:03:49.100
So I got bit kind of early, but I didn't really have any guidance or anything back then.
00:03:49.100 --> 00:03:54.180
I mean, this is pre-access to the Internet, so I didn't really have any way to really know how to carry on.
00:03:54.180 --> 00:04:01.460
And then when I went to junior college, my mom made me promise her that I would take a class in philosophy and a class in computer science.
00:04:01.460 --> 00:04:03.400
And I did both and I loved them both.
00:04:03.400 --> 00:04:08.740
But in terms of the computer science, I read through my C book within two weeks.
00:04:08.740 --> 00:04:13.560
And then one night, spent six hours in front of my computer writing tic-tac-toe from scratch.
00:04:14.100 --> 00:04:15.660
Using really basic terminal output.
00:04:15.660 --> 00:04:17.640
And I was basically hooked for life.
00:04:17.640 --> 00:04:19.040
In terms of Python.
00:04:19.040 --> 00:04:20.320
That's really cool.
00:04:20.320 --> 00:04:28.640
I think we all have that moment where you sit down at a computer and you haven't, maybe you've really enjoyed working with them or whatever.
00:04:28.640 --> 00:04:33.740
But then you kind of get into programming and you realize, wow, eight hours have passed.
00:04:33.740 --> 00:04:35.580
And it feels like I just sat down.
00:04:35.580 --> 00:04:37.520
And then you're in the world.
00:04:37.520 --> 00:04:37.920
That's it.
00:04:37.920 --> 00:04:39.540
Brought me my dinner at my desk.
00:04:39.540 --> 00:04:40.860
And you said, okay, I get it.
00:04:40.860 --> 00:04:42.160
You're just into this.
00:04:42.160 --> 00:04:43.500
Just go with it.
00:04:44.080 --> 00:04:44.700
Here's your food.
00:04:44.700 --> 00:04:45.840
Make sure you eat at some point tonight.
00:04:45.840 --> 00:04:46.340
Awesome.
00:04:46.340 --> 00:04:47.120
Yeah.
00:04:47.120 --> 00:04:56.720
And in terms of Python, I actually ended up going to Berkeley and getting a degree in philosophy because there were some issues trying to double major like I originally planned to do.
00:04:56.720 --> 00:04:59.340
But I did try to still take all the CS courses there.
00:04:59.700 --> 00:05:04.960
And there was a test to basically get into the intro of CS course at Berkeley at the time.
00:05:05.360 --> 00:05:08.300
And I thought they might have something about object-oriented programming.
00:05:08.300 --> 00:05:11.800
And having learned C, I knew procedural, but I didn't know object-oriented programming.
00:05:11.800 --> 00:05:20.320
So in fall of 2000, before I took the class in spring, I decided to try to find an object-oriented programming language to learn OO from.
00:05:20.940 --> 00:05:22.960
And I was reading and all this stuff.
00:05:22.960 --> 00:05:25.160
And Perl and Python caught my eye.
00:05:25.160 --> 00:05:28.520
But I kept reading that Perl should be like the fifth or sixth language you learned.
00:05:28.520 --> 00:05:30.740
While people kept saying, oh, Python's great for teaching.
00:05:30.740 --> 00:05:31.960
I mean, all right, I'll learn Python.
00:05:31.960 --> 00:05:33.140
And I did.
00:05:33.140 --> 00:05:33.760
And I loved it.
00:05:33.760 --> 00:05:37.780
And then I just continued to use it for anything I could and all my personal projects.
00:05:37.780 --> 00:05:39.940
And just kept going and going with it.
00:05:39.940 --> 00:05:40.860
And I haven't looked back since.
00:05:40.980 --> 00:05:41.720
Yeah, that's really cool.
00:05:41.720 --> 00:05:45.780
What language was your CS 101 course actually in?
00:05:45.780 --> 00:05:46.640
Scheme, actually.
00:05:46.640 --> 00:05:47.220
Interesting.
00:05:47.220 --> 00:05:50.880
My CS 101 class was Scheme as well.
00:05:50.880 --> 00:05:53.580
And I thought that was a very interesting choice for an introduction.
00:05:53.580 --> 00:05:55.700
Yeah, it was really interesting.
00:05:55.700 --> 00:05:58.160
I mean, it does kind of do away with the syntax.
00:05:58.160 --> 00:06:06.320
But obviously, now being a Python user, I really understand what it means to kind of really minimize the syntax in a nice way instead of a slightly painful way with all those parentheses.
00:06:06.320 --> 00:06:08.860
And it was interesting.
00:06:08.860 --> 00:06:14.020
I mean, it is a nice way to try to get in procedural programming and object-oriented and functional.
00:06:14.020 --> 00:06:19.820
So it was really nice to do multi-paradigm, teach you the basics kind of introduction.
00:06:19.820 --> 00:06:28.380
They did actually, interestingly enough, for the last project have us write a really basic Logo interpreter, which, funny enough, was such a bad experience for me,
00:06:28.380 --> 00:06:32.040
partially because of the way it worked out in terms of having to work with another team.
00:06:32.040 --> 00:06:34.960
And I had some issues with my teammates.
00:06:35.180 --> 00:06:40.040
I actually kind of got turned off on language design, of all things, for a little while.
00:06:40.040 --> 00:06:44.760
And then I just, over time, kept realizing I loved programming languages, learning how they worked.
00:06:44.760 --> 00:06:55.280
So I just re-evaluated my view and just realized, okay, it was just a bad taste from a bad experience and realized that I actually do have this weird little fascination with programming languages.
00:06:55.280 --> 00:06:57.400
And luckily got over that little issue of mine.
00:06:57.520 --> 00:06:58.120
Yeah, no kidding.
00:06:58.120 --> 00:07:01.500
And now you're a Python core developer, among other things, right?
00:07:01.500 --> 00:07:01.780
Yeah.
00:07:01.780 --> 00:07:05.940
So back to the language design, at least on the internals.
00:07:05.940 --> 00:07:06.560
Yeah, yeah.
00:07:06.560 --> 00:07:06.840
Awesome.
00:07:07.200 --> 00:07:14.560
So we're going to talk about Pyjion, this cool new JIT extension.
00:07:14.560 --> 00:07:19.780
You're going to have to tell me a little more about how you'd most correctly characterize it for CPython.
00:07:19.780 --> 00:07:24.980
But before we do, I thought maybe you could give us like a high-level view of two things.
00:07:24.980 --> 00:07:32.260
How CPython works, what's sort of going on when we run our code as is, right, with the interpreter.
00:07:32.260 --> 00:07:36.500
And then maybe a survey of the different implementations or runtimes.
00:07:36.500 --> 00:07:42.540
Because a lot of people think there's just one Python from an implementation or runtime perspective.
00:07:42.540 --> 00:07:44.620
And there's actually quite a variety already, right?
00:07:44.620 --> 00:07:50.380
Yeah, actually, we're kind of lucky in the Python community of having a lot of really top-quality implementations.
00:07:50.560 --> 00:07:55.800
But to target your first question of how CPython works, which is, for those who don't know,
00:07:55.800 --> 00:07:59.140
CPython is the version of Python you get from python.org.
00:07:59.140 --> 00:08:04.420
And the reason it's called CPython is because it's implemented in C and has a C API,
00:08:04.420 --> 00:08:07.140
which makes it easy to embed in stuff like Blender.
00:08:07.140 --> 00:08:12.400
Anyway, basically, the way Python works is more or less like a traditional interpreted programming language
00:08:12.400 --> 00:08:14.040
where you write your source code.
00:08:14.040 --> 00:08:20.200
Python acts as a VM, reads the source code, parses it into individual tokens like
00:08:20.200 --> 00:08:24.800
if and def and, oh, that's a plus sign and whatever.
00:08:24.800 --> 00:08:28.160
And then that gets turned into what's called a concrete syntax tree,
00:08:28.160 --> 00:08:32.280
which is kind of just like the way the grammar is written kind of nests things.
00:08:32.280 --> 00:08:35.760
And this is how you get your priorities in terms of precedence,
00:08:35.760 --> 00:08:40.320
like multiplication happens before plus, which happens before whatever.
00:08:40.320 --> 00:08:44.660
And that all works out in the concrete syntax tree in terms of how it nests itself.
00:08:45.060 --> 00:08:51.220
And then that gets passed into a compiler within Python that turns that into what's called an abstract syntax tree,
00:08:51.220 --> 00:08:52.560
which is much more high level.
00:08:52.560 --> 00:08:55.720
Like this is addition instead of plus and two things.
00:08:55.720 --> 00:08:58.180
And this is loading a value.
00:08:58.180 --> 00:08:59.760
And this is an actual number.
00:08:59.760 --> 00:09:01.820
And this is a function call.
00:09:02.020 --> 00:09:09.080
And then that gets passed farther down into the bytecode compiler, which will then take that AST and spit out Python bytecode.
00:09:09.080 --> 00:09:13.100
And that's actually what's stored basically in your .pyc files.
00:09:13.100 --> 00:09:15.580
Actually, technically, they're marshaled code objects.
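The pipeline described here (tokens, syntax tree, bytecode, marshaled code object) can be walked through from Python itself. What follows is a minimal sketch using only standard-library modules. The sample source string and the variable names are made up for illustration, and `ast.parse` goes straight from source text to the abstract syntax tree, so the intermediate concrete syntax tree is not visible in it.

```python
import ast
import io
import marshal
import tokenize

# Illustrative source line; any small snippet would do.
source = "x = 1 + 2 * 3\n"

# Step 1: split the source into tokens (names, operators, numbers).
tokens = [
    tok.string
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
    if tok.string.strip()  # drop the NEWLINE and ENDMARKER tokens
]

# Step 2: build the abstract syntax tree. Precedence shows up as nesting:
# the multiplication is a child of the addition.
tree = ast.parse(source)
addition = tree.body[0].value
outer_is_add = isinstance(addition.op, ast.Add)
inner_is_mult = isinstance(addition.right.op, ast.Mult)

# Step 3: compile the AST to a code object holding the bytecode.
code = compile(tree, "<example>", "exec")

# Step 4: a .pyc file stores a marshaled code object (plus a header).
# Round-trip the code object through marshal and run the restored copy.
restored = marshal.loads(marshal.dumps(code))
namespace = {}
exec(restored, namespace)

print(tokens)
print(namespace["x"])
```

Running the restored code object gives the same result as running the original, which is essentially what importing from a cached .pyc file relies on.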
00:09:15.580 --> 00:09:24.100
And then when Python wants to execute that, it just loads up those bytecodes and just has a really big for loop that basically reads through those individual bytecodes.
00:09:24.100 --> 00:09:26.160
It goes, OK, what do you want me to do?
00:09:26.380 --> 00:09:27.940
All right, you want me to load a const.
00:09:27.940 --> 00:09:29.360
Const is zero.
00:09:29.360 --> 00:09:32.740
And that happens to correlate to None in every code object.
00:09:32.740 --> 00:09:39.060
So I'm going to put None onto what's called the execution stack because Python is stack-based instead of register-based.
00:09:39.060 --> 00:09:40.780
So CPUs are register-based.
00:09:40.780 --> 00:09:43.240
Stack-based VMs such as Python.
00:09:43.240 --> 00:09:44.640
Java is another one.
00:09:44.640 --> 00:09:47.880
It's fairly common because it's easier to implement.
00:09:48.660 --> 00:09:53.820
Anyway, you can do stuff like LOAD_CONST None or load a number, load another number on the stack.
00:09:53.820 --> 00:09:54.960
So the stack now has two numbers.
00:09:54.960 --> 00:10:00.480
And then the loop might, the ceval loop, for evaluation loop.
00:10:00.480 --> 00:10:00.840
Yeah.
00:10:00.840 --> 00:10:06.400
So it's worth pointing out to the listeners, I think, who maybe haven't gone and looked at the source code there.
00:10:06.400 --> 00:10:11.820
When you say it's a big loop, it's like 3,000 lines of C code or something, right?
00:10:11.820 --> 00:10:13.180
It's a big for loop.
00:10:13.180 --> 00:10:15.900
Yeah, it literally is a massive for loop.
00:10:15.900 --> 00:10:24.940
If you actually go to Python source code and you look in the Python directory, there's a file in there called ceval.c.
00:10:24.940 --> 00:10:35.980
You can open that up and you will literally find nested in that file somewhere just a for loop with a huge switch statement that does nothing more than just execute these little byte codes.
00:10:35.980 --> 00:10:47.300
So like if it hits add, what it'll do is just pop two values off of what's basically a chunk of memory where we know what pointers are on the stack and just go, I'm going to take that Python object.
00:10:47.300 --> 00:10:53.460
I'm going to take that Python object and execute the __add__ in the right way or the __radd__ and then make that all happen.
00:10:53.460 --> 00:11:01.340
Get back a Python object and stick that back on the stack and then just go back to the top of the for loop and just keep going and going and going until you're done and your program exits.
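The loop described above can be imitated in a few lines of Python. This is a toy sketch, not the real code in ceval.c: the `run` function, the tuple-based instruction format, and the three opcode names it handles are assumptions chosen for illustration.

```python
def run(instructions, consts):
    """Toy stack machine: one loop, one branch per opcode."""
    stack = []
    pc = 0  # index of the next instruction to execute
    while True:
        op, arg = instructions[pc]
        pc += 1
        if op == "LOAD_CONST":
            # Push the constant found at index `arg`.
            stack.append(consts[arg])
        elif op == "BINARY_ADD":
            # Pop two operands, add them, push the result back.
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)  # Python dispatches to __add__/__radd__
        elif op == "RETURN_VALUE":
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode: {op}")


program = [
    ("LOAD_CONST", 0),
    ("LOAD_CONST", 1),
    ("BINARY_ADD", None),
    ("RETURN_VALUE", None),
]
result = run(program, consts=(2, 3))
print(result)
```

The real loop has the same shape, only with a C switch statement over many more opcodes and with reference counting and error handling on every step.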
00:11:01.340 --> 00:11:15.820
Yeah, and you can actually see that byte code by loading up some Python module or function or class or whatever and importing the disassembly module, and you can actually have it spit out the byte codes for, say, a function, right?
00:11:15.820 --> 00:11:16.080
Yep.
00:11:16.080 --> 00:11:18.700
And I do this all the time on Pyjion, actually.
00:11:18.700 --> 00:11:21.480
Basically, you can import the dis module, D-I-S.
00:11:22.080 --> 00:11:24.120
And in there, there's a dis function.
00:11:24.120 --> 00:11:35.460
So if you go dis.dis and then pass in any callable, basically, so function, method, whatever, and it'll just print out to standard out in your REPL all the byte code.
00:11:35.460 --> 00:11:38.460
And it'll give you information like what line does this correlate to?
00:11:38.460 --> 00:11:40.360
What is the byte code?
00:11:40.360 --> 00:11:42.220
What's the argument to that byte code?
00:11:42.220 --> 00:11:45.800
The actual byte offset and a whole bunch of other interesting things.
00:11:45.800 --> 00:11:50.820
And the dis module documentation actually lists most of the byte code.
00:11:50.820 --> 00:11:53.740
I actually found a couple of opcodes that weren't actually documented.
00:11:53.740 --> 00:11:54.680
Now there's a bug for that.
00:11:54.680 --> 00:11:57.560
But the majority of the byte code is actually documented there.
00:11:57.560 --> 00:12:05.560
So if you're really interested, you can have a look to see actually how we kind of break down the operations for Python for performance reasons and such.
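As a short illustration of the `dis` module mentioned here, the sketch below disassembles a small function. The `add` function is made up for the example, and the exact opcode names printed depend on the Python version, so none are shown as expected output.

```python
import dis
import io


def add(a, b):
    return a + b


# dis.dis prints to standard output by default; passing file= captures
# the listing in a string instead.
buffer = io.StringIO()
dis.dis(add, file=buffer)
listing = buffer.getvalue()
print(listing)

# dis.get_instructions yields the same information as structured objects:
# byte offset, opcode name, and a readable form of the argument.
instructions = list(dis.get_instructions(add))
for ins in instructions:
    print(ins.offset, ins.opname, ins.argrepr)
```

In a REPL, `dis.dis(add)` on its own is enough to see the listing.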
00:12:05.960 --> 00:12:07.220
Yeah, that's really interesting.
00:12:07.220 --> 00:12:19.980
And for the listeners who are wanting to dig deeper into this, on show 22, I talked with Philip Guo about his sort of CPython internals graduate course he did in the University of New York.
00:12:19.980 --> 00:12:20.940
Have you seen his work?
00:12:20.940 --> 00:12:21.980
No, I haven't yet.
00:12:21.980 --> 00:12:30.280
He basically recorded 10 hours of a graduate computer science course studying the internals of CPython and spent a lot of time in ceval.c.
00:12:30.280 --> 00:12:31.400
And it's on YouTube.
00:12:31.400 --> 00:12:32.140
You can go check it out.
00:12:32.140 --> 00:12:32.960
So it's really cool.
00:12:32.960 --> 00:12:34.940
So that's interesting.
00:12:35.140 --> 00:12:38.360
Oh, I should probably actually answer your second question, too, about all the other interpreters.
00:12:38.360 --> 00:12:39.880
Yeah, so let's talk about the interpreters.
00:12:39.880 --> 00:12:46.140
As I said earlier, CPython is kind of, it's the one you get from python.org and kind of the one most people are aware of.
00:12:46.140 --> 00:12:49.340
But there's actually a bunch of other ones.
00:12:49.340 --> 00:12:58.900
So one of the more commonly known alternative interpreters or VMs or implementations of Python is Jython, which is Python implemented in Java.
00:12:58.900 --> 00:13:03.720
So a lot of people love that whenever they have to write a Java app and want some easy scripting to plug in.
00:13:04.320 --> 00:13:06.800
Or have some requirement that they have to run on the JVM.
00:13:06.800 --> 00:13:10.380
Apparently, it's really popular in the defense industry for some reason.
00:13:10.380 --> 00:13:10.740
Interesting.
00:13:10.740 --> 00:13:13.860
Once you get a VM approved, you just don't mess with it, I'd say.
00:13:13.860 --> 00:13:14.900
Yeah.
00:13:14.900 --> 00:13:25.220
Well, and one really cool perk of this is at PyCon, every so often there's a really cool talk about flying fighter jets with Python using Jython and stuff like that.
00:13:25.360 --> 00:13:27.320
So it does at least lead to some really cool talks.
00:13:27.320 --> 00:13:27.740
Nice.
00:13:27.740 --> 00:13:29.500
And here's the afterburner function.
00:13:29.500 --> 00:13:30.480
You just call this.
00:13:30.480 --> 00:13:32.040
Exactly.
00:13:32.340 --> 00:13:35.620
There's IronPython, which is Python implemented in C#.
00:13:35.620 --> 00:13:37.720
So that's usable from .NET.
00:13:37.720 --> 00:13:47.720
So once again, it's often used for embedding in .NET applications that need scripting or anyone who needs to run on top of the CLR.
00:13:48.000 --> 00:13:49.540
Those are the two big ones.
00:13:49.540 --> 00:13:57.660
Obviously, in terms of direct alternatives, there's obviously PyPy, which I think a lot of people know about, which is two things.
00:13:57.660 --> 00:14:09.540
There's PyPy, the implementation of Python written in Python, although technically it's a subset of Python called RPython, which is specifically restricted such that they can infer a lot of information about it.
00:14:09.580 --> 00:14:13.180
So that can be compiled down straight to basically assembly.
00:14:13.180 --> 00:14:25.140
And then there's PyPy, the tool chain, which they developed for PyPy, the Python implementation, which is basically this tool chain to create custom JITs for programming languages.
00:14:25.140 --> 00:14:33.060
So you can take the PyPy tool chain and not just implement Python in Python, but they've done it for like PHP, for instance.
00:14:33.320 --> 00:14:40.520
And so you can actually write alternative implementations of languages in RPython and have it spit out a custom JIT designed for your language.
00:14:40.520 --> 00:14:46.760
Those are the key ones that have actually finished in terms of compatibility with some specific version of Python.
00:14:46.760 --> 00:14:48.880
All of them currently target 2.7.
00:14:48.880 --> 00:14:55.280
PyPy has support for Python 3.2, but obviously that's kind of an old support in terms of Python 3.
00:14:55.280 --> 00:15:00.820
And then there's the new up-and-comer, which is Pyston, which is being sponsored by Dropbox.
00:15:00.820 --> 00:15:02.600
And they're also targeting 2.7.
00:15:02.600 --> 00:15:09.180
And they're trying to make a version of Python that is as compatible with CPython as possible, including the C extension API.
00:15:09.180 --> 00:15:14.380
But what they're doing is they've added a JIT or using a JIT from LLVM.
00:15:14.380 --> 00:15:27.840
So they're trying to make 2.7 fast using LLVM JIT and pulling as much of the C code and API as they can from CPython to try to be compatible with extension modules, which is a common problem that PyPy, IronPython, and Jython have.
00:15:27.840 --> 00:15:32.560
Right. That one actually seems to be really interesting and have a lot of potential.
00:15:32.560 --> 00:15:39.020
Because if you think of companies that are sort of Python powerhouses, Dropbox is definitely among them.
00:15:39.020 --> 00:15:43.320
Yeah, it definitely does not hurt when Guido went to go work there as well.
00:15:43.320 --> 00:15:46.600
And they have Jessica McKellar there and several other people.
00:15:46.600 --> 00:15:48.320
Benjamin Peterson works for them.
00:15:48.620 --> 00:15:52.320
So they already have a couple of core devs and high up people in the Python community working there.
00:15:52.320 --> 00:15:56.560
And their whole server stack in the back, I believe, is at least mostly Python.
00:15:56.560 --> 00:15:58.700
Their desktop clients are Python.
00:15:58.700 --> 00:16:00.920
They're definitely Python heavy there.
00:16:00.920 --> 00:16:01.720
Yeah, absolutely.
00:16:02.340 --> 00:16:13.220
So how does Pyjion relate to... The thing that came to mind for me when I saw it announced was, you know, a friend of mine, Craig Bernstein, sent me a message on Twitter and said, hey, you have to check this out.
00:16:13.220 --> 00:16:15.000
And I'm like, oh, that is awesome.
00:16:15.000 --> 00:16:17.320
And it was just, you know, a Twitter message.
00:16:17.320 --> 00:16:21.840
You know, check out this JIT version of Python coming from Microsoft.
00:16:22.280 --> 00:16:26.000
Well, I don't know anything about it, but maybe it's like PyPy.
00:16:26.000 --> 00:16:28.520
So what are you guys actually building over there?
00:16:28.520 --> 00:16:29.020
What is this?
00:16:29.020 --> 00:16:32.900
Pyjion was actually started by Dino Viehland, one of my coworkers.
00:16:32.900 --> 00:16:43.740
And I believe, I don't know if he's necessarily the sole creator, but he's definitely one of the original creators of IronPython. Back at PyCon US 2015, which was in Montreal,
00:16:43.740 --> 00:16:49.200
during the language summit, Larry Hastings, the release manager for Python 3.4 and 3.5,
00:16:49.440 --> 00:16:55.560
got up in front of the core developers and said, what can we do to get more people to switch to Python 3 faster?
00:16:55.560 --> 00:17:01.840
Because obviously we all think Python 3 is awesome and legacy Python 2 is fine, but everyone should get off that at some point.
00:17:01.840 --> 00:17:02.420
Yeah, I hear you.
00:17:02.420 --> 00:17:02.800
I agree.
00:17:02.800 --> 00:17:03.880
So what do you do, right?
00:17:03.880 --> 00:17:07.520
Yeah, that could be a whole other question on that one, Michael.
00:17:07.520 --> 00:17:09.420
So he said, what can we do?
00:17:09.420 --> 00:17:09.960
What can we do?
00:17:09.960 --> 00:17:11.640
And he said, performance is always a good thing.
00:17:11.640 --> 00:17:15.240
People always seem to want more performance, no matter how well Python does.
00:17:15.240 --> 00:17:16.500
People are always hungry for more.
00:17:16.500 --> 00:17:18.720
And Dino went, yeah, that's a good idea.
00:17:18.980 --> 00:17:19.940
I know, I'll see.
00:17:19.940 --> 00:17:23.480
.NET just got open sourced back in April 2015.
00:17:23.480 --> 00:17:25.860
And he said, you know what?
00:17:25.860 --> 00:17:29.480
I will see if I can write a JIT for CPython using CoreCLR.
00:17:29.480 --> 00:17:32.260
Because Dino also happened to have been on the CLR team.
00:17:32.260 --> 00:17:35.360
So he knows the opcodes like the back of his hand.
00:17:35.360 --> 00:17:39.920
And so he started to hack on it at the conference and actually managed to get somewhere.
00:17:40.460 --> 00:17:45.940
And he premiered it at PyData Seattle back in July when we hosted it at Microsoft.
00:17:45.940 --> 00:17:50.460
And I got brought on to basically help him flesh out the goals.
00:17:50.460 --> 00:17:52.120
There's basically three goals.
00:17:52.300 --> 00:17:57.920
One is to develop a C API for CPython to basically make it pluggable for a JIT.
00:17:58.260 --> 00:18:11.500
Like one of the tough things that people have always done, like Unladen Swallow started with and Pyston's also doing, is they're directly tying into a fork of CPython, more or less, a JIT, which really tightly couples it.
00:18:11.700 --> 00:18:18.600
But it also means that, for instance, if LLVM does not work for your workload for whatever reason, you're kind of just stuck and it's just not an option.
00:18:18.600 --> 00:18:24.160
Well, we would rather basically make it so that there's just an API to plug in a JIT.
00:18:24.160 --> 00:18:29.420
And then that way CPython doesn't have to ship with a JIT, but it's totally usable by a JIT.
00:18:29.420 --> 00:18:46.200
And then that way, if LLVM or CoreCLR, which is the .NET JIT or Chakra or V8 or whatever JIT you want, as long as someone basically writes the code to plug from CPython into that JIT, you can use whatever works best for you.
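The idea of a pluggable JIT can be pictured with a small Python sketch. Everything in it is hypothetical: `ToyInterpreter`, `set_jit`, and the hook signature are invented for illustration and are not the C API under discussion. The point is only that the interpreter ships without a JIT, and any backend that follows the hook contract can be plugged in or left out.

```python
from typing import Callable, Optional

# Hypothetical contract: a JIT hook receives a function and returns a
# faster callable, or None to decline and let the interpreter run it.
JitHook = Callable[[Callable], Optional[Callable]]


class ToyInterpreter:
    """Stand-in for an interpreter with an optional, pluggable JIT."""

    def __init__(self) -> None:
        self._jit: Optional[JitHook] = None
        self._compiled: dict = {}

    def set_jit(self, hook: Optional[JitHook]) -> None:
        self._jit = hook
        self._compiled.clear()

    def call(self, func: Callable, *args):
        if self._jit is not None:
            # Ask the backend once per function and cache its answer.
            if func not in self._compiled:
                self._compiled[func] = self._jit(func)
            compiled = self._compiled[func]
            if compiled is not None:
                return compiled(*args)
        # No JIT installed, or the JIT declined: plain interpretation.
        return func(*args)


def square(x):
    return x * x


seen = []


def recording_jit(func):
    # A real backend would return generated native code; this one only
    # records that it was consulted and hands the function back.
    seen.append(func.__name__)
    return func


interp = ToyInterpreter()
plain = interp.call(square, 4)   # runs without any JIT
interp.set_jit(recording_jit)
jitted = interp.call(square, 4)  # backend consulted, result unchanged
interp.call(square, 5)           # cached, backend not consulted again
print(plain, jitted, seen)
```

Swapping `recording_jit` for a different hook changes nothing in `ToyInterpreter`, which is the decoupling being described.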
00:18:46.200 --> 00:18:47.860
That's really cool.
00:18:47.860 --> 00:19:05.800
I think it's a super noble goal to say, let's stop everybody starting from scratch, rebuilding the CPython sort of implementation and weaving in their version of a JIT and saying, let's just find a way so that you don't have to write that ever again.
00:19:05.800 --> 00:19:07.640
And you just plug in the pieces.
00:19:07.640 --> 00:19:08.460
Yeah, exactly.
00:19:08.460 --> 00:19:24.820
And actually, one of the other goals we have with this is not only developing the API, but goal number two is to write a JIT for CPython using the CoreCLR and using that to drive the API design that we need and that we want to push back up to CPython eventually.
00:19:25.140 --> 00:19:35.280
But the third goal is actually to design kind of a JIT framework for CPython such that we write the framework that drives the code emission for the JIT.
00:19:35.280 --> 00:19:45.460
And then all the JIT people have to do is basically just write to the interface of this framework and don't have to worry about specific semantics necessarily.
00:19:45.700 --> 00:19:55.480
So, for instance, you would be able to, as a JIT author, go, OK, I need to know how to emit an integer onto a stack and I need to know how to do add or add int.
00:19:55.480 --> 00:20:01.280
But then the framework would actually handle going, OK, well, here's the Python bytecode that implements add.
00:20:01.280 --> 00:20:05.460
Let's actually do an add call or, hey, I know this thing is actually an integer.
00:20:05.460 --> 00:20:21.480
Let's do an add int call and not just a generic Python add and be able to handle that level of difference so that there's a lot less busy work that's common to all the JITs, like type inference and such, and be able to extract that out so that it's even easier to add a JIT to CPython.
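The split between framework and JIT backend described here can be sketched in Python. This is a toy model under stated assumptions: the interface (JitBackend and its emit_* methods), the tuple-based bytecode, and compile_ops are all hypothetical and are not the framework's real C++ interface. The backend only knows how to emit primitive operations; the framework tracks operand types on a shadow stack and decides between an integer add and a generic add.

```python
from abc import ABC, abstractmethod


class JitBackend(ABC):
    """Hypothetical interface that a JIT author would implement."""

    @abstractmethod
    def emit_load_int(self, value): ...

    @abstractmethod
    def emit_load_object(self, name): ...

    @abstractmethod
    def emit_add_int(self): ...

    @abstractmethod
    def emit_add_generic(self): ...


class RecordingBackend(JitBackend):
    """Toy backend that records what it was asked to emit."""

    def __init__(self):
        self.ops = []

    def emit_load_int(self, value):
        self.ops.append(("load_int", value))

    def emit_load_object(self, name):
        self.ops.append(("load_object", name))

    def emit_add_int(self):
        self.ops.append(("add_int",))

    def emit_add_generic(self):
        self.ops.append(("add_generic",))


def compile_ops(bytecode, backend):
    """Framework side: simple type inference picks the specialized add."""
    types = []  # shadow stack of statically known operand types
    for op, arg in bytecode:
        if op == "LOAD_CONST" and type(arg) is int:
            backend.emit_load_int(arg)
            types.append(int)
        elif op in ("LOAD_CONST", "LOAD_NAME"):
            backend.emit_load_object(arg)
            types.append(object)  # type unknown at compile time
        elif op == "BINARY_ADD":
            right, left = types.pop(), types.pop()
            if left is int and right is int:
                backend.emit_add_int()
                types.append(int)
            else:
                backend.emit_add_generic()
                types.append(object)
        else:
            raise ValueError(f"unsupported opcode: {op}")
```

Compiling two integer constants followed by an add yields an add_int, while adding a constant to a named variable of unknown type falls back to add_generic. The backend never sees Python semantics, only the primitive it is asked to emit.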
00:20:21.480 --> 00:20:23.100
So is that like two levels?
00:20:23.100 --> 00:20:34.660
Like on one hand, you have a straight C API at the CPython level and then optionally you could choose to use the C++ framework that makes it so you do less work and you plug in your sort of events or steps?
00:20:34.960 --> 00:20:35.440
Yeah, exactly.
00:20:35.440 --> 00:20:49.460
It's getting the bare minimum into CPython so that CPython at least has this option without everyone having to do a fork, as well as pushing down a level to a separate project where the common stuff is extracted out and everyone can just build off the same baseline.
00:20:49.460 --> 00:20:53.080
And then the only thing that has to really differ is what's unique to the JITs.
00:20:53.080 --> 00:20:56.760
And then that way, everyone's work is as simple as possible to try to make this work.
00:20:56.760 --> 00:20:58.340
OK, that makes a lot of sense.
00:21:04.460 --> 00:21:11.260
This episode is brought to you by Hired.
00:21:11.260 --> 00:21:16.860
Hired is a two-sided, curated marketplace that connects the world's knowledge workers to the best opportunities.
00:21:16.860 --> 00:21:24.340
Each offer you receive has salary and equity presented right up front and you can view the offers to accept or reject them before you even talk to the company.
00:21:25.200 --> 00:21:30.100
Typically, candidates receive five or more offers within the first week and there are no obligations ever.
00:21:30.100 --> 00:21:31.660
Sounds awesome, doesn't it?
00:21:31.660 --> 00:21:33.320
Well, did I mention the signing bonus?
00:21:33.320 --> 00:21:36.700
Everyone who accepts a job from Hired gets a $1,000 signing bonus.
00:21:36.700 --> 00:21:39.500
And as Talk Python listeners, it gets way sweeter.
00:21:39.500 --> 00:21:45.220
Use the link Hired.com slash Talk Python To Me and Hired will double the signing bonus to $2,000.
00:21:46.600 --> 00:21:47.360
Opportunity's knocking.
00:21:47.360 --> 00:21:50.780
Visit Hired.com slash Talk Python To Me and answer the call.
00:21:56.400 --> 00:22:03.280
Would you still be able to support things like method inlining and things like that with the C++ framework?
00:22:03.860 --> 00:22:08.100
We don't know yet, but there's technically no reason why not.
00:22:08.100 --> 00:22:15.100
What's actually really interesting is we started all this work and we actually weren't ready to premiere any of this yet.
00:22:15.100 --> 00:22:17.280
We've been doing this out in the open on GitHub.
00:22:17.280 --> 00:22:21.920
But as you mentioned, Michael, people started to tweet it and then it made it to Reddit and then it made it to Hacker News.
00:22:21.920 --> 00:22:23.740
And suddenly everyone's asking questions and stuff.
00:22:23.740 --> 00:22:34.560
But in the middle of all this, there's been a lot of work literally the past, I don't know, maybe two months of various core developers putting in a lot of time and effort trying to speed up CPython itself.
00:22:34.560 --> 00:22:47.120
And part of this is actually trying to cache method objects so that they can get cached in the code object and actually not have to, every time you try to execute like a call bytecode,
00:22:47.120 --> 00:22:52.820
not have to go to like the object, pull out the method object and then call that, but actually just cache the method object.
00:22:52.820 --> 00:22:53.800
I already have it.
00:22:53.800 --> 00:22:56.660
I don't need to re-access that attribute on the object.
00:22:56.660 --> 00:23:00.000
And so it's already starting to bubble its way up into CPython.
00:23:00.000 --> 00:23:08.660
And there shouldn't technically be any reason why we can't just piggyback off of that and just go, oh, well, they've already cached this or use a similar technique of basically,
00:23:08.660 --> 00:23:14.600
if the object hasn't changed, I really don't need to worry about previous versions of this being different.
00:23:14.600 --> 00:23:19.540
So I can just cache it and reuse it and just save myself the hassle of having to get a method back.
00:23:19.540 --> 00:23:21.740
Or same thing with built-ins, right?
00:23:21.740 --> 00:23:26.480
Like if you ever want to call len, some people cache it locally for performance.
00:23:26.480 --> 00:23:35.200
But the work that's going on is actually going to make that a moot point because it's going to start to notice when the built-ins and the globals for your code have not changed.
00:23:35.200 --> 00:23:39.300
And just go, well, I've already cached len locally because I already know I've used it previously.
00:23:39.300 --> 00:23:53.560
So I might as well just pull that object immediately out of my cache instead of trying it in the local namespace, not having it there, going to the global namespace, not having it there, then going to the built-in namespace and having to pull out len again for every time through a loop, for instance, and call that.
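The caching idea described here, skipping the namespace walk when nothing has changed, can be sketched in Python. This is a simplified model, not CPython's implementation: VersionedNamespace and CachedLookup are hypothetical names, the version counter only tracks item assignment and deletion, and the lookup walks just the global and built-in namespaces.

```python
class VersionedNamespace(dict):
    """Dict that bumps a counter on mutation (hypothetical stand-in).

    Only item assignment and deletion are tracked here; update(), pop()
    and similar methods are left out to keep the sketch short.
    """

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.version = 0

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self.version += 1

    def __delitem__(self, key):
        super().__delitem__(key)
        self.version += 1


class CachedLookup:
    """Caches one name and redoes the slow lookup only after a change."""

    def __init__(self, name, globals_ns, builtins_ns):
        self.name = name
        self.globals_ns = globals_ns
        self.builtins_ns = builtins_ns
        self._cached = None
        self._stamp = None  # namespace versions seen at the last slow lookup
        self.slow_lookups = 0

    def get(self):
        stamp = (self.globals_ns.version, self.builtins_ns.version)
        if stamp != self._stamp:
            # Slow path: check globals first, then fall back to built-ins.
            self.slow_lookups += 1
            if self.name in self.globals_ns:
                self._cached = self.globals_ns[self.name]
            else:
                self._cached = self.builtins_ns[self.name]
            self._stamp = stamp
        return self._cached


builtins_ns = VersionedNamespace(len=len)
globals_ns = VersionedNamespace()
lookup = CachedLookup("len", globals_ns, builtins_ns)
```

Calling lookup.get() repeatedly in a loop performs the namespace walk once and then returns the cached object. Assigning a new len into globals_ns bumps its version, so the next get() takes the slow path again and picks up the shadowing definition.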
00:23:53.560 --> 00:23:54.860
Yeah, that's really great.
00:23:54.860 --> 00:23:58.780
And I suspect you could just say, here's the JIT compiled machine instructions.
00:23:58.780 --> 00:24:00.900
Just cache that or something like this.
00:24:00.940 --> 00:24:02.140
Yeah, exactly.
00:24:02.140 --> 00:24:10.960
So a lot of this work that's happening directly in CPython bubbles down both directions into helping JITs in various ways, right?
00:24:10.960 --> 00:24:15.980
Like this whole detecting what state a namespace is in compared to the last time you looked at it.
00:24:15.980 --> 00:24:17.140
Has it changed at all or not?
00:24:17.140 --> 00:24:20.960
That's probably going to end up in CPython itself as an implementation detail.
00:24:20.960 --> 00:24:25.640
But it also means all the JITs will be able to go, oh, look, the built-in namespace hasn't changed.
00:24:25.640 --> 00:24:28.680
So that means if I've cached len, I don't need to worry about it being changed.