00:00:00 One of the fastest growing areas in Python is scientific computing.
00:00:03 In scientific computing with Python, there are a few key packages that make it really special.
00:00:07 These include NumPy, SciPy, and the related packages.
00:00:11 But the one that brings it all together, visually, is IPython, now known as Project Jupyter.
00:00:16 And that's the topic of episode 44 of Talk Python to Me.
00:00:20 You'll learn about the big split, plans for the recent $6 million in funding,
00:00:25 and Jupyter at CERN and the Large Hadron Collider with Min RK and Matthias Bussonnier.
00:00:54 Welcome to Talk Python to Me, a weekly podcast on Python, the language, the libraries, the ecosystem, and the personalities.
00:01:01 This is your host, Michael Kennedy. Follow me on Twitter where I'm @mkennedy.
00:01:05 Keep up with the show and listen to past episodes at talkpython.fm.
00:01:09 And follow the show on Twitter via @talkpython.
00:01:11 This episode is brought to you by Hired and SnapCI.
00:01:15 Thank them for supporting the show on Twitter via @Hired_HQ and @Snap_CI.
00:01:23 Hi, folks. No news this week.
00:01:25 I do have a big announcement coming, and I'm really looking forward to sharing it with you all,
00:01:29 but I'm not quite ready to talk about it yet, so stay tuned.
00:01:31 For now, let's get right to the interview with the Project Jupyter core devs,
00:01:36 Min RK and Matthias Bussonnier.
00:01:38 Matthias, Min, welcome to the show.
00:01:40 Thanks.
00:01:41 Thanks, Mike, for having us here.
00:01:43 Yeah, I'm really super excited to talk about Python intersected with science in this thing called
00:01:49 IPython, or what's become Project Jupyter.
00:01:51 So that's going to be really great.
00:01:53 And before we get to that, though, let's just talk about how you got involved.
00:01:58 How did you get into programming?
00:01:59 How did you get involved with IPython and all that stuff?
00:02:03 What's your background?
00:02:03 Min, you want to go first?
00:02:04 Sure.
00:02:05 Yeah, so I was an undergrad in physics at Santa Clara University, working with Brian Granger,
00:02:11 one of the founders of the IPython project.
00:02:13 And I was interested in computing and simulation and things, and ended up working on the interactive
00:02:21 parallel computing part of IPython as my undergrad thesis, and started doing my numerical simulation
00:02:28 homework stuff in Python, even though the classes were taught in MATLAB and Octave and things,
00:02:34 and enjoying the scientific Python ecosystem of NumPy and Matplotlib and things.
00:02:39 And that's kind of how I came to the project and scientific Python in general.
00:02:44 Yeah, that's really cool.
00:02:45 And was IPython already a thing when you got started?
00:02:48 Yeah.
00:02:48 Fernando created IPython in 2001, and I was doing my undergrad a few years after that.
00:02:54 And so I joined the project after it had been around for about five years in 2006, and I've
00:02:59 been working on it for the past 10 years, I guess, now.
00:03:02 Yeah, about 10 years.
00:03:03 So how time flies.
00:03:04 Matthias, how about you?
00:03:05 Oh, so I came to the project much later than Min.
00:03:11 I actually started programming a long time ago and came across one of the huge refactorings
00:03:16 of IPython that Min did.
00:03:18 I think it finished in summer 2011.
00:03:22 Just after that, they released the Qt console, so the current IPython team at the time.
00:03:27 And the project was much more friendly and in good shape for beginner Python programmers.
00:03:35 At the time, I was beginning my PhD in biophysics in Paris, and I started contributing to the project.
00:03:42 It was my first big contribution to an open source project.
00:03:45 And I started to spend my nights and weekends during my PhD improving IPython, which was helping
00:03:52 me really a lot for my PhD.
00:03:55 And I quickly became a core contributor, and I've stayed in the team since then.
00:04:01 Maybe a good place to start talking about this whole project is maybe we can start with the
00:04:07 history.
00:04:07 Originally, this project was called IPython and IPython Notebooks, right?
00:04:12 Yeah, IPython was around for a good 10 years before we got a version of the notebook out, although
00:04:19 we had been working on various versions of notebooks for about five years, and most attempts kind
00:04:24 of didn't go anywhere.
00:04:25 So what did it look like?
00:04:27 What did the product look like in the early days?
00:04:31 Yeah, so initially, Fernando created IPython as just a better interactive shell for Python,
00:04:38 so giving you some better tab completions, nice colorful tracebacks, things like that.
00:04:44 Also, Python's a nice, verbose language, but when you're doing interactive stuff, some of
00:04:50 the bash shell syntax is nicer to type when you're doing ls and cd and everything.
00:04:55 So one of the things that Fernando did early on was add this notion of magics for extending
00:05:01 the Python language to give convenient commands for interactively typed things.
00:05:06 Like, you can type cd in IPython, which you can't type in a regular Python environment.
00:05:13 And then there are magics that are particularly useful for the scientific visualization things
00:05:20 and things like the %timeit magic for profiling and good Matplotlib integration for the event loop
00:05:27 and things like that.
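For readers who have not used magics: the %timeit magic mentioned here does a job comparable to Python's standard timeit module. As a rough plain-Python sketch of timing a small expression (the function name best_time_per_call and the run counts are chosen for illustration and are not IPython's own):

```python
import timeit


def best_time_per_call(stmt, number=1000, repeat=5):
    """Return the best average time per call, in seconds, over several runs."""
    # timeit.repeat executes `stmt` `number` times per run, for `repeat` runs,
    # and returns one total duration per run.
    totals = timeit.repeat(stmt, number=number, repeat=repeat)
    # Taking the minimum filters out runs disturbed by other system activity.
    return min(totals) / number


if __name__ == "__main__":
    seconds = best_time_per_call("sum(range(100))")
    print(f"best of 5 runs: {seconds * 1e6:.2f} microseconds per call")
```

In an IPython session the same measurement is a single line, which is the kind of convenience for interactively typed things being described.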
00:05:29 I know what the shell looks like today.
00:05:31 You know, you can load it up and it kind of looks like a shell and you type in there,
00:05:35 but you can do things like plot graphs and those will pop up into separate windows and things like that, right?
00:05:41 Yeah, and most of that is provided by tools like Matplotlib, but often those tools need a little bit of help to make sure that the terminal stays responsive.
00:05:50 And that's one of the things that IPython helps with in terms of what Python calls the input hook
00:05:58 to ensure that the terminal remains responsive while a GUI event loop is also running.
00:06:04 Right, yeah, very cool.
00:06:05 So how did it go from that to the more, I want to think of it as like articles or published style,
00:06:12 these notebooks that you can use to communicate almost like finished work rather than something interactive?
00:06:18 Notebook style interfaces, since a lot of the IPython folks come from a physics background.
00:06:24 So Brian and Fernando, who started the project, were doing their graduate work in physics at Boulder at the same time.
00:06:32 And then I was Brian's physics student.
00:06:34 And notebook environments are pretty common.
00:06:37 There are various commercial and non-commercial products that have kind of notebook processing environments,
00:06:43 especially for math analysis.
00:06:47 So often code is not the best representation of math, but there are rich, you know, rendered mathematical expressions that are nice.
00:06:54 Brian and Fernando knew that they wanted a notebook type interface fairly early on,
00:07:00 but the tools just weren't there to build it.
00:07:04 And IPython wasn't in a shape to really support it.
00:07:07 So slowly at first, and then it kind of picked up speed.
00:07:10 We added the pieces for putting that together.
00:07:14 But it was kind of in, it was on the horizon for many years before it actually happened.
00:07:20 Sure.
00:07:20 It took a while to build the maturity into it.
00:07:23 What are some of those building blocks that it was waiting on?
00:07:27 Yeah, I think web technologies, and WebSockets in particular, were what was missing to actually build a notebook.
00:07:36 If I remember correctly, one of the latest prototypes that we did not release was using AJAX polling.
00:07:45 But the ability to actually push a result to the web front end once, as soon as the kernel gets a result,
00:07:53 is one of the key factors that pushed the notebook forward and allowed us to do the notebook.
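The difference between polling and pushing can be illustrated with a toy model. This is only a sketch with made-up names and abstract "ticks" of time; it does not use a real WebSocket or any notebook code. It compares when a front end learns about a result, and how many requests it had to send, under each approach:

```python
def polling_delivery(result_ready_at, poll_every):
    """Model a front end that asks "any result yet?" every `poll_every` ticks.

    Returns (tick at which the front end sees the result, requests sent).
    """
    if poll_every <= 0:
        raise ValueError("poll_every must be positive")
    tick = 0
    requests = 0
    while True:
        requests += 1                 # one round trip per poll
        if tick >= result_ready_at:   # the kernel has finished by now
            return tick, requests
        tick += poll_every            # wait until the next poll


def push_delivery(result_ready_at):
    """Model a push-style connection: the server sends the result when ready."""
    # The front end sends no repeated requests; the result arrives at the
    # moment the kernel produces it.
    return result_ready_at, 0


if __name__ == "__main__":
    print(polling_delivery(result_ready_at=7, poll_every=5))  # (10, 3)
    print(push_delivery(result_ready_at=7))                   # (7, 0)
```

With polling, the result that was ready at tick 7 is only seen at tick 10, after three requests; with push it is seen at tick 7 with none.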
00:07:59 Actually, the first prototype of the notebook that you know nowadays was still using a draft of WebSockets, which stayed in draft state for a long time.
00:08:08 And so we were really bleeding edge on this technology, doing everything in the browser,
00:08:15 and relying on just what current browsers can do for the notebook.
00:08:19 There is, by the way, and we can put that in the show notes, a really nice blog post by Fernando that recaps the history of IPython.
00:08:27 It even has the 150 lines of Python that were IPython when it was a few weeks old, which is IPython 0.1,
00:08:38 that we can dig up for people who are interested in trying a really early prototype.
00:08:43 Yeah, go back and see the history.
00:08:45 That's a really interesting point, Matthias, because it's easy to think of the web as being this very rich, powerful, capable platform,
00:08:54 because it has been for the last five years or so.
00:08:58 But 10 or even more than 10 years ago, it was not, right?
00:09:03 It was basically just documents on the web, right?
00:09:07 You had a little bit of JavaScript, and that was about it, right?
00:09:10 Yeah, I think so.
00:09:11 I haven't used that much.
00:09:13 I was not developing on the web that much 10 years ago.
00:09:17 I was more a C, C++ person.
00:09:20 Min, maybe you were doing more web development at the time?
00:09:23 Yeah, I wrote an early version of the web-based notebook for IPython during the summer of 2006 or 2007, I think.
00:09:33 Even then, the tools available really weren't...
00:09:37 It was not a particularly pleasant thing to work with.
00:09:42 I bet it wasn't.
00:09:43 Did you end up in a lot of situations where you're like, oh, this only works in Firefox, and this one only works in IE, and just partly working in a lot of places?
00:09:52 It's frankly still like that.
00:09:54 It still never works in IE.
00:09:56 Yeah.
00:09:58 Yeah, it's hard to love IE, I know.
00:10:02 Well, but I mean, recent versions of Internet Explorer are actually really nice and have good standards implementations and everything, but the reputation of IE6 kind of overshadows.
00:10:16 Yeah, it definitely casts a long shadow.
00:10:18 And, you know, Microsoft, I think just last week, possibly, like very recently, just ended support for all versions of IE other than, I think, 11 and onward.
00:10:30 Maybe 10 and onward, but certainly knocked out a whole bunch of them.
00:10:33 And, you know, once that kicks in, that's going to be a good day for everyone that has to work on the web.
00:10:38 Microsoft has done a lot of things recently.
00:10:41 Last week also, if I remember correctly, they did release as open source the JavaScript engine that will power the next version of their browser.
00:10:51 So Google has V8, which powers both Chrome and Node.js, and which is actually one of the technologies that helped the notebook become reality, because JavaScript was painfully slow 10 years ago and is now really, really fast thanks to V8.
00:11:10 And so it's really nice to see nowadays Microsoft actually releasing open source software and contributing to the community.
00:11:20 And I hope that in the next few years, Microsoft will lose the reputation of everybody complaining about IE and everything, and actually ship nice software, without that many security bugs and so on and so forth.
00:11:35 It will be really nice if that comes along because a lot of people run their software and it would be, you know, the world would be a better place if it works really well.
00:11:44 I certainly think they're on the right path.
00:11:46 I think it's pretty interesting.
00:11:48 So one thought I had while you guys were talking about this is, what's the cross-platform story for IPython and Jupyter in general?
00:11:57 Does it work kind of equally well on Windows, Linux, OS X or are there places that are more equal than others?
00:12:05 Linux and OS X are a little bit more equal than Windows, but it should work.
00:12:12 It should work everywhere.
00:12:13 And even though all of our developers and everything are working exclusively on Linux and OS X, when we do user surveys and things, we find that roughly half or even slightly more than half of our users are running Windows.
00:12:28 So even though it often doesn't work quite as well, or we frequently introduce bugs during the development process that we don't notice for a while, Windows really is a first-class platform for the notebook's use case of a local desktop app that happens to use a web browser for its UI.
00:12:46 There are certain aspects of installation that are often more challenging on Windows, especially in terms of installing kernels other than the Python one.
00:12:55 So installing multi-language kernels is more challenging on Windows.
00:13:01 And I think that's not necessarily a specific deficiency of Windows.
00:13:05 It's more just that the developers and maintainers don't tend to use Windows.
00:13:10 So the documentation and education often just don't cover what you need to do for Windows as well.
00:13:15 Right.
00:13:16 If you don't develop and test deploying your packages in the underlying compilers that have to make them go, well, you're more likely to run into problems, right?
00:13:25 Yeah.
00:13:26 I would say also that Travis CI, so continuous integration, is often on Linux only.
00:13:31 Setting it up on Windows is painful.
00:13:33 So we catch bugs much more often with continuous integration on Linux.
00:13:41 So it's less prone to bugs on Linux.
00:13:44 And the other thing is, I don't always like to say good things about half-proprietary tools, but Conda has changed a lot of things over the last few years.
00:13:55 It was really painful to install Python on many systems.
00:13:59 And now it's one of the solutions, especially at Software Carpentry bootcamps, where we ask people to just install Conda and run conda install jupyter, which now even comes bundled with it.
00:14:10 And it almost always works out of the box.
00:14:14 And especially for beginners, it's a really, really nice tool.
00:14:19 Yeah, Conda has really moved the bar for how easy it is to get set up, especially on Windows.
00:14:26 There are lots of different ways to install things on Unix-y platforms that work fairly reliably.
00:14:32 But the binaries provided by Conda and Anaconda are extremely valuable for beginners, especially on Windows, where people don't tend to have a working compiler set up.
00:14:42 And a lot of the scientific packages won't build on people's Windows machines.
00:14:48 So having binaries is extremely important.
00:14:51 And the binaries provided by Conda and Anaconda have been extremely valuable, especially for people getting started in scientific Python.
00:14:58 Yeah, I think I still have scars from the "vcvarsall.bat was not found" sort of errors trying to do stuff on Windows.
00:15:07 And we had Travis Oliphant on Show 34, who is behind Conda and Continuum and all that.
00:15:14 And I think it's a really cool thing that those guys are doing, sort of taking that build configuration step and just pre-building it and shipping the binaries, like you say.
00:15:26 That really helps people when they're getting started, I think.
00:15:29 Yeah, it's made a huge difference, especially, as Matthias mentioned, in workshops, the Software Carpentry and Python boot camp type of environments, where just a few years ago you would often spend the first day on installation, basically.
00:15:48 Which is a high price to pay in a two-day workshop.
00:15:51 And now it's often down to an hour.
00:15:53 It's awesome.
00:15:54 It's a super high price to pay.
00:15:55 And it's also super discouraging, right?
00:15:58 People come not because they want to learn how to configure their compiler.
00:16:01 They want to come build something amazing, right?
00:16:04 And they've got to, like, plow through all these nasty configuration edge cases.
00:16:08 And, yeah, very, very cool.
00:16:09 So, before we move farther, you know, just the other day, I was trying to describe IPython to somebody in, like, one or two sentences.
00:16:19 And I didn't do a super job, I think.
00:16:21 Could you guys maybe give me your elevator pitch for what is Jupyter or IPython, which becomes Jupyter?
00:16:29 It's really tough.
00:16:30 Have you seen the Lego movie?
00:16:32 Do you know the song Everything is Awesome?
00:16:35 Yes.
00:16:37 That would be my pitch.
00:16:39 Yeah.
00:16:42 Everything is awesome.
00:16:43 Okay.
00:16:43 Yeah.
00:16:44 So, I would say the IPython and Jupyter projects together provide tools for interactive computing and reproducible research and software-based communication.
00:16:58 Okay.
00:16:59 It's kind of the high-level gist.
00:17:01 It's fairly different than a lot of what's out there from a programmer's perspective.
00:17:06 So, it does take a little explaining, doesn't it?
00:17:08 Yeah.
00:17:10 So, we have things like an environment in which to do the interactive programming and do the exploratory work.
00:17:16 And then we also have things like the notebook document format, which are for distributing the communication and sharing it with other people.
00:17:23 So, those are kind of the two aspects.
00:17:26 And Fernando likes to say we have tools for the life cycle of a computational idea.
00:17:31 That's a very cool way to put it.
00:17:33 It's a very cool tagline.
00:17:34 I like it.
00:17:34 We're talking about IPython because that's the historical place.
00:17:39 And we're talking about Jupyter because that's the present and the future.
00:17:42 Could you guys maybe talk about how it went from one to the other?
00:17:46 What's the story there?
00:17:47 Yeah.
00:17:47 So, when we started working on building these UIs with rich media displays, the first one of which was the Qt console, the first step of that was separating the front end from what we call the kernel, which is where code runs.
00:18:04 That meant essentially establishing a network protocol for a REPL, basically.
00:18:10 And with that, we have a way to express, okay, I'm going to send an execute request that has some code for the kernel to evaluate.
00:18:19 And then the kernel sends messages back that are display formats of various types.
00:18:25 So, it can send back PNGs or HTML or text.
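The request-and-reply exchange described here can be sketched in a few lines. This is a heavily simplified illustration, not the real wire format: actual Jupyter messages carry more header fields and travel over ZeroMQ sockets, and ToyKernel and make_message are hypothetical names invented for this sketch. The message type names and the idea of a result as a bundle keyed by MIME type follow the protocol as described above.

```python
import contextlib
import io
import uuid


def make_message(msg_type, content, parent=None):
    """Build a simplified message: a header, the parent's header, and content."""
    return {
        "header": {"msg_id": uuid.uuid4().hex, "msg_type": msg_type},
        "parent_header": parent["header"] if parent else {},
        "content": content,
    }


class ToyKernel:
    """Stand-in kernel that evaluates Python. Any language could sit here."""

    def __init__(self):
        self.namespace = {}
        self.execution_count = 0

    def handle(self, request):
        """Run the code in an execute_request and return the reply messages."""
        assert request["header"]["msg_type"] == "execute_request"
        self.execution_count += 1
        code = request["content"]["code"]
        replies = []
        stdout = io.StringIO()
        with contextlib.redirect_stdout(stdout):
            try:
                # Try the code as an expression first so its value can be shown.
                value = eval(code, self.namespace)
            except SyntaxError:
                # Not an expression (e.g. an assignment): run it as statements.
                exec(code, self.namespace)
                value = None
        if stdout.getvalue():
            replies.append(make_message(
                "stream", {"name": "stdout", "text": stdout.getvalue()}, request))
        if value is not None:
            # A result is a bundle of representations keyed by MIME type.
            data = {"text/plain": repr(value)}
            if hasattr(value, "_repr_html_"):
                data["text/html"] = value._repr_html_()
            replies.append(make_message(
                "execute_result",
                {"execution_count": self.execution_count, "data": data},
                request))
        replies.append(make_message(
            "execute_reply",
            {"status": "ok", "execution_count": self.execution_count},
            request))
        return replies


if __name__ == "__main__":
    kernel = ToyKernel()
    request = make_message("execute_request", {"code": "1 + 2"})
    for reply in kernel.handle(request):
        print(reply["header"]["msg_type"], reply["content"])
```

The front end only ever sees messages, never the kernel's internals, and error handling is left out to keep the sketch short.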
00:18:29 We realized, not entirely on purpose, this wasn't what we set out to do, but we realized when we had this protocol that there was nothing Python-specific about it, that any language that understands a REPL can talk this protocol.
00:18:44 And because the UI and the code execution were in different processes, there's no reason that the two need to be in the same language.
00:18:53 Communities like, the first big one was the Julia language community, essentially saw the UI, specifically the notebook UI, and said, you know, we like that, we want to use that, we'd rather not reimplement it.
00:19:07 So, what they implemented was the protocol.
00:19:09 And once they implemented the protocol, they got the UI for free.
00:19:13 The result of that, since we didn't set out to design that, there were a bunch of rough edges where we had assumed Python, but they were kind of incidental, smaller assumptions to work around.
00:19:25 And so, since that started, we've been kind of refining protocols and things to remove Python and IPython assumptions, so that the UI is separate from the language in which execution happens.
00:19:40 Because we don't really, you know, a lot of the benefits of the protocol and the display stuff, there's no reason it should be confined to code executing in Python.
00:19:49 Yeah, that's a really happy coincidence, isn't it?
00:19:53 That's excellent.
00:19:54 Yeah.
00:20:05 This episode is brought to you by Hired.
00:20:07 Hired is a two-sided, curated marketplace that connects the world's knowledge workers to the best opportunities.
00:20:13 Each offer you receive has salary and equity presented right up front, and you can view the offers to accept or reject them before you even talk to the company.
00:20:20 Typically, candidates receive five or more offers within the first week, and there are no obligations, ever.
00:20:26 Sounds awesome, doesn't it?
00:20:28 Well, did I mention the signing bonus?
00:20:30 Everyone who accepts a job from Hired gets a $1,000 signing bonus.
00:20:33 And as Talk Python listeners, it gets way sweeter.
00:20:35 Use the link Hired.com slash Talk Python To Me, and Hired will double the signing bonus to $2,000.
00:20:41 Opportunity's knocking.
00:20:43 Visit Hired.com slash Talk Python To Me and answer the call.
00:20:53 Matthias, where did Jupyter come from?
00:20:55 It used to be called IPython.
00:20:57 Obviously, that doesn't make sense if you're not using Python.
00:21:00 We'd been thinking about renaming part of the project long before we actually announced that we would be renaming to Jupyter.
00:21:11 Of course, we were aware that users, especially non-Python users, were confused.
00:21:16 Like, I want to use a notebook with R.
00:21:20 Why should I install IPython?
00:21:22 And you have to understand that many users don't even see the difference between Python and IPython.
00:21:30 And many users also write IPython with lowercase i, and everyone knows that it's uppercase i.
00:21:34 It's not made by Apple.
00:21:37 Come on.
00:21:37 Yeah.
00:21:38 And so, yeah, we were searching for another name, something that is easy to Google, that is not already taken, where we can get a domain name.
00:21:50 And that would have a connotation, a scientific connotation.
00:21:54 And we wanted to give thanks to the astro community that has been using IPython for a long, long time, almost since the beginning.
00:22:04 And I still remember one day, Fernando wrote a mail to the whole team and said, hey, I just found this name.
00:22:12 What do you think?
00:22:14 And everybody agreed.
00:22:16 And almost in a couple of days, we decided to grab all the domain names and start working on actually separating the project and everything.
00:22:26 It has been a really tough transition.
00:22:29 People were really, really confused about the renaming.
00:22:33 People are still confused.
00:22:35 But especially for new users, the distinction between Jupyter and IPython is really, really useful.
00:22:42 And also, it allowed Jupyter to become something slightly bigger.
00:22:47 That was also in the back of our minds,
00:22:49 which was that Jupyter is more of a specification: you have a protocol and you have a set of tools.
00:22:56 What is part of Jupyter is much broader and it can allow anybody to basically say, hey, I implement the Jupyter protocol.
00:23:05 And so, it's easier to say, hey, I have a Jupyter Atom plugin.
00:23:11 There are also legal issues around that: using trademarks that are really close to Python is difficult.
00:23:17 And Jupyter, being a brand new namespace (and you know that namespaces are great, we should use more of them),
00:23:24 allows people to use that and say that they are multi-language in a much better way than when saying we are compatible with IPython.
00:23:34 Because IPython is also highly connoted as a shell.
00:23:38 And Jupyter is more than just a notebook.
00:23:40 So, having Jupyter is much better and we are happy with that.
00:23:46 Yeah, it makes perfect sense.
00:23:48 I'm sure the transition was a little confusing for people who have been doing IPython or they've heard about IPython.
00:23:54 They were going to look into it.
00:23:55 Now it's this other thing.
00:23:57 But there's more than just a couple of languages that are supported, right?
00:24:01 How many are supported?
00:24:01 It depends on what you mean by supported.
00:24:06 We have a wiki page which is still on the IPython repository, which lists, if I remember correctly, 50 or almost 60 languages.
00:24:17 It means that you can have languages that have many kernels.
00:24:22 It means that someone at some point wrote a kernel or a toy kernel that works with IPython.
00:24:29 And if I remember correctly, we have around 60.
00:24:32 60.
00:24:34 That means probably if you have a language you care about, it probably works with Jupyter, right?
00:24:40 Or it's very edge.
00:24:41 Most kernels won't have all the features.
00:24:44 I would say that the one I know works with most of the features is the Python one because we maintain it.
00:24:53 So you can see it as a reference implementation.
00:24:55 There are other Python ones, like toys that are only a few hundred lines to show you how to implement that.
00:25:02 The Julia kernel is pretty feature complete.
00:25:07 Actually, many of the features that we have in the IPython kernel were moved by the Julia team into the Julia language itself.
00:25:18 So actually having implemented the protocol and having seen the notebook UI allowed them to make much better abstractions for the Julia language and actually improve performance in some small areas.
00:25:30 So that's almost a thing.
00:25:32 The Haskell kernel also has a really good maintainer and has really nice features.
00:25:39 Like if you write some code in Haskell and you can rewrite it in a more compact form, the Haskell kernel will tell you that after running your code.
00:25:47 They say, hey, you can rewrite it this way.
00:25:49 It will be more compact and more readable by someone who does Haskell.
00:25:53 So Ruby had some activity at some point.
00:25:57 I'm not sure now how much activity there is.
00:26:00 And we definitely have people from the R kernel.
00:26:03 The R kernel was created by Thomas Kluyver, who is now back in the UK, still working with us.
00:26:08 And it has been taken over by some R people who are actively contributing to the R kernel and are also reporting bugs and fixing bugs a lot in IPython itself.
00:26:20 Yeah, and another active kernel author community is the Calico project, which is from the CS department of Bryn Mawr by Doug Blank, where it's kind of a multi-language.
00:26:33 The kernel itself is a multi-language environment.
00:26:37 They can actually switch between different runtimes.
00:26:39 That does some pretty cool stuff.
00:26:41 And they've been very helpful with implementation and protocol testing and things.
00:26:46 Oh, that's cool.
00:26:47 Yeah, and that's kind of related to what I was going to ask you next is if I want to write something in Python and then something in C++ and then something in R, can I do that in like one notebook and have the data work together?
00:26:58 Well, yes.
00:27:01 So there are a couple things to that.
00:27:03 One is we have chosen in the notebook to associate one notebook with one kernel.
00:27:10 So there's one process determining how to interpret the code cells and produce output as a result.
00:27:16 There's another project derived from IPython called Beaker notebook that doesn't do this, that associates each cell with a kernel and then defines a data interchange for moving data around that allows running code and passing data around from JavaScript to R, Python, and the like.
00:27:34 However, a kernel, from Jupyter perspective, a kernel can itself define semantics for running code in other languages.
00:27:43 And IPython, this is where sort of the distinction between IPython and Jupyter comes up.
00:27:49 That as far as Jupyter is concerned, there's one kernel associated with the notebook.
00:27:53 But the IPython kernel can define these things called cell magics that say this is shorthand for actually compiling a block of C++ code with Cython and then running that, or the R magic that actually hands off code to an R interpreter.
00:28:09 As far as Jupyter is concerned, there's only one kernel per notebook, but the kernels themselves can actually provide some of this multi-language functionality.
00:28:17 And IPython does.
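The idea described above, one kernel per notebook with the kernel itself deciding to hand some cells to another handler, can be sketched roughly as follows. This is a toy illustration only, not how IPython implements cell magics; `ToyKernel`, `register_magic`, and `run_cell` are hypothetical names invented for this sketch, and the "magics" here are trivial string functions rather than real language bridges.

```python
# Toy illustration only: this is NOT the IPython implementation.
# ToyKernel, register_magic, and run_cell are hypothetical names.


class ToyKernel:
    """One kernel per notebook, but a cell can opt in to another handler."""

    def __init__(self):
        self.magics = {}

    def register_magic(self, name, handler):
        # handler receives the cell body (a string) and returns a result
        self.magics[name] = handler

    def run_cell(self, source):
        first_line, _, body = source.partition("\n")
        if first_line.startswith("%%"):
            # A cell starting with %%name is handed to that magic's handler.
            name = first_line[2:].strip()
            if name not in self.magics:
                raise KeyError(f"unknown cell magic: {name}")
            return self.magics[name](body)
        # Default behaviour of this toy: evaluate the cell as a Python expression.
        return eval(source, {})


kernel = ToyKernel()
kernel.register_magic("upper", lambda body: body.upper())
kernel.register_magic("wordcount", lambda body: len(body.split()))
```

From the notebook's point of view there is still a single kernel; only the first line of the cell tells that kernel to interpret the rest differently.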
00:28:18 Yeah, to extend on what Min said, there is another kernel, one of the Calico kernels, actually, which is one kernel that implements many languages. I won't go exactly into details.
00:28:31 And there is this nice distinction that a kernel is not always one language, it can be many languages, and in particular the Calico kernel uses triple percent syntax to say,
00:28:41 hey kernel, change how you parse the next string.
00:28:46 And so you can actually switch in between three or four languages.
00:28:50 I don't exactly remember.
00:28:51 There is Python, there is Scheme, and something like that.
00:28:55 And it's really interesting.
00:28:57 And you have different ways of actually sharing data between languages.
00:29:01 One of the examples I can try to dig up, which is really interesting in multi-language integration,
00:29:07 is the Julia Python magics demonstration.
00:29:12 You can actually move data and actually have a tight integration between Julia and Python.
00:29:19 It's only on Python 2, unfortunately.
00:29:22 We need to update the code for Python 3, but I'm not fluent enough in Julia.
00:29:26 You can define the Fibonacci function recursively in Julia, calling Fibonacci n-1 in Python and Fibonacci n-2 in Julia.
00:29:35 And the Python version calls Fibonacci n-1 in Julia and Fibonacci n-2 in Python.
00:29:41 And you can ask for a number, and you will actually get a cross-language traceback, or stack trace, where you have a layer cake of each language,
00:29:51 which is really, really impressive.
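The shape of that mutual recursion can be sketched as below. Both functions here are plain Python; `fib_jl` is only a stand-in for the Julia side, so this shows the alternating call pattern and nothing about the actual Julia/Python bridge. The names `fib_py` and `fib_jl` are made up for this sketch.

```python
# Both functions are plain Python; fib_jl only stands in for the Julia side.


def fib_py(n):
    """'Python' version: the n-1 call crosses over, the n-2 call stays here."""
    if n < 2:
        return n
    return fib_jl(n - 1) + fib_py(n - 2)


def fib_jl(n):
    """Stand-in for the 'Julia' version: n-1 crosses over, n-2 stays here."""
    if n < 2:
        return n
    return fib_py(n - 1) + fib_jl(n - 2)
```

Because each call on the n-1 branch hands off to the other function, an error raised deep in the recursion would show frames alternating between the two, which is the layer cake described above.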
00:29:53 You can even create from Julia, you can import Matplotlib from the Python side,
00:30:01 create a figure in Julia with a function which takes sine, for example, from the Julia standard library,
00:30:09 and cosine from the Python side, from NumPy, and plot that on the Matplotlib figure and get back into Python
00:30:17 and annotate the figure from the Python side without copying the figures.
00:30:21 It means that the two interpreters are actually sharing memory.
00:30:24 So it shows you that you can do some really, really advanced cross-language integration
00:30:31 without having to copy data back and forth.
00:30:34 That sounds really interesting and useful for scientists.
00:30:39 You know, maybe they've got something they've done in R or some other language.
00:30:43 Yeah.
00:30:44 Some little bit of processing, and they're like, I really just want to plug this in over here, but it's the wrong language, right?
00:30:50 And that sounds like it makes it kind of possible.
00:30:52 Yeah, and it's been, so Julia being a very young language community, it's been extremely valuable to them to build this bridge,
00:31:01 largely to Python, but also to C and things.
00:31:05 That Julia didn't, in order to come up to speed with what scientific programmers expect,
00:31:11 with things like Matplotlib and stuff, they didn't need to write, okay, here's the Julia plotting library, just so that people could do anything.
00:31:18 They could start out by saying, well, we'll just use Python libraries.
00:31:22 And because of these really, really slick layers that let Julia talk to Python
00:31:28 and Python talk to Julia in a really native way, Julia basically gets the entire Python library ecosystem for free,
00:31:35 and then can kind of re-implement as needed.
00:31:39 And as they find more idiomatic Julia ways to do things, they can start building those libraries.
00:31:44 But they didn't have to start from zero just because it was a new language.
00:31:48 For new languages in general, I think being able to interoperate with other languages
00:31:52 is a really important, really valuable way to kind of hit the ground running.
00:31:57 That is super interesting because, you know, there's 10, 15 years of super solid data science stuff happening in Python.
00:32:04 And if you can just go, we'll just start from there, rather than from zero, that makes all the difference.
00:32:09 The other cross-language thing that few people think about is actually the language in the kernel
00:32:15 and the language of the front end, which is JavaScript and HTML.
00:32:19 And the notebook allows you to do like these cross-language bindings really easily, especially with widgets.
00:32:25 And one of the examples I give is you can have interactive Fortran.
00:32:30 You can actually plot something in Fortran and display something in JavaScript
00:32:34 and have a slider that you move and change the result of your Fortran computation.
00:32:38 Wow, that's really awesome.
00:32:40 One of the things I wanted to spend a few minutes talking with you guys about was what you referred to as the big split.
00:32:46 So it used to be that IPython was like one giant GitHub repository, right?
00:32:51 And now you've broken it into many smaller pieces, yeah?
00:32:54 Yeah, it went from one to about a dozen.
00:32:57 That sounds like a lot of work.
00:32:58 Yeah, that was my spring, pretty much.
00:33:02 It took three to six months to kind of get it all split up.
00:33:09 But since we knew we eventually wanted to do something like that, we had, for the most part, already organized IPython into kind of sub-packages of dedicated functionality.
00:33:20 So it wasn't too crazy difficult to break it up.
00:33:26 But the tricky bits were like the common utilities that we use all over the place and how to deal with that.
00:33:31 Yeah, it's the little interdependencies that don't seem big, but when they're woven in between all the pieces, all of a sudden it gets harder and harder, right?
00:33:40 Yeah.
00:33:41 So it was tricky to execute the big split in a variety of ways.
00:33:46 So we wanted to preserve history, but we also didn't want to duplicate the already large IPython repo so that installing from Git would mean that it now is 12 times as big as getting IPython was.
00:34:00 So we had to do some, on the new repos, some clever Git history rewriting to kind of prune out the history for the files that didn't survive.
00:34:10 That's really interesting.
00:34:12 You talked about that in your blog post called The Big Split, which I'll link to in the show notes.
00:34:17 And you had to use some funky commands to sort of make that happen, right?
00:34:21 Be really careful about how you move the history over.
00:34:23 Yeah, and this is what, there are some, I think, in the various version control tools, the fact that you can rewrite history in Git is both really scary and weird and gross, but also really useful sometimes.
00:34:35 But it lets you do things like we did, which is selectively preserve history, which has been nice that you get, you know, you get the history of the notebook work in the notebook repo, even past the creation of the notebook repo.
00:34:50 But you don't get the baggage, right?
00:34:52 Yeah, but we don't get the history of all the rest of IPython.
00:34:56 Yeah, because normally you delete a file out of the repo and it's just, it doesn't show up, but you're still moving it around, right?
00:35:01 To give you an idea, I will just share, I will send you a link.
00:35:06 Someone made a graph visualization of the dependencies in IPython before the Big Split.
00:35:14 And on the same blog post, you have a comparison with Django, Twisted, Flask, requests, so that you can get an idea of what the complexity of the entanglement was.
00:35:26 It's on grokcode.com and it's blog post 864.
00:35:30 And I will give you the link so you can put it in the notes.
00:35:34 Oh yeah, thanks.
00:35:35 It's pretty seriously entangled.
00:35:37 There aren't dependency cycles and crazy loops of depending on each other.
00:35:44 It's kind of a tree of dependencies, but there are many nodes on the graph.
00:35:50 Sure, sure.
00:35:51 SnapCI is a continuous delivery tool from ThoughtWorks that lets you reliably test and deploy your code through multi-stage pipelines in the cloud without the hassle of managing hardware.
00:36:16 Automate and visualize your deployments with ease and make pushing to production an effortless item on your to-do list.
00:36:22 Snap also supports Docker and in-browser debugging, and they integrate with AWS and Heroku.
00:36:28 Thank SnapCI for sponsoring this episode by trying them with no obligation for 30 days by going to snap.ci slash talkpython.
00:36:45 Another thing I wanted to talk about with you guys is this thing called JupyterHub.
00:36:48 What's the story of the JupyterHub?
00:36:50 So JupyterHub came out of the value of Jupyter Notebooks and things in teaching context.
00:36:57 So a lot of people, whether in workshops or classes, are using notebooks to present.
00:37:04 This is the material that we're talking about and the example code and running, you know, doing live demos and things.
00:37:10 You want your students to be able to follow along, and this is one of those cases where, you know, installing the scientific Python stack, as much as Conda has made that easier,
00:37:20 can still be, you know, a significant bar to get over.
00:37:24 So we wanted, and people were building tools, kind of hacks around IPython at the time, to deploy notebooks on behalf of users.
00:37:37 And we wanted to provide kind of an official implementation of hosting notebooks on behalf of a group of users in the context of a research group.
00:37:49 So you've got a machine that a bunch of, you know, half a dozen or so research scientists or students have access to,
00:37:55 or you have a class of 10 or 50 or 100 students.
00:38:00 And you say, all right, I've got these users.
00:38:03 I can install packages for them and then point them at this URL and they can log in and run their notebook.
00:38:10 So basically take away the installation problem by saying, I'm going to control the installation and host the notebooks and everything.
00:38:21 And we wanted to kind of create the simplest, smallest version of that.
00:38:24 And that's JupyterHub.
00:38:25 Okay, nice.
00:38:26 From the technical side, you asked before which technologies were necessary for the notebook.
00:38:32 And we spoke about WebSockets.
00:38:34 One of the things which is important is we were using really recent technology and WebSockets was really new at the time.
00:38:44 And one of the problems is many proxies or web servers were unable to correctly redirect WebSockets or even to update proxy rules without actually restarting the proxy or server.
00:39:01 And that's one of the requirements we had for the notebook.
00:39:04 If you want to spawn someone's notebook without cutting the connections of the others, we had to have a dynamic proxy that could respond to, for example, REST requests that change the routes, without dropping the WebSockets.
00:39:19 And before JupyterHub, only a handful of prototypes of projects were able to do that.
00:39:24 And actually, Min wrote one such HTTP proxy using Node.js to suit this specific need that no other tool met.
00:39:37 And that's why you actually need JupyterHub to run something.
00:39:40 And JupyterHub needs to be one of the most front-facing software.
00:39:43 I think that now Nginx can do it, too.
00:39:45 You cannot, for example, use Apache or something else.
00:39:50 Or it's really much more difficult to have many notebook servers running.
00:39:54 It's interesting that a lot of the web servers weren't really built for that, right?
00:39:59 Because I guess they came before WebSockets anyway.
00:40:03 And that probably was not a super important criteria for them, right?
00:40:06 Yeah, the web servers were actually surprisingly slow to adopt, to provide WebSocket implementations.
00:40:13 Nginx didn't take too long, but it was quite a while before you could reasonably expect an Apache installation to support WebSockets.
00:40:21 They do now, so there are notebook deployments behind both Apache and Nginx.
00:40:26 But, yeah, we put together the configurable HTTP proxy as this kind of super simple proxy that you can update.
00:40:34 You can update the routing table without relaunching or without losing existing connections or anything.
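A routing table that can be changed while the proxy keeps running might look roughly like the sketch below. This is a toy in Python, not the configurable HTTP proxy itself (which, as mentioned above, was written with Node.js): it only models the lookup and update logic, not sockets, WebSockets, or the REST interface. `RoutingTable` and its method names are hypothetical, and the longest-prefix matching rule is an assumption of this sketch.

```python
# Toy model of a routing table that is updated at runtime.
# RoutingTable and its methods are hypothetical names for illustration.


class RoutingTable:
    def __init__(self, default_target):
        # "/" is the catch-all route; it always exists.
        self._routes = {"/": default_target}

    @staticmethod
    def _normalize(prefix):
        return "/" + prefix.strip("/")

    @staticmethod
    def _matches(prefix, path):
        if prefix == "/":
            return True
        return path == prefix or path.startswith(prefix + "/")

    def add_route(self, prefix, target):
        # Adding a route only touches the dict; nothing is restarted.
        self._routes[self._normalize(prefix)] = target

    def remove_route(self, prefix):
        prefix = self._normalize(prefix)
        if prefix == "/":
            raise ValueError("cannot remove the default route")
        self._routes.pop(prefix, None)

    def resolve(self, path):
        # Assumed rule: the longest matching prefix wins.
        best = "/"
        for prefix in self._routes:
            if self._matches(prefix, path) and len(prefix) > len(best):
                best = prefix
        return self._routes[best]
```

In this model, adding a route for a newly spawned single-user server changes where new requests go without touching any state belonging to other users' routes.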
00:40:41 The more you can make it easier for people to set up these environments, the better.
00:40:45 Because it's not always going to be like some web server admin or a really experienced web developer doing this, right?
00:40:52 It could just be a scientist who just wants this thing for their class, right?
00:40:56 They don't want to deal with Nginx.
00:40:57 Yeah, and that's something we're working on right now.
00:41:00 Because the way JupyterHub is put together, it has two primary extension points.
00:41:05 One is authentication.
00:41:07 So how users log in, and you can kind of drop in any implementation of logging in, whether it's just local authentication with the system, with the password, or using OAuth with GitHub, or your campus sign-on, that kind of stuff.
00:41:22 And then the other is the spawning, how it actually allocates resources for the single-user servers.
00:41:28 But because there are so many choices for how to do that, it actually, one thing we're working on is making kind of more of a turnkey version that you can say,
00:41:40 I want to use this authentication system and this spawning mechanism, and people can just deploy that.
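The two extension points described above, one for how users log in and one for how a single-user server gets started, can be sketched as two swappable objects. This is a minimal illustration under stated assumptions, not JupyterHub's actual classes or API: `DictAuthenticator`, `FakeLocalSpawner`, and `ToyHub` are hypothetical, and the "spawner" only hands out URLs instead of starting processes.

```python
# Toy sketch of pluggable authentication and spawning.
# All class names here are hypothetical, not JupyterHub's API.


class DictAuthenticator:
    """Checks credentials against an in-memory dict (illustration only)."""

    def __init__(self, passwords):
        self._passwords = passwords

    def authenticate(self, username, password):
        # Return the username on success, None on failure.
        if username in self._passwords and self._passwords[username] == password:
            return username
        return None


class FakeLocalSpawner:
    """Pretends to start one server per user by handing out local URLs."""

    def __init__(self, first_port=9000):
        self._next_port = first_port
        self.running = {}

    def start(self, username):
        # Reuse the existing server if this user already has one.
        if username not in self.running:
            self.running[username] = f"http://127.0.0.1:{self._next_port}"
            self._next_port += 1
        return self.running[username]


class ToyHub:
    """Ties one authenticator and one spawner together."""

    def __init__(self, authenticator, spawner):
        self.authenticator = authenticator
        self.spawner = spawner

    def login(self, username, password):
        user = self.authenticator.authenticate(username, password)
        if user is None:
            raise PermissionError("login failed")
        return self.spawner.start(user)
```

Swapping in a different authenticator or spawner object changes how users log in or where their servers run without changing `ToyHub` itself, which is the point of having the two extension points.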
00:41:46 Because there's a range, from the very simple default behavior that just works out of the box for, I've just got a shared machine that's on the internet,
00:41:55 and I want to give all the users who already have accounts on that machine access.
00:41:59 That's pretty trivial right now, all the way to a deployment last year that Jess Hamrick did at UC Berkeley
00:42:08 for a couple hundred students in psychology using Docker Swarm and Nginx and a big multi-node deployment for a large number of users
00:42:22 and using Ansible to automate all that deployment.
00:42:25 That sounds awesome.
00:42:26 Is that documented somewhere?
00:42:28 Like, is there an article or something on this?
00:42:30 Yeah.
00:42:30 So she wrote a blog post for the Rackspace developer blog because the hosting for that class was all provided by Rackspace.
00:42:37 So she wrote a blog post covering that, and her Ansible setup is just a repo on GitHub that we can link to.
00:42:43 Okay, awesome.
00:42:44 Yeah, we'll put that in the show notes.
00:42:45 So I can go to Dropbox and get, like, storage as a service.
00:42:50 I can go to Google Apps and get Word Processor as a service.
00:42:54 Can I do that for Jupyter somewhere?
00:42:57 Can I just go and, like, pay $5 a month and get, like, Jupyter?
00:43:02 Yeah, there are a few companies hosting Jupyter Notebooks.
00:43:06 So IBM has their, I believe it's called a workbench, data science workbench.
00:43:11 Continuum Analytics has Wakari, which hosts Notebooks.
00:43:17 I'm trying to think how many others there are.
00:43:20 There's Domino Data Lab.
00:43:23 So there's William Stein with SageMath.
00:43:26 Yeah, SageMath Cloud is probably the primary, the one we're most connected to.
00:43:31 Okay, cool.
00:43:32 That's good to hear.
00:43:34 Yeah, so there are a variety of these hosted notebook things.
00:43:38 Yeah, one thing which is slightly related, I don't know if we might talk about that later.
00:43:44 It's mybinder.org, which has been set up by Jeremy Freeman from Janelia Labs, where basically you set up a GitHub repository with your notebooks, a requirements file, and some extra metadata if needed.
00:44:01 And you link to mybinder.org.
00:44:03 I will give you the link.
00:44:05 And it will actually, just for you, spawn a Docker instance with the requirements and give you a temporary notebook online.
00:44:15 So if you have an article that you want to be reproducible, you can just post it on GitHub.
00:44:20 It's basically like nbviewer, for those who know, but backed by a kernel.
00:44:26 And it's paid directly out of Jeremy's pocket.
00:44:28 And huge thanks to him.