Chris Benson: Hey there. Welcome to another episode of the Practical AI Podcast. This is Chris Benson, and I'm an AI strategist. My co-host is Daniel Whitenack, a data scientist. We have a real treat in store for you today. We have a special guest that we have looked forward to having on the show for a long time now, and I'm super-excited about this episode. That guest is Bill Dally, who is the chief scientist and senior vice-president of research for NVIDIA. He is also a professor at Stanford University. Welcome very much, Bill!
Bill Dally: It's great to be here!
Chris Benson: And how's it going today, Daniel?
Daniel Whitenack: It's going great, I'm excited to talk to Bill. I'm, of course, a huge fan, as everyone is, of everything NVIDIA is doing in this space, so I'm excited to hear more.
Chris Benson: The genesis for this episode came earlier this year in March. I was at the NVIDIA GTC conference in Silicon Valley, and I got to attend a small group session called "AI for Business CxO Summit." In that, the NVIDIA CEO, Jensen Huang, was in a small group environment, and it was just an amazing amount of wisdom that I got... I was thinking, as I sat there - that was very business-oriented in a lot of ways, but I kept thinking, if we had NVIDIA's chief scientist come on board to talk us through what NVIDIA does, but give it to us as practitioners of neural network technology and other AI technology, that would just be amazing. Bill, thank you so much for coming on board. I really appreciate it.
Bill Dally: You're very welcome.
Chris Benson: I wanted to real quick ask if you could just give us a little bit of background. I mentioned that you are the chief scientist at NVIDIA, and a professor at Stanford... Could you tell us just a little bit about yourself before we launch into questions?
Bill Dally: Sure. I'm a hardware engineer who's been working on both hardware and software for AI in recent years. My first experience with neural networks was in the 1980's, when I took a course from John Hopfield at Caltech, and was building Hopfield nets, and things like that. I was on the faculty at MIT for 11 years, where I built a research group that built a number of pioneering supercomputers, I collaborated with Cray on the design of their Cray T3D and T3E, and then moved to Stanford in 1997, where I continued to lead research on high-performance computing and special purpose processors for numerous tasks, including graphics.
I first got involved with NVIDIA in 2003 when I was hired as a consultant to help with what was called internally the NV50; it became the G80 when it was announced... And particularly to help on the extensions to the G80 that enabled CUDA, the ability to run general purpose computing programs on GPUs.
[00:04:16.09] I really got to like the folks at NVIDIA, particularly Jensen, and he convinced me to join full-time in 2009. So since 2009, I've been building NVIDIA Research, the research organization at NVIDIA, and myself doing research on numerous topics... Most recently on some of the path planning algorithms for our self-driving cars, and on very efficient AI inference.
Daniel Whitenack: That's awesome. That's an amazing background. It sounds like you joined NVIDIA at a really exciting time. Of course, things have really exploded in a good way for them, and I'm sure it's a lot of excitement and thrills being at the center of that.
Bill Dally: Yes, it's a really fun place to be.
Daniel Whitenack: Awesome. From my perspective growing up, the context in which I heard about NVIDIA was in video processing and gaming, which led to the rise of the GPU... I was wondering if you could speak a little bit to how and why that transition into this very AI-oriented approach that NVIDIA is taking now, and comment on how that evolution occurred, and how you see it from your perspective.
Bill Dally: Sure. NVIDIA's roots are really in graphics; gaming is one aspect of that, but we've also always done professional graphics. And if you think about what the graphics problem is, it's basically simulating how light bounces off of the scene and it appears at your eye or at a camera, and doing that simulation, basically rendering the scene, shading each pixel is a very computationally-intensive task, and it's a very parallel task... So GPUs evolved to be very efficient parallel computers with very high computational intensity. It turns out a lot of other problems have this nature of having a lot of computational intensity and being very parallel.
Early on, probably in the early 2000's, people started trying to use GPUs for tasks other than graphics. It was a movement called GP-GPUs (general purpose GPUs), and around the same time I was leading a project at Stanford on what we called stream processors, which actually wound up developing the right set of programming tools to program GP-GPUs. We were developing in a language called Brook. The lead student on that project, Ian Buck, graduated, got his Ph.D., came to NVIDIA, and, along with some people at NVIDIA including John Nickolls, who was heading the computer architecture group at the time, evolved Brook into CUDA. That basically made it very easy for people to take the huge number of arithmetic units that were in GPUs and their ability to execute parallel programs very efficiently and apply them to other problems.
At first, they were applied to high-performance computing problems, and GPUs have continued to be very good at that. We currently provide the arithmetic for the number one supercomputer in the world, Summit, at Oak Ridge National Laboratory. And they've been applied to things from oil and gas reservoir modeling, to simulating more efficient combustion engines, to simulating how galaxies collide... All sorts of high-performance computing problems. Predicting weather or climate change, stuff like that - they're now done on GPUs.
So it was very natural, since we basically now had the platforms. We announced CUDA in 2006. A few years later a substantial fraction of all of the large supercomputers being built were based on GPUs... It was very natural that when other very demanding problems came along, that people would apply GPUs to them.
If you look at deep learning, and particularly the training for deep learning, it's a very computationally-intensive problem. When this first started to be done, it was taking weeks on the fastest GPUs we had... And it's very parallel, so it was a perfect match for GPUs. Early on we saw this match, and applied GPUs to that.
[00:08:10.02] For me and for NVIDIA - the start really came when I had a breakfast with my Stanford colleague Andrew Ng, and I think it was probably in 2010 or early 2011. At the time, he was at Google Brain, and was finding cats on the internet by building very large neural networks running on 16,000 CPUs. When he described what he was doing to me, I said "You ought to be doing that on GPUs."
I found somebody at NVIDIA Research, a guy named Bryan Catanzaro, who now runs our Applied Deep Learning Research Group... At the time he was actually a programming language researcher, but he was interested in deep learning and had the right background knowledge, and his assignment was to work with Andrew and move Andrew's neural network for finding cats to run on GPUs.
We were able to take what took 16,000 CPUs and run it on (I think it was) 48 GPUs, at an even higher performance than he was getting. The software that came out of that turned into cuDNN, on top of which we basically ported just about every framework there is.
Now, the other thing that happened - the GPUs we had at that time, which were [unintelligible 00:09:14.07] transition, weren't originally designed to do deep learning; they were designed to do graphics and high-performance computing... So they had good 32-bit floating point performance, good 64-bit floating point performance, but it turns out what you want for deep learning training is FP16, and what you want for deep learning inference is INT8, and they weren't actually particularly good at either of those. So as we learned more about what deep learning needed, our subsequent generations of GPUs have been specialized for deep learning. We've added support for FP16 for training, we've added support for INT8, INT4 and INT1 for inference, and we built Tensor Cores, which are special purpose units that basically give us the efficiency of hardwired deep learning processors like Google TPU, but without giving up the programmability of a GPU.
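To make the INT8 inference point concrete, here is a minimal Python sketch of symmetric linear quantization, one common way to map floating-point weights onto 8-bit integers. It is an illustration only, not a description of what NVIDIA's hardware or tools do; the function names and the sample weights are hypothetical.

```python
def quantize_int8(values):
    """Symmetric linear quantization of floats to the INT8 range [-127, 127].

    Assumption for this sketch: a single scale factor for the whole list,
    chosen so the largest magnitude maps to 127.
    """
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Map INT8 codes back to approximate floating-point values."""
    return [q * scale for q in quantized]


# Hypothetical weights, chosen only to make the arithmetic easy to follow.
weights = [0.50, -1.27, 0.03, 1.00]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# The rounding error per weight is at most half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_error)
```

Each weight is stored in one byte instead of four, at the cost of a bounded rounding error, which is the trade-off that makes low-precision inference attractive.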
While the original GPUs were good at deep learning, now that we've gotten more experienced with deep learning, learned what it really needs and have specialized and optimized the GPUs today - especially Volta and Turing are really great at deep learning.
Daniel Whitenack: That's awesome. I was trying to soak all that up; there's so much context and great information that I wasn't aware of before... For example, the evolution of CUDA and how it came from this Brook language that you mentioned, and how the classifying of cats fit in, and all of that. Were you aware of a lot of that, Chris? A lot of that is great new context that I wasn't aware of.
Chris Benson: Yeah, he took topics that I now realize I had only a shallow understanding of up until this point, and went deep, which is fantastic, so... Be careful, Bill, because we have a whole bunch more questions for you; we're gonna dive deep into some of these things you're telling us about.
Bill Dally: Okay.
Daniel Whitenack: You mentioned a lot of things that I would love just a little bit of clarification on, for those in our audience that it's maybe new to them... You mentioned the evolution of CUDA, and you also mentioned how GPUs were integral to the scaling of the deep learning training and all of that. I was wondering if we could take a step back, and from your perspective get your explanation of what a GPU is generally, why it's useful for deep learning in particular, and how CUDA fits into that? ...what that interface looks like today.
Bill Dally: A GPU generally is just a very efficient parallel computer. Volta has 5120 CUDA cores, which really means 5120 separate arithmetic units could be operating in parallel. Coupled to that is a very efficient system for supplying data to those units and accessing memory. So for any problem that's very parallel, they are orders of magnitude more efficient than CPUs.
[00:12:09.10] CPUs, in contrast, are optimized for single-thread performance and for very low latency. But to do that, they wind up spending enormous amounts of energy reorganizing your program on the fly to schedule instructions around long latency cache misses. If you try to access memory and you're lucky, you get a number in three clock cycles; if you're not so lucky, it might be 200 clock cycles. So they've got to do a lot of bookkeeping to work around that uncertainty. The result of that is a huge amount of energy that's spent, and therefore performance and energy efficiency are orders of magnitude lower than a GPU.
A GPU takes advantage of the fact that if you have a very parallel program, you can hide any memory latency with more parallelism; you can work on something else while you wait for the data to come back. So they wind up being extremely efficient platforms for tasks like deep learning, where you have many parallel operations that can be done simultaneously before you get the results of one of them back.
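The latency-hiding argument can be sketched with a toy model. The Python below is a deliberately idealized illustration, not a model of any real GPU: it assumes each thread alternates a few cycles of arithmetic with a memory access, and that switching between ready threads is free. The 200-cycle latency echoes the figure mentioned in the conversation; the 4 compute cycles and the thread counts are made-up values.

```python
def arithmetic_utilization(num_threads, compute_cycles, memory_latency_cycles):
    """Fraction of cycles an arithmetic unit is busy in a toy round-robin model.

    Each thread alternates `compute_cycles` of arithmetic with one memory
    access taking `memory_latency_cycles`. While a thread waits, the unit
    runs another ready thread at no switching cost (an idealization).
    """
    busy_per_thread = compute_cycles / (compute_cycles + memory_latency_cycles)
    return min(1.0, num_threads * busy_per_thread)


# One thread: the unit sits idle for almost the whole memory wait.
single = arithmetic_utilization(1, compute_cycles=4, memory_latency_cycles=200)

# Many threads: there is always other work to do while data is in flight.
many = arithmetic_utilization(64, compute_cycles=4, memory_latency_cycles=200)

print(f"1 thread:   {single:.1%} busy")
print(f"64 threads: {many:.1%} busy")
```

In this model, utilization grows linearly with the number of threads until the arithmetic unit is saturated, which is the sense in which more parallelism hides memory latency.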
Daniel Whitenack: And that's for the matrix type operations that you're talking about, and also the iterative training processes? Is that right?
Bill Dally: Right. At the core of deep learning are convolutions and matrix multiplies. In fact, you can turn the convolutions into matrix multiplies through a process called [unintelligible 00:13:23.09] Fundamentally, if you can do a very efficient matrix multiply, you can do really well at deep learning, and GPUs are very good at doing those matrix multiplies, both because they have an enormous number of arithmetic units, and because they have a very highly optimized memory and on-chip communication system for keeping those arithmetic units busy and occupied.
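One common way to turn a convolution into a matrix multiply is to unroll every input patch into a row of a matrix, a lowering often called im2col. Whether that is the term used at the unintelligible point in the recording cannot be recovered from the transcript, so treat the name as an assumption. The Python sketch below uses hypothetical helper names and a tiny 3x3 image, and checks the lowered version against a direct sliding-window computation. As is usual in deep learning, "convolution" here means cross-correlation (the kernel is not flipped).

```python
def im2col(image, k):
    """Unroll every k x k patch of a 2D image into one row of a matrix."""
    rows, cols = len(image), len(image[0])
    patches = []
    for r in range(rows - k + 1):
        for c in range(cols - k + 1):
            patches.append([image[r + i][c + j] for i in range(k) for j in range(k)])
    return patches


def conv2d_as_matmul(image, kernel):
    """Stride-1, no-padding convolution computed as patches @ flattened kernel."""
    k = len(kernel)
    flat_kernel = [kernel[i][j] for i in range(k) for j in range(k)]
    flat_out = [sum(p * w for p, w in zip(patch, flat_kernel))
                for patch in im2col(image, k)]
    out_cols = len(image[0]) - k + 1
    return [flat_out[i:i + out_cols] for i in range(0, len(flat_out), out_cols)]


def conv2d_direct(image, kernel):
    """Reference implementation: slide the kernel over the image directly."""
    k = len(kernel)
    out_rows, out_cols = len(image) - k + 1, len(image[0]) - k + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(k) for j in range(k))
             for c in range(out_cols)]
            for r in range(out_rows)]


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernel = [[1, 0], [0, -1]]
result = conv2d_as_matmul(image, kernel)
print(result == conv2d_direct(image, kernel))
```

With a single kernel the product is matrix-vector; stacking many flattened kernels as columns turns it into one large matrix-matrix multiply, which is the operation the hardware is built to do quickly.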
Chris Benson: That is really a great explanation, and that's helping me a lot. I would like to understand, beyond just NVIDIA's GPUs, those of us that are out here consuming information in this space are always hearing tons of other acronyms - CPUs, TPUs, ASICs... If you could explain to us a little bit what is different about a GPU from those other architectures that are out there, and what are some of the advantages and disadvantages. Why is it that NVIDIA is able to lead the way with this GPU technology that you've been bringing us for these last few years?
Bill Dally: Sure. So I already mentioned some of that by comparing CPUs and GPUs. A CPU (central processing unit) like an Intel Xeon, or AMD's latest parts, is optimized for very fast execution of a single computational thread. As a result of that, it spends an enormous amount of energy rescheduling instructions around cache misses, and as a result, it winds up burning something on the order of a nanojoule per instruction, where the actual work of that instruction maybe only takes 1% of that energy. You can think of them as being 1% efficient.
GPUs actually spend more than half of their energy doing the payload arithmetic on computationally-intensive problems, so they are many times more efficient than CPUs at that. Now, CPUs have vector extensions that try to get some of the efficiency of GPUs, but if you look at the core of a CPU, they're extremely inefficient, but very good at running a single thread. If you don't have any parallelism and you need the answer quickly, a CPU is what you want. If you've got plenty of parallelism and you can hide your memory latency by working on something else while you're waiting for that result to come back from memory, then a GPU is what you want.
You mentioned also TPUs and ASICs. Well, the TPU is a type of ASIC. It's an Application-Specific Integrated Circuit. In this case, the application it's specific for is doing matrix multiplies. A Google TPU, especially the TPU 1, which they've just had an article in CACM about, is a big unit that is basically a systolic array to multiply two matrices together... And it's extremely efficient at that. So if all you need to do is multiply matrices, it's very hard to beat a TPU.
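The rough figures quoted here can be turned into a back-of-the-envelope comparison. The Python below is only arithmetic on the numbers from the conversation (about a nanojoule per CPU instruction, about 1% of it useful, and more than half of GPU energy going to payload arithmetic). It assumes, purely for illustration, that the useful arithmetic costs the same energy on both kinds of processor and that the GPU sits exactly at the 50% mark.

```python
# Figures quoted in the conversation (order-of-magnitude estimates).
CPU_ENERGY_PER_INSTRUCTION_NJ = 1.0   # "on the order of a nanojoule"
CPU_USEFUL_FRACTION = 0.01            # "maybe only takes 1% of that energy"
GPU_USEFUL_FRACTION = 0.5             # "more than half", so 0.5 is a lower bound

# Assumption: the payload arithmetic itself costs the same on both processors.
useful_energy_nj = CPU_ENERGY_PER_INSTRUCTION_NJ * CPU_USEFUL_FRACTION

# Total GPU energy per operation if half of it is payload arithmetic.
gpu_energy_per_operation_nj = useful_energy_nj / GPU_USEFUL_FRACTION

# How many times less energy the GPU spends per useful operation.
efficiency_ratio = CPU_ENERGY_PER_INSTRUCTION_NJ / gpu_energy_per_operation_nj

print(f"Useful work per operation: {useful_energy_nj * 1000:.0f} pJ")
print(f"GPU energy per operation:  {gpu_energy_per_operation_nj * 1000:.0f} pJ")
print(f"CPU/GPU energy ratio:      {efficiency_ratio:.0f}x")
```

Under those assumptions the ratio is simply the ratio of the two useful fractions, so any GPU figure above one half only widens the gap.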
[00:16:03.04] The approach we've taken with our latest GPUs is to put Tensor Cores on them. What Tensor Cores are - they're little matrix multiply units. They're very specialized to multiply matrices together. The difference is by specializing by adding a unit to a general purpose processor, we get the efficiency of that specialization without giving up the programmability of the GPU. If you need to write a custom layer to do a mask because you're doing pruning and have a sparse set of weights, or if you need a custom layer to do a new type of nonlinear function that you're experimenting with, or you wanna do some type of concatenation between layers that is a little bit different, it's really easy to write that in CUDA, program it on the GPU and it will execute it extremely well, with all the efficiency of the hardwired matrix multiply units coming from the Tensor Cores... Whereas on the TPU you have that efficiency, but you don't have the flexibility. You can only do what that one-unit has been designed and hardwired to do.
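As an illustration of the kind of custom layer mentioned here, the Python sketch below applies a pruning mask to a small fully connected layer. It is not CUDA and not NVIDIA code; it only shows the computation such a custom kernel would perform. Choosing the mask by weight magnitude is an assumption made for the example, and the function names, weights, and threshold are hypothetical.

```python
def magnitude_mask(weights, threshold):
    """Build a 0/1 mask that keeps weights whose magnitude reaches the threshold.

    Assumption for this sketch: pruning by magnitude; the conversation only
    says that pruning leaves a sparse set of weights.
    """
    return [[1 if abs(w) >= threshold else 0 for w in row] for row in weights]


def masked_linear(inputs, weights, mask):
    """Compute y[i] = sum_j mask[i][j] * weights[i][j] * inputs[j]."""
    return [sum(m * w * x for m, w, x in zip(mask_row, weight_row, inputs))
            for weight_row, mask_row in zip(weights, mask)]


# Hypothetical 2x3 weight matrix with a few near-zero entries to prune away.
weights = [[0.9, -0.05, 0.4],
           [0.01, -0.7, 0.02]]
mask = magnitude_mask(weights, threshold=0.1)
outputs = masked_linear([1.0, 2.0, 3.0], weights, mask)

print(mask)
print(outputs)
```

On a programmable device the mask step is just a few extra lines around the matrix multiply, which is the flexibility being contrasted with a fixed-function unit.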
Now, the advantage of that is it's about the same energy efficiency, so when you're not using the other features of the GPU, you're not paying for them, they don't burn any energy... But they are sitting there, using up [unintelligible 00:17:10.24] So the TPU costs a little bit less to manufacture, because you don't have all of that general purpose processor sitting around it. But what you give up for that is the flexibility of being able to support new deep learning algorithms as they come out... Because if those algorithms don't match with the TPU it's hardwired for, it can't do it.
Daniel Whitenack: Yeah, and as we've seen, the industry isn't moving very fast at new neural network architectures, right? [laughs]
Bill Dally: They're coming up every day... It's hard to keep up with all the papers on arXiv.
Daniel Whitenack: It is, definitely. We try a little bit on this show, but we're constantly falling behind.
Chris Benson: Quick follow-up - in that case, based on the fact that you have the Tensor Cores in the GPUs, it's unlikely that NVIDIA would then go to some sort of ASIC architecture, or something else like that, since you essentially have already accounted for that value in your GPU architectures? Is that a fair statement?
Bill Dally: Actually, not. We actually have our own ASIC-like architecture as well, in that we have something called the NVIDIA Deep Learning Accelerator (NVDLA), which we've actually open-sourced. If you go to nvdla.org you'll see our web page where you can download the RTL, and the programming tools and everything else, for what is actually a very efficient hardwired neural network accelerator. We use the NVDLA ourselves in our Xavier chip, which is the system on a chip that we have for our self-driving cars. The Xavier has a number of ARM cores of our own design, it has basically a tenth of a Volta GPU; it's 512 CUDA cores, rather than 5120... And then it has the NVDLA, as well as a computer vision accelerator, because in embedded processors on the edge, that area efficiency is important. We don't wanna give up the die area for doing deep learning entirely on the GPU.
Now, there's still an awful lot of GPU performance on Xavier; it's over 10 Tera OPS on the CUDA cores, but there's also another 20 Tera OPS on the deep learning accelerators. So you wind up being able to support very efficiently large numbers of inference tasks on that.
We're actually doing it both ways. For the embedded applications we have a hardware deep learning accelerator... For both inference and training in the [unintelligible 00:19:31.25] after considering all options, we have decided it's just much better to put the efficient Tensor Cores onto a programmable engine, rather than building a hardware accelerator.
Daniel Whitenack: So you mention -- and this is a great lead-in... You mention a variety of fronts on which NVIDIA is working, and you also mentioned a desire that you guys have to keep things programmable and easy to interface with and customize. One of the things that I've definitely seen is that NVIDIA is definitely making contributions not only on the hardware side, but on the front of helping users be able to interface with all sorts of these new types of hardware.
[00:20:19.04] For example, I see NVIDIA Docker, and I see things related to Kubernetes, and NVIDIA working to help people both program their hardware, but also access and manage and orchestrate things. I was wondering if there's anything you wanna highlight on that side, and mention where the different areas that you see NVIDIA working on that are really exciting; maybe not on the hardware side, but maybe on the orchestration or software side.
Bill Dally: We actually do research on deep learning that spans the gamut from fundamental deep learning algorithms and models, training methods, to tools that make it easier for people to use deep learning, all the way up to the hardware. The stuff that I'm most excited about is some of the work on fundamental models and algorithms.
For example, right now we have the world's best neural network for doing optical flow, which is a really nice hybrid of classical computer vision and deep learning, because we've applied a lot of what's been learned over 30 years of doing optical flow the old way, but then built that around a deep learning approach and we get the best of both worlds.
We also have done an enormous amount of research on generative adversarial networks. We developed a method -- we're the first people to train high-resolution generative networks. In the past you just had too many free variables; if you tried to train a GAN to build a high-resolution image, it would just get confused and never converge.
We applied [unintelligible 00:21:48.26] learning, where we train the GAN first to do low-resolution images; once it's mastered that, we then increase the resolution progressively. We call it progressive GAN. We're very successfully able to generate high-resolution images. This has been applied to numerous tasks.
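The idea of growing the resolution during training can be sketched as a simple schedule. The Python below is an outline only: the doubling rule, the 4-to-1024 pixel range, and the `train_stage` callback are assumptions made for illustration (the conversation says only that resolution is increased progressively), and no actual GAN is trained.

```python
def progressive_resolutions(start, target):
    """List the resolutions visited when doubling from `start` up to `target`."""
    if start <= 0 or target < start:
        raise ValueError("need 0 < start <= target")
    resolutions = [start]
    while resolutions[-1] < target:
        resolutions.append(min(target, resolutions[-1] * 2))
    return resolutions


def train_progressively(start, target, train_stage):
    """Call `train_stage(resolution)` once per stage, lowest resolution first.

    `train_stage` is a hypothetical callback standing in for "train the
    generator and discriminator at this resolution until they have mastered it".
    """
    for resolution in progressive_resolutions(start, target):
        train_stage(resolution)


schedule = progressive_resolutions(4, 1024)
print(schedule)

# Record the order in which stages would run, using a stand-in callback.
visited = []
train_progressively(4, 64, visited.append)
print(visited)
```

Starting small means the early stages have far fewer free variables to fit, and each later stage only has to refine what the previous one learned.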
We've also been able to build coupled GANs, where we can use them to transfer style. For example, if we have a bunch of images in daylight, good weather, we can change those to images at night, or images in the rain, or images in the snow, and this lets us augment datasets for self-driving cars.
We can also use these GANs to generate medical datasets, being able to take for example brain images, and tumor images, and combine them in various ways to build larger training sets than you can get by just using the raw data, and then a combination of the real data and these synthetic images winds up giving you better accuracy than one alone. So that work is exciting.
We also have a number of tools - you mentioned our Docker platforms... We also have a tool called TensorRT, which optimizes neural networks for inference, so we get much more efficient execution on our GPUs than if you simply naively mapped the networks on there.
So across the board, we've been trying to build the whole ecosystem, so that somebody who has a problem can draw from our collection of algorithms, they can draw from our tools, and then ultimately run it on our hardware and get a complete solution for their problem.
Daniel Whitenack: How do you keep all of those wheels turning, as the VP of research? There's a lot of different areas, spanning all the way from hardware, to software, to tooling, to AI research... I'm sure it's exciting, but a lot going on.
Bill Dally: Fortunately, I don't have to keep them all turning myself. I'm responsible for NVIDIA Research, which is an organization of about 200 people, and we do research on topics ranging from circuit design to AI algorithms. Basically, what we do is we hire really smart people, and then we try to enable them, to take all the obstacles out of their way, get them excited about the important problems.
[00:23:56.20] The objective of NVIDIA Research is to do two things - one is to do research; there are a lot of corporate research labs that actually don't do research... They wind up really doing development, because they get pulled in too close to various product groups. The product groups always wind up having some fire to put out, so they'll pull the researchers onto the short-term development work to put the latest fire out. They wind up not really doing fundamental research.
So our goal is to do that fundamental research, and we succeed in that, as evidenced by publishing lots of papers at leading conferences, like NIPS, and ICLR, and ICML, and CVPR.
The other goal is to make sure that that research is beneficial to NVIDIA, that it makes a difference for the company. And again, that's another failure [unintelligible 00:24:38.02] industrial research labs. Many of them publish lots of great papers, do lots of great research, and it has absolutely no impact on their parent company. I think I'd have trouble convincing Jensen to continue running the research lab if we didn't have many successes, but we do.
For example, the ray tracing cores in Turing were originally an NVIDIA Research project; cuDNN, as I mentioned, came out of research. We are applying deep learning to graphics; we demonstrated with Turing something called Deep Learning Super-Sampling - basically it anti-aliases samples in image using neural networks, and does it in a temporally stable way.
Our DGX-2 includes NVSwitch. NVSwitch started as a project in NVIDIA Research, as did NVLink, on which the Switch is based... So we have a long track record taking crazy ideas, maturing them within NVIDIA Research, and then getting the product groups to embrace them and ultimately put them into future GPUs and software and systems products that we produce.
Chris Benson: Bill, as we come back out of break, I wanted to ask you to back out just a little bit, because we've gone down some amazing paths; I know Daniel and I have learned so much already on the show from you... But I wanted to put a little context around some of that, and kind of get a sense, as you've told us about all of these amazing technologies, what is NVIDIA's vision for the future of AI, and as you've talked about some of the parts of your AI platform, how are you utilizing that platform strategically to realize that, and what kind of investments are you expecting NVIDIA to make going forward?
Bill Dally: That's a really good question. The short answer with the future of AI is continued rapid innovation. I expect to continue to have to stay up late every night, reading papers on arXiv, and even then not be able to keep up with what's going on. But if you look at how that rapid innovation is happening, I think it's along several different axes.
The first axis I think is breadth of applications. I think we've only begun to scratch the surface of how AI is affecting our daily lives, how we do business, how we entertain ourselves, how we practice our professions... I expect more applications of AI to be occurring every day, and those applications to present unique demands - the type of models we need, how we curate training data, how we train the networks with that data and so on.
The next axis I would say is one of scale. Scale of both model size and datasets. We've seen this in areas like computer vision, in speech recognition, in machine translation, where over time people collect larger datasets, and to have the capacity to learn those datasets they build larger models. That really raises the bar for the performance you need to train those models on those large datasets in a reasonable amount of time.
And then finally, the axis that's probably most exciting to me is coming up with new models and new methods that basically increase the capability of deep learning, to be more than just perception, to basically give it more cognitive ability, to have it be able to reason about things, to have longer-term memories, to operate and interact with environments... A lot of the work in reinforcement learning we find very exciting along that axis.
[00:27:52.20] Seeing this constant innovation on all three of these axes - our goal with our platform is to evolve to meet these needs; to meet the needs of newer applications, to meet the needs of larger scale and more capable models and methods... And there's a couple ways we need to do that. One is to continue to raise the bar on performance. To train larger models and larger datasets requires more performance, and Moore's Law is dead; we're not getting any more performance out of process technology, so it requires us to innovate with our architecture, with our circuit designs to do that... And we've done that generation to generation. If you look at the performance from [unintelligible 00:28:27.17] where we started working on deep learning, to Maxwell, and Pascal, Volta and now Turing - we've been able to really increase by large multiples deep learning performance on each subsequent generation, in the absence of really any help from process technology. We expect to continue doing that.
The next thing we need to do is we need to make it easier to program, so that people who are not experts in AI, but are rather experts in their domain, can easily cultivate a dataset, acquire the right models and train them. We do that through our tools; we support every framework. We have TensorRT to make it easy to map your applications onto inference platforms...
And then we also have training programs. We have a Deep Learning Institute, where we basically take people who are application experts and train them so that they can apply deep learning to their application.
Then the final way we want our platforms to evolve is to remain flexible. The deep learning world is changing every day, and so we don't wanna hardwire too much in and not be able to support the latest idea. In fact, we think it would inhibit people coming up with the latest idea if the platform that everybody is using was too rigid. We wanna make it a very flexible platform, so that people can continue to experiment and develop new methods.
Daniel Whitenack: In light of that, I'd be really interested to hear from your perspective how ideas at NVIDIA actually advance from research to reality, and particularly in light of what you've just said - you want to make things easier for people to program, easier for application people to interface with, while at the same time pushing performance forward and keeping flexible. It definitely seems like it might be hard to balance those things, but as you've already mentioned, there's been a lot of great things that you guys have come out with that do balance that really well... So I'm wondering, from that perspective, how you see things advancing from research to reality at NVIDIA.
Bill Dally: Yeah, that's a good question, and one that I'm very excited about, because it's my job to make sure those things advance. So not all ideas start in NVIDIA Research; many ideas start in the product groups, many ideas start with the application engineers who work with the customers and see the need, but for the ideas that do start in NVIDIA Research, which is an organization of about 200 people, individual researchers generally just start experimenting with things, they come up with a good idea, and then the goal is to find a way for that idea to have impact on the company... So we try to make sure everybody, when they come up with an idea, identifies both a champion and a consumer (who are often the same person) in the product groups for that technology.
As they develop the technology further, they get some indication about "Gee, does the champion care about this technology? Can they make their product better?" And if they don't, it's often an indication they should drop the idea.
To me, one of the keys of good research is to kill things quickly. Most research projects actually don't go anywhere, and there's nothing wrong with coming up with research ideas that don't work. What's wrong is spending a lot of resources on them before you give up on the ones that don't work. So we try to kill the ideas that either aren't gonna work or aren't gonna have impact on the company pretty quickly. But the ones that are gonna have impact on the company, one thing that's really great about NVIDIA is it's like a big startup - there's no politics, there's [unintelligible 00:31:45.04], so if there's a good idea, the product groups don't care that it came out of research; they say, "That's a great idea. We want that", and very often they'll grab things out of our hands before we even think we're done with them, and the Switch was a great example of that. We wanted to actually complete a prototype in research, and we didn't get the chance; they grabbed it and made it a product before we had the chance to do that.
[00:32:03.15] And it's really about people. The people that come up with the concept are communicating with the people who will turn it into reality, and then once it jumps over to that side, it becomes more of an engineering endeavor and less of a research endeavor, where people have to hit goals, things have to work, they have to be verified... But the whole process works, and ultimately we're able to very quickly go from concept to delivering very polished, very reliable products to our end customers.
Chris Benson: I would like to take you into a particular use case. I know when I was at GTC in March, Jensen was on stage, doing his keynote, and we had all walked in looking at the amazing autonomous vehicles that you guys had in the lobby, and he made a comment that really struck me, and I was just wanting to get your thoughts on it. He said, "Everything that moves will be autonomous", and in that presentation he went way beyond just cars; he was talking about literally everything, whether it be on the land, sea or air... So obviously, that would include GPUs and maybe other specialized processors that you guys put into those vehicles... But what are the things that you're doing to realize that vision, considering how cool it is to the rest of us?
Bill Dally: That's a great question. One thing we're doing in NVIDIA Research is we're actively pursuing both autonomous vehicles and robotics. In fact, autonomous vehicles are a special case, and in many ways an easy case of robotics, in that all they really have to do is navigate around and not hit anything. Robots actually have a much harder task, in that they have to manipulate, they have to pick things up, and insert bolts into nuts, they have to hit things, but hit things in a controlled way, so that they can actually manipulate the world in a way that they desire.
I've recently started a robotics research lab at NVIDIA, in Seattle. We hired Dieter Fox from the University of Washington to lead that lab... And robots are just a great example of how deep learning is changing the world, because historically, robots have been very accurate positioning machines, if you look at how they've actually been applied in the world. Auto manufacturers use them on their lines to do spot welding, and to spray paint... But they're not responding to the environment; they simply have been programmed to very accurately move an [unintelligible 00:34:10.09] position repeatedly, over and over again, doing exactly the same thing.
With deep learning, we're actually able to give robots perception and the ability to interact with the environment, so that they can respond to a part not being in the right place, adjust, manipulate, pick that part up, move it around... They can perhaps even work with people, working as a team, where the robot and the person are interacting together, by using deep learning to provide them with both sensory abilities and also through reinforcement learning, the ability to reason and choose actions for given states that they find themselves in.
So our goal in doing this fundamental research in robotics is to basically learn how to build future platforms that will be the brains for all of the world's robots, just like we want to build the platform that's gonna be the brains for all of the world's autonomous vehicles. Hopefully this research will ultimately lead to that platform - not just the hardware, but the various layers of software and ultimately the fundamental methods that those future robots and autonomous vehicles will be using.
Daniel Whitenack: Bill, we've kind of transitioned into talking about use cases, and you've mentioned a lot about robots and other things at "the edge"... I was wondering if you could give us a little bit of a perspective, moving forward, at what you see as the edge and how neural networks, both training and inference, will be spread across centralized compute in the cloud, or on premise, and on edge devices, and what edge devices might look like.
Bill Dally: [00:35:46.08] That's a good question. I see deep learning as happening in three ways. The first is training, which by and large takes place in the cloud. And the reason why you want it to take place in the cloud is that first of all you need to have a large dataset. You need to have some place where you can store terabytes of data, maybe even more than that... And you really wanna do that in a centralized location. Also, if you're gathering training data, say, from a fleet of autonomous vehicles, you want them all to learn from each other's experiences; you wanna gather all that data, [unintelligible 00:36:18.08] one place, curate the data to basically discard the stuff that's not very interesting, keep the stuff that is, and then train one network on all of the data.
Training really wants to happen in the cloud. The cloud has a large dataset, it has a large memory footprint, it has unique requirements - it requires FP16... And then there's inference, and inference happens in both the edge and the cloud. I think most people, if you can do inference in the cloud, would prefer to do it there; there's an economy of scale, and you can also share resources. If you have a task where you're not doing inference constantly, but on demand, then you don't need to have a resource tied up all the time; you can share it, use it when you need it, somebody else can use it when you don't need it. So it's just more efficient to do inference in the cloud...
But there are cases when you can't do inference in the cloud, and an autonomous vehicle is a great example. First of all, you may have latency requirements. If your camera sees the kid running into the street, you can't afford the latency to send that image to the cloud, do that inference there, and send the braking command back. You need to have a very tight loop that commands a car to stop. You also may not be connected, or you may have bandwidth limits.
For example, people who have networks of surveillance cameras are producing just too much data to send all of it to the cloud. They need to do some data reduction, at least locally, have some local inference that filters the data and then sends only the interesting data to the cloud for further processing.
And then finally, there may be privacy constraints that limit your ability to send stuff up to the cloud. You may wanna handle things locally to avoid sharing data that you don't wanna share. So I think there are a lot of reasons why you wanna do inference in these embedded devices, and almost no reason why I think you would wanna do training there.
In the case where you are doing inference in the embedded devices, that often has very strong energy efficiency constraints; they may be battery-operated, they may need to run for a long period of time without being recharged... So the efficiency demands are even higher than for inference in the cloud.
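The placement reasoning in this answer (latency, connectivity, bandwidth, privacy, plus local filtering before upload) can be sketched roughly as follows. This is only an illustration, not anything NVIDIA ships: the `Task` fields, the `choose_site` and `filter_interesting` helpers, the 100 ms cloud round-trip default and the `score` key are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Task:
    max_latency_ms: float   # how quickly a result is needed
    connected: bool         # is a link to the cloud available?
    data_rate_mbps: float   # how much data the sensors produce
    uplink_mbps: float      # how much can be sent to the cloud
    private: bool           # data should not leave the device


def choose_site(task, cloud_round_trip_ms=100.0):
    """Return 'edge' or 'cloud'. The round-trip default is an assumed value."""
    if not task.connected:
        return "edge"                      # no link, no cloud inference
    if task.max_latency_ms < cloud_round_trip_ms:
        return "edge"                      # the loop is too tight for a round trip
    if task.data_rate_mbps > task.uplink_mbps:
        return "edge"                      # too much data to ship upstream
    if task.private:
        return "edge"                      # keep sensitive data local
    return "cloud"                         # otherwise share cloud resources


def filter_interesting(frames, threshold=0.5):
    """Local data reduction: keep only frames scored at or above the threshold."""
    return [f for f in frames if f["score"] >= threshold]


braking = Task(max_latency_ms=20, connected=True,
               data_rate_mbps=50, uplink_mbps=100, private=False)
on_demand = Task(max_latency_ms=2000, connected=True,
                 data_rate_mbps=1, uplink_mbps=100, private=False)
frames = [{"id": 1, "score": 0.1}, {"id": 2, "score": 0.9}]

print(choose_site(braking), choose_site(on_demand))
print(filter_interesting(frames))
```

In this sketch the braking task lands on the edge because its deadline is shorter than the assumed round trip, while the occasional on-demand task goes to the cloud, where the resource can be shared.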
Chris Benson: Yeah, I've actually run into that myself in terms of the battery constraints, doing inferencing on mobile devices. We've covered so much ground... If you are a software developer, or maybe a data scientist who's doing software development and engineering, and you're looking at all of these things that we have been talking about from an app dev perspective, from training, and the hardware, working on the edge, the different tools, CUDA, you name it - what are the necessary skills that people should be thinking about? So many people are kind of self-training themselves into this, and there is so much for a person who's just trying to get into AI to learn. How would you structure that, if somebody is trying to self-train themselves into this field?
Bill Dally: I think actually what you need to know to be successful in AI falls into two categories. One is basic knowledge, and the other is very practical how-to information. For the basic knowledge, I think what's most important is having a really strong background in mathematics, and particularly in statistics and probability theory, because that's what all of AI is based on; you're basically doing statistical estimation of a number of things.
Then the practical side of it is knowing how to use the tools that are available, whatever your favorite framework is, whether it's PyTorch or whether it's TensorFlow, having the practical knowledge to get a model, get a dataset and run the tools to train it.
Chris Benson: Since you've mentioned that, I'm just curious... Daniel and I have used different tools; do you have any personal favorites that you like to use? Not suggesting anything that you say is the right thing that everybody should do, but we always like to find out what people's preferences are.
Bill Dally: I don't really have any strong preferences. I have to confess that I actually don't do that much coding myself anymore, and the people I work with often migrate to one or another for different reasons. A lot of people use PyTorch because they like to work from the Python base; many people use TensorFlow - I think it is probably the most popular framework overall these days.
Daniel Whitenack: [00:40:12.04] Yeah, I'm sure a lot of the frameworks that your team uses, and also the tools that they generate and the research that they generate - I'm sure a lot of that uses open source tools like you've already mentioned. Are there any things you'd like to highlight, that NVIDIA is doing on the open source front, that maybe our listeners could go and check out and potentially start playing around with?
Bill Dally: One thing I'll highlight actually is our deep learning accelerator. If your listeners go to nvdla.org, if they actually wanna play with hardware for deep learning, they can download the RTL for that accelerator, customize it to their needs, include it into either an FPGA or an ASIC of their own design...
We also open-source a lot of software that comes out of our research. For example, our work on progressive generative adversarial networks (progressive GANs), our work on networks that we use for optical flow, our work on de-noising - all of those networks have been open-sourced, so people can very easily replicate our results and apply those new methods that we've built to their own problems.
Daniel Whitenack: Awesome. That's super-helpful, and we'll make sure and include some links in our show notes to that. As we wrap up here and get to the end of our conversation, once again, I really appreciate all of the perspective on these different things. It was really helpful for, I know...
I was wondering if you have any parting thoughts or inspiring thoughts for the listeners, assuming that our listeners are either already in or getting into the AI field, and kind of trying to find their place and find what people are working on... Do you have any parting thoughts for them, or encouragements?
Bill Dally: I think it's just a very exciting time to be working in AI, because there are so many new developments happening every day. It's never a dull place. In fact, there's so much stuff happening that it's hard to keep up. As a hardware engineer, I think it's also very rewarding to know that this whole revolution in deep learning has been enabled by hardware. All of the algorithms - convolutional nets, multilayer perceptrons, training them using stochastic gradient descent and backpropagation... All of that has been around since the 1980's, since I first started playing with neural networks, but it wasn't until we had GPUs that it was really practical. GPUs basically were the spark that ignited the revolution.
The three ingredients were the algorithms, the large datasets - those were both there, but then you needed the GPUs to make it work. For computer vision it wasn't until AlexNet in 2012, where using GPUs he was able to train the network to win the ImageNet competition, that deep learning really took off.
So I think GPUs are what ignited this, and I think GPUs are still really the platform of choice, because with the Tensor Cores they provide the efficiency of special purpose units, but without the inflexibility of a hardware [unintelligible 00:42:56.28] like a TPU, so you get the best of both worlds. You can program in CUDA, but get the efficiency of a Tensor Core.
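For listeners who have not seen stochastic gradient descent written out, here is a minimal sketch on the smallest possible model: a single linear unit fitted to noise-free points on the line y = 3x + 1. The function name `sgd_fit_line`, the learning rate, the epoch count and the toy data are all assumptions made for the example; a real network applies the same update rule to many layers, with backpropagation supplying the gradients.

```python
import random


def sgd_fit_line(points, lr=0.05, epochs=200, seed=0):
    """Fit y ~ w*x + b by stochastic gradient descent on squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    data = list(points)
    for _ in range(epochs):
        rng.shuffle(data)                  # visit the samples in random order
        for x, y in data:
            err = (w * x + b) - y          # forward pass, then the error
            # Gradients of 0.5 * err**2 with respect to w and b (chain rule).
            w -= lr * err * x
            b -= lr * err
    return w, b


# Toy dataset: 21 points on the line y = 3x + 1, with x in [-1, 1].
points = [(k / 10.0, 3.0 * (k / 10.0) + 1.0) for k in range(-10, 11)]
w, b = sgd_fit_line(points)
print(round(w, 3), round(b, 3))
```

Each update here uses a single sample, which is what makes the descent "stochastic"; the arithmetic is cheap for one unit, and it is the same multiply-and-accumulate work repeated across millions of weights that GPUs made practical.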
Chris Benson: Thank you very much, Bill. For me, I have learned so much on this episode that I'm probably gonna have to go back and listen to it a couple of times to take in everything that you've taught us today. It's been really packed with incredible information, so thank you very much for coming on.
Bill Dally: It's my pleasure, thank you.
Chris Benson: With that, we'll look forward to our next episode. I hope our listeners got as much out of it as Daniel and I did. Daniel, are you doing good? Is your head going to explode yet?
Daniel Whitenack: I've got a bunch of websites pulled up that I'm gonna start reading afterwards, so... It was a great time, and we'll talk to you again next week.
Chris Benson: Great. Thank you very much, Bill.