diff --git a/transcripts/301-deploying-django.txt b/transcripts/301-deploying-django.txt
index 93e726c9..cda4e85a 100644
--- a/transcripts/301-deploying-django.txt
+++ b/transcripts/301-deploying-django.txt
@@ -1,6 +1,6 @@
-00:00:00 We've been learning Django and now you want to get your site online, you're not sure about the best way to host it or the trade offs between the various options. Maybe you want to make sure your Django site is secure. On this episode, I'm joined by two Django experts. Will Vincent and Carlton Gibson talk about deploying and running Django in production along with recent updates in Django three dot two and beyond. This is taught by thunder Mae, Episode 301, recorded January 19 2021.
+00:00:00 We've been learning Django, and now you want to get your site online, but you're not sure about the best way to host it or the trade offs between the various options. Maybe you want to make sure your Django site is secure. On this episode, I'm joined by two Django experts. Will Vincent and Carlton Gibson talk about deploying and running Django in production, along with recent updates in Django 3.2 and beyond. This is talk Python to me, Episode 301, recorded January 19, 2021.

-00:00:40 Welcome to talk Python to me, a weekly podcast on Python, the language, the libraries, the ecosystem, and the personalities. This is your host, Michael Kennedy. Follow me on Twitter, where I'm at m Kennedy, and keep up with the show and listen to past episodes at talk python.fm and follow the show on Twitter via at talk Python. This episode is sponsored by square and linode. Please check out what they're offering during their segments. It really helps support the show. Before we get to our discussion, just one quick announcement. We started live streaming the recordings of talk Python to me episodes on YouTube. If you're part of the live stream, you'll have a chance to ask questions and your comments might get featured on the air. Just visit talk python.fm slash YouTube to subscribe to the channel and see upcoming and past live streams. Now on to the show. Bolton will welcome to talk Python to me again. Welcome back, guys.
+00:00:40 Welcome to talk Python to me, a weekly podcast on Python, the language, the libraries, the ecosystem, and the personalities. This is your host, Michael Kennedy. Follow me on Twitter, where I'm @mkennedy, and keep up with the show and listen to past episodes at 'talkpython.fm', and follow the show on Twitter via @talkpython. This episode is sponsored by Square and Linode. Please check out what they're offering during their segments. It really helps support the show. Before we get to our discussion, just one quick announcement. We started live streaming the recordings of talk Python to me episodes on YouTube. If you're part of the live stream, you'll have a chance to ask questions, and your comments might get featured on the air. Just visit 'talkpython.fm/youtube' to subscribe to the channel and see upcoming and past live streams. Now on to the show. Carlton, Will, welcome to talk Python to me again. Welcome back, guys.

00:01:28 Thank you, you came on our show and I was on your show three years ago. So Carl, first time on your show toolpaths a newbie.

@@ -14,15 +14,15 @@
00:02:39 but is that the very first 3.2 release?

-00:02:41 That's the first one. So that's like, if you've got a Django project, and you've got ci, you download it now and you run it on your ci and you tell us all the things we broke better before we released the final, you know, it's much better during the the alpha pre release period, then, you know, just after 3.2 final you go, Hey, everything's broken.
+00:02:41 That's the first one. So that's like, if you've got a Django project, and you've got CI, you download it now and you run it on your CI and you tell us all the things we broke, before we release the final. You know, it's much better during the alpha pre release period than, you know, just after 3.2 final you go, hey, everything's broken.

00:02:57 Yeah, you really need people to test it now, before it's too late. Right?

-00:03:01 Yeah. But yes, it's quite exciting, because the last couple of weeks have been building towards the feature free, so the alpha marks the feature free, so the beat over the pre release period, now we'll we'll merge bug fixes in the new features. So anything that people find them and report will fix and get out before the final, but there's no new features going into 3.2. So on Thursday, just gone, I branched the stable branch for 3.2, which will be that's Django 3.2. And it will be that for the next three years, because the long term release gets quite the long term support release, which 3.2 is gets three years of support. That's really great. I will definitely have to come back to that. But also, welcome. We'll tell everyone about yourself real real briefly. You
+00:03:01 Yeah. But yes, it's quite exciting, because the last couple of weeks have been building towards the feature freeze, so the alpha marks the feature freeze. So the beta, over the pre release period now, we'll merge bug fixes in the new features. So anything that people find and report, we'll fix and get out before the final, but there's no new features going into 3.2. So on Thursday, just gone, I branched the stable branch for 3.2, which will be, that's Django 3.2. And it will be that for the next three years, because the long term support release, which 3.2 is, gets three years of support. That's really great. I will definitely have to come back to that. But also, welcome, Will. Tell everyone about yourself real briefly. You

00:03:40 know, you were on the show a while ago, and we talked, I believe it was learning Django we spoke about right

-00:03:44 and left in Django at the time. So I'd written I think a couple books. So I have three books, Django for professionals and API's. And the last two years I've been a member of the Django Software Foundation Board about trees who were voted in who were play a prominent role in the community one way or another. And then the board is seven people annually voted on who managed Django itself, which is a nonprofit, what that later So yeah, I and I have now a website, learn django.com, which is an online version of educational content. But in addition to having a podcast with Carlton Jango chat, since he's a fellow which is contracted by the board, we have another touch point, as if we didn't need
+00:03:44 and left in Django at the time. So I'd written, I think, a couple books. So I have three books, Django for Professionals and APIs. And for the last two years I've been a member of the Django Software Foundation board of trustees, who were voted in, who all play a prominent role in the community one way or another. And the board is seven people, annually voted on, who manage Django itself, which is a nonprofit, more on that later. So yeah, and I now have a website, 'learndjango.com', which is an online version of educational content. But in addition to having a podcast with Carlton, Django Chat, since he's a Fellow, which is contracted by the board, we have another touch point, as if we didn't need

00:04:22 more. Exactly, yeah. So I was on your show a while ago, and that was super fun. I really enjoyed our conversation there. And you know, maybe just tell people quickly about your podcast, what kind of stuff you cover there. Obviously Django, but, you know, where do they find it? What do they cover? Okay, so

@@ -32,11 +32,11 @@
00:05:01 Yeah, yeah, yeah, 'djangochat.com'

-00:05:04 I think recreating a little bit of what you would have at a Django con event. These are annual events. So we started after I went to my first Django con at Carleton and was like, why can't I talk to people about Django more often? And at the time, there wasn't a Django podcast. So yeah, it's we've gotten to interview really my dream list of guests. I mean, we we had actually, I was just talking to someone else who works at stripe, which is still running on Sinatra, which is a Ruby thing. And I was like, Yeah, I've talked to DHH because he's came on our shows, we talked about rails versus Django. We had Carl Meyer on his Instagram was formerly Django core, all the basically everyone, almost everyone who'd want to talk to who's involved in Django is willing to come on and share their story. So it's, I think it's a really nice connector and educational for us to do as
+00:05:04 I think recreating a little bit of what you would have at a DjangoCon event. These are annual events. So we started after I went to my first DjangoCon with Carlton and was like, why can't I talk to people about Django more often? And at the time, there wasn't a Django podcast. So yeah, we've gotten to interview really my dream list of guests. I mean, actually, I was just talking to someone who works at Stripe, which is still running on Sinatra, which is a Ruby thing. And I was like, yeah, I've talked to DHH, because he came on our show, we talked about Rails versus Django. We had Carl Meyer on, he's at Instagram, was formerly Django core. Basically everyone, almost everyone we'd want to talk to who's involved in Django, is willing to come on and share their story. So it's, I think it's a really nice connector and educational for us to do as

-00:05:45 well. And you're just sort of a sidebar for our show here. Yeah, being all of us, fellow podcasters. I think it's really interesting, the role that podcasts play in keeping connections to the broader tech community, when we can't go anywhere. I mean, even if you weren't typically going to conferences or meetups, you could still go to work and see other people, right. It's just the first one of the COVID hit. I was like, I've just I'm not traveling anywhere. I'm not taking my kid to school or anything like that. So there's no like natural place here. I'm stuck for 45 minutes, I'm going to just listen. But as it's drawn on, I started listening to shows, especially with multiple people trying to bring sort of a normalcy. And I get to kind of hang out with these people, even though that they don't respond to me, I still get to hang out with them. And I think that's a really interesting societal thing that's happening right now.
+00:05:45 well. And, just as a sidebar for our show here, all of us being fellow podcasters, I think it's really interesting, the role that podcasts play in keeping connections to the broader tech community when we can't go anywhere. I mean, even if you weren't typically going to conferences or meetups, you could still go to work and see other people, right. When COVID first hit, I was like, I'm just not traveling anywhere. I'm not taking my kid to school or anything like that. So there's no, like, natural place where I'm stuck for 45 minutes and I'm going to just listen. But as it's drawn on, I started listening to shows, especially with multiple people, trying to bring sort of a normalcy. And I get to kind of hang out with these people; even though they don't respond to me, I still get to hang out with them. And I think that's a really interesting societal thing that's happening right now.

-00:06:33 Yeah, I think it's beneficial across the board. I mean, I probably like like you, I listen to podcasts outside of tech as well. And it's sort of like people, I'd want to sit in on their conversations anyways. So for me, it's probably my primary media consumption, aside from books. Yeah, I mean, I'm not reading ancient Greek, like Carlton Carlton has a PhD and all that. We're gonna talk about specifically deployments on Django, and we can go on and on about, obviously, all the intricacies of Django, but I think suffice to say I'm an educator on the on the board, and Carlton is a fellow. So he's, he makes the releases happen, including three, two alpha, which is dropped today. And I guess you mentioned LTS, so that's confusing to non Django people. So since having ellos, like Carlton Django has a pretty rapid release cycle, where it's every nine months or so. So there's 303132. This December, I think, Carlton is four, zero, every one of those is a long term service release. So that'll last two and a half years. So that is a three years. Yeah, it's it's on, there's a link on the Django project site. So that's a way that so Django doesn't really have. It's rare to have breaking changes these days. But the LTS is designed to help people who can't keep up with that cycle. Stay up to date, though, we have a lot of podcasts and opinions about why you should always stay up to date. And it's worth it. Yeah, because that's one of the most fair things you can do. Because as Carlton mentioned, there's bug fixes constantly. So there'll be, there's 311312, there'll be, you know, three to the three to one a month later. So
+00:06:33 Yeah, I think it's beneficial across the board. I mean, I probably, like you, listen to podcasts outside of tech as well. And it's sort of like, people I'd want to sit in on their conversations anyways. So for me, it's probably my primary media consumption, aside from books. Yeah, I mean, I'm not reading ancient Greek like Carlton; Carlton has a PhD and all that. We're gonna talk specifically about deployments on Django, and we can go on and on about, obviously, all the intricacies of Django, but I think suffice to say I'm an educator on the board, and Carlton is a Fellow. So he makes the releases happen, including 3.2 alpha, which dropped today. And I guess you mentioned LTS, so that's confusing to non Django people. So, as for LTSs: like Carlton said, Django has a pretty rapid release cycle, where it's every nine months or so. So there's 3.0, 3.1, 3.2. This December, I think, Carlton, is 4.0. Every third one of those is a long term support release. So that'll last two and a half, well, three years. Yeah, it's on, there's a link on the Django project site. So Django doesn't really have, it's rare to have, breaking changes these days. But the LTS is designed to help people who can't keep up with that cycle stay up to date, though we have a lot of podcasts and opinions about why you should always stay up to date. And it's worth it. Yeah, because that's one of the most fair things you can do. Because as Carlton mentioned, there's bug fixes constantly. So there'll be, there's 3.1.1, 3.1.2; there'll be, you know, the 3.2.1 a month later. So

00:07:56 I'm a big advocate of, you know, if you possibly can, get on the latest major release. Historically, the long term release, the LTS release, was really important, because there were breaking changes, right, in each major version of Django; there were new things, and it was difficult to keep up. But that's not the case anymore. It's really easy to update. So I'm a big advocate of that now. And then when I, you know, talk to fellow people in the Django community, they're like, well, you know, I work in the real world, and you can't keep up on the latest major version. So for those folks, then, you know, the LTS is a really good option, because it's once every three years, you know, it's coming. Yeah, you get a six month window of overlap of support. So the old LTS gets six months of security releases after the release of the new LTS, and that's your window to update?

@@ -50,17 +50,17 @@
00:10:04 right? But it allows people who have that, please don't touch it, but oh, there's a security problem, situation. If there's no LTS, and there's a security problem, then not only do they have to figure out how to redeploy a fix, they've got to say, well, we didn't touch it for three years, and it doesn't quite work the same. So then you get into the discussion of, well, what's the risk? Will they really? It's just Java Swing? I mean, come on, what's the problem, right? How bad could that go?

-00:10:28 You know, like the entire world's credit reports be now that we're in Python three world, Django was part of that. It's really beyond the security thing. It's also all the ecosystem around Django, the third party packages, like a lot of times, if you look at their project that says, I can't update, why can't you update, they did two things. They're using a third page, which falls into that, you know, touch it, you broke it situation, or they did something custom. At one point, they they went off the guardrails, and the bill comes due. I mean, it's so tempting, actually, and I actually want to speak and clear up the guardrails, quick note that, um, Django just passed flask stars, which is a really poor metric of popularity. But nonetheless, we'll take it, congratulations, that's awesome, very easy to spin up a couple API endpoints. And boom, you're using flask, that's a very different thing. That's like as DHH would say that's a Lego versus the Lego truck. That is a framework like Django. And anecdotally, a lot of places use flask. But in terms of a big site, that's all flask, that's much less common than all Django.
+00:10:28 You know, like the entire world's credit reports? Now that we're in a Python 3 world, Django was part of that. It's really beyond the security thing. It's also all the ecosystem around Django, the third party packages. Like, a lot of times, if you look at a project that says, I can't update, well, why can't you update? They did two things. They're using a third party package, which falls into that, you know, touch it, you broke it situation, or they did something custom. At some point they went off the guardrails, and the bill comes due. I mean, it's so tempting, actually. And speaking of guardrails, quick note that, um, Django just passed Flask in GitHub stars, which is a really poor metric of popularity, but nonetheless we'll take it, congratulations. That's awesome. It's very easy to spin up a couple API endpoints and, boom, you're using Flask; that's a very different thing. That's, like DHH would say, that's a Lego versus the Lego truck that is a framework like Django. And anecdotally, a lot of places use Flask. But in terms of a big site that's all Flask, that's much less common than all Django.

00:11:31 Yeah. Well, I was thinking about this this morning. And, you know, Flask and Django are pretty neck and neck in terms of popularity, at least on GitHub. Yeah. There's these metrics of popularity, they're all over the map. So one of the thoughts I had, though, is the people that use Flask, and this is not a knock against Flask, I like it, but one of my impressions is, if I just need to, like you said, just create a couple APIs and we're just going to get something going real small and simple and just roll, like, those people are not as invested in the ecosystem and the framework as I feel like the Django folks are. I feel like for the Django folks, it's a more encompassing part of their development experience or development lifecycle; Django feels more part of the project when people adopt it and love it.

00:12:17 I don't know, what do you guys think about that? You need a dozen apps to use Flask. And I hope that we get David Lord on to talk about Flask, because he's been to DjangoCon. So I mean, I don't mean to say there's a competition between the two; they serve different purposes. I would say, Carlton, right?

-00:12:30 Yeah, the different styles as well, like, you know, if you want to put together something, why not floss, but I mean, I've been using Django for so long now that even if I just need to spin up two endpoints, it's much quicker for me to do that in Django than it is to go and get supposedly microframework and work out how am I supposed to use this, you know,
+00:12:30 Yeah, there are different styles as well, like, you know, if you want to put together something, why not Flask, but I mean, I've been using Django for so long now that even if I just need to spin up two endpoints, it's much quicker for me to do that in Django than it is to go and get a supposedly micro framework and work out, how am I supposed to use this, you know,

00:12:48 whatever. We can put the link in the notes. I made up a repo with the code, because you didn't provide code; you can have a single file Django project, same way you can in Flask, because a lot of times that hello world comparison will lead people to assume that Flask is much less complicated than Django. And it's a little bit less, but it's more around how it's structured, which is the point of Carlton's talk. Yeah, so

-00:13:08 the talk was called dumb how to you using Jenkins and Michael framework or something like that. And it was about the the base HTTP handlers, that kind of that really real core of the framework. And I was right. You know, I put up a few examples from different frameworks, like flask and node example and Starla. You know, async, microframework, and Tom Christie, and then I, you know, showed how you might put that together in Django.
+00:13:08 the talk was called 'Using Django as a Microframework' or something like that. And it was about the base HTTP handlers, that real core of the framework. And in it, right, you know, I put up a few examples from different frameworks, like a Flask and a Node example, and Starlette, you know, the async microframework from Tom Christie, and then I, you know, showed how you might put that together in Django.
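(For readers following along: the single file Django project Will mentions putting in a repo can be sketched roughly like this. This is a minimal, illustrative layout with invented names, not the actual repo's code.)

```python
# app.py - a single-file Django project, comparable to a Flask hello-world.
import sys

from django.conf import settings
from django.http import JsonResponse
from django.urls import path

# Configure settings in code instead of a settings.py module.
settings.configure(
    DEBUG=True,
    SECRET_KEY="dev-only-not-for-production",
    ALLOWED_HOSTS=["*"],
    ROOT_URLCONF=__name__,  # this module doubles as the URLconf
)

def hello(request):
    return JsonResponse({"hello": "world"})

urlpatterns = [path("", hello)]

if __name__ == "__main__":
    from django.core.management import execute_from_command_line
    execute_from_command_line(sys.argv)  # run with: python app.py runserver
```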
00:13:30 Yeah. For people who don't necessarily live and breathe the web stuff like the three of us do.

@@ -76,17 +76,17 @@
00:13:46 Yeah, okay. Sure.

-00:13:47 I think the idea is that it's easier to contrast it with like rails or Django or rails and Django, they come with the batteries included with everything I need, you know, you've got euro RM or active regularly,
+00:13:47 I think the idea is that it's easier to contrast it with, like, Rails or Django. Rails and Django, they come with the batteries included, with everything you need, you know, you've got your ORM or ActiveRecord,

00:13:59 you've got an ORM, you've got database, you've got migrations,

-00:14:03 all this stuff was you microframework, you perhaps get the call HTTP handling. And that's about it. And then you have to go and find a forms like, okay, I found a forms library, and then you have to pull in an ORM, or a database of that. Okay, I got I'll use that. And you know, so there are there are node full batteries included frameworks, like happy, it's very good. It's got everything you need. But like, the classic node example is, oh, you know, I get Express, and then I pull it in this, this thing that has seen URLs and that thing, you know, so there's a continuum. But Agra frameworks fit more towards that, you know, you put the pieces together yourself, whereas a batteries included framework like Django, you get not everything but a lot in the branch.
+00:14:03 all this stuff. Whereas with a microframework, you perhaps get the core HTTP handling, and that's about it. And then you have to go and find a forms library, like, okay, I found a forms library, and then you have to pull in an ORM, or a database layer, okay, I'll use that. And, you know, there are Node full batteries included frameworks, like hapi; it's very good, it's got everything you need. But the classic Node example is, oh, you know, I get Express, and then I pull in this thing that does URLs and that thing, you know, so there's a continuum. But micro frameworks fit more towards that, you know, you put the pieces together yourself, whereas with a batteries included framework like Django, you get not everything, but a lot, out of the box.

00:14:40 Right. But then your experience is more Django pieces, Django building blocks, than a little Flask, a little SQLAlchemy, a little this, a little that. Definitely. And I've been doing it so long now that I struggle to break out of that. I

00:14:52 just, you know, I throw in a bit of Starlette or a bit of FastAPI, see what's going on with the new frameworks; every new framework that comes out, I always give it a run. And then my question is, okay, well, what can we learn from that?

-00:15:04 This portion of talk Python to me is brought to you by square. Do you run or want to build a web application that sells products or services. Building a successful online business is the stuff that dreams are made of. But accepting payments and handling credit cards with all the various regulations is a common stumbling block. But not with squares payment API, or square your Python web app can easily take payments, you'll seamlessly be able to accept debit and credit cards as well as square gift cards. Let your users pay with their digital wallets square works with Apple Pay, Google pay and masterpass. All in one go. Through API includes PCI compliance, end to end encryption, dispute management and fraud detection, build your online payment form in three steps with square payment SDKs. Step one, create a new square API application. Step two, add squares payment form to your checkout flow. Step three, use squares API to charge the card to get started building your business on square just visit talk python.fm slash square or click the link in your podcast player shownotes. That's talk python.fm slash square.
+00:15:04 This portion of talk Python to me is brought to you by 'Square'. Do you run or want to build a web application that sells products or services? Building a successful online business is the stuff that dreams are made of, but accepting payments and handling credit cards with all the various regulations is a common stumbling block. Not with Square's payment API. With Square, your Python web app can easily take payments. You'll seamlessly be able to accept debit and credit cards as well as Square gift cards, and let your users pay with their digital wallets: Square works with Apple Pay, Google Pay and Masterpass all in one go. The API includes PCI compliance, end to end encryption, dispute management and fraud detection. Build your online payment form in three steps with Square's payment SDKs. Step one, create a new Square API application. Step two, add Square's payment form to your checkout flow. Step three, use Square's API to charge the card. To get started building your business on Square, just visit 'talkpython.fm/square' or click the link in your podcast player shownotes. That's 'talkpython.fm/square'.

00:16:07 Any really large site, if you ask them what language and frameworks they use, it's like, everything, just because of the needs of a massive site; it ends up being hard to say that it's truly one thing. I mean, Instagram still has pieces of Django in it, but you know, at that scale it's its own thing. And I think, you know, the micro frameworks are really good for doing that. This might be a pessimistic take, but I've heard people make the argument that a microframework allows you to shift the complexity from individual developers, who may be churning just as much, up to your upper level, because they can only touch a tiny part of the monolith. So they can, in a way, do less damage. That's actually an organizational argument for microframeworks, because it leaves it to the software architects or the senior people

@@ -102,7 +102,7 @@
00:18:28 Yeah. Because the people who are knowledgeable are writing about their day job. They're not writing about spinning up a Django app in a weekend, which is possible, and you can deploy it too, which we could. Should we get into that, deployment? I

-00:18:39 do want to just point out, I think this might be the article that I read before, like the Yeah, you you're not Google, you're not LinkedIn, you're not Facebook, you're not Netflix, there are people who actually that's an inaccurate statement about, there's plenty of people that work at those companies. But like you said, Carlton, the people who come along, they see these companies who they respect and say they must be doing it, right. And so often with these deployment stories with these design patterns, you know, should you have like separate caching servers that run like something like Redis? Yes, you should? No, you shouldn't? And that's the right answer at the same time, but with the context that you need, right? Are you trying to run 10,000 servers and let 500 people work on this project? Or are you two people trying to do a startup like those are not the same trade offs and balances you want to make? Right?
+00:18:39 do want to just point out, I think this might be the article that I read before, like the, yeah, you're not Google, you're not LinkedIn, you're not Facebook, you're not Netflix. Although there are people for whom that's actually an inaccurate statement; there's plenty of people that work at those companies. But like you said, Carlton, the people who come along, they see these companies who they respect and say, they must be doing it right. And so often with these deployment stories, with these design patterns, you know, should you have, like, separate caching servers that run something like Redis? Yes, you should? No, you shouldn't? And both are the right answer at the same time, but with the context that you need, right? Are you trying to run 10,000 servers and let 500 people work on this project? Or are you two people trying to do a startup? Those are not the same trade offs and balances you want to make, right?

00:19:25 Yeah, and it depends as well on your model. So one example that I really like is Stack Overflow. Now, they're built entirely on the Microsoft stack. They've got SQL Server, and they're using dotnet. But the basic point is, they've got one really big database, and then, you know, a few worker processes in front of those. And that's it. It's a classic monolith, and they're one of the biggest sites on the internet, and yet they're incredibly fast. Yeah, exactly as fast as you could ever want. Okay, they do it in this kind of monolith, old school style, you know, vertical scale: make it bigger, right, don't spin out parallels, just make the thing bigger, buy a bigger database server. So that's, you're not going to be bigger than Stack Overflow, right? And then another example we have, well, you will reevaluate it anyway, what

@@ -126,7 +126,7 @@
00:22:09 but 3.0 introduced it, 3.0 began the process for making Django async. Right, so we added, so historically, Python has this WSGI, the Web Server Gateway Interface. So Django is a WSGI framework, Flask is a WSGI framework. It was this standard so that application servers could talk to protocol servers, which could talk to the internet, without, you know, each framework having to have its own protocol server. So Gunicorn, right, is a WSGI server, and it can speak to Flask, and it can speak to Django, and it can speak to any other WSGI framework,

-00:22:39 right. You can use that and I can use micro whiskey. And yeah, it's we don't have to coordinate or do anything. It just happens that yes, of this Ws gr whiskey thing, right. Yeah. So
+00:22:39 right. You can use that and I can use uWSGI. And yeah, we don't have to coordinate or do anything. It just happens because of this WSGI thing, right. Yeah. So

00:22:49 that's the standard. And so in order to make things async, there's this thing called ASGI, which is the Asynchronous Server Gateway Interface. Oh,
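(As a rough illustration of the two standards being discussed, here is what the bare callables look like. This is the whole contract that servers like Gunicorn and uWSGI program against; a sketch, not Django's internal code.)

```python
# WSGI: one synchronous callable per request.
def wsgi_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello from WSGI\n"]

# ASGI: an async callable that exchanges event dicts with the server.
async def asgi_app(scope, receive, send):
    assert scope["type"] == "http"
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"text/plain")],
    })
    await send({"type": "http.response.body", "body": b"Hello from ASGI\n"})
```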
@@ -134,9 +134,9 @@
00:23:01 first of all, Django 3.0 brought in an ASGI handler, so it wasn't async at all, but you could run it under an ASGI server. And then 3.1 brought in async, actual async views. And you can define an async def view. And you can use, I don't know, HTTPX, which is like an asynchronous HTTP client, to make a request. Yeah, it's like requests, but async.

-00:23:23 Right, or, what's that? Oh, RM story. Does that support async? await yet,
+00:23:23 Right, or, what's the ORM story? Does that support async/await yet?

-00:23:27 right? No. So that's not there yet. So this is something that will develop over probably over the course of the for Django for lifestyle. So there's plans and there's thoughts. And we need to get to the point where you with the ORM, where you can down to your kind of filter call. And so things like filter, they can be totally synchronous, because they don't actually make do any IO, they don't actually hit the database. But when you then go, like, I've got my query set, and I'm going to iterate it, and I'm going to fetch the objects, database, we need that bit, even if the actual connection is running a thread or whatever, we need that bit to be fully async. And then Django will will feel async as it is, at the moment, if you write an async view in Django, you kind of have to say, Well, I'm not going to touch the DB, I'm not going to you can wrap the ORM in a in a sink to a sink, wrap up, but you kind of lose the point of that.
+00:23:27 right? No. So that's not there yet. So this is something that will develop, probably over the course of the Django 4 lifecycle. So there's plans and there's thoughts. And we need to get to the point with the ORM where you can go down to your kind of filter call. And so things like filter, they can be totally synchronous, because they don't actually do any IO, they don't actually hit the database. But when you then go, like, I've got my query set, and I'm going to iterate it, and I'm going to fetch the objects from the database, we need that bit, even if the actual connection is running in a thread or whatever, we need that bit to be fully async. And then Django will feel async. As it is, at the moment, if you write an async view in Django, you kind of have to say, well, I'm not going to touch the DB. I'm not going to, well, you can wrap the ORM in a sync to async wrapper, but you kind of lose the point of that.

00:24:17 Yeah, but that's coming. Right. Yeah.

@@ -146,25 +146,25 @@
00:25:00 So, what's amazing about the way, Andrew is just an absolute hero, but he put it together in such a way that you can run this with a WSGI server. You're running Django 3.1 with Gunicorn like you always have, and you think to yourself, I just need to make a couple of API calls, but they're quite slow, and I want to make them in parallel. You can do that just by writing an async def call, and then, you know, making the parallel calls using HTTPX. And it just works. And Django does all the rest for you and adapts it, and you didn't have to change your application server, you didn't have to, you know, it just works, right.
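(Concretely, the "couple of slow API calls in parallel" case Carlton describes might look something like this in a Django 3.1+ async view; the endpoint URLs are placeholders.)

```python
# views.py - an async view fanning out two HTTP calls concurrently.
import asyncio

import httpx
from django.http import JsonResponse

async def dashboard(request):
    async with httpx.AsyncClient() as client:
        # Both requests are awaited together instead of one after the other.
        stats, alerts = await asyncio.gather(
            client.get("https://api.example.com/stats"),   # placeholder endpoints
            client.get("https://api.example.com/alerts"),
        )
    return JsonResponse({"stats": stats.json(), "alerts": alerts.json()})
```

And, per the caveat above, ORM access inside such a view still has to be wrapped for now, e.g. `from asgiref.sync import sync_to_async` and then `await sync_to_async(MyModel.objects.count)()`, where `MyModel` is whatever model you're querying.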
-00:25:34 But eating just that one of the challenges, if you're talking to the database is usually Okay, find the bottleneck was we're waiting at the web server level, at the NGO level, and then we're gonna push that down to the database, we're gonna just make all that async, all of a sudden, all the pressures now on the database, which can be a problem. But if you're talking to external API's, you're now pushing the pressure onto the internet, which is a way scalable.
+00:25:34 But that's just one of the challenges. If you're talking to the database, it's usually, okay, we found the bottleneck, we're waiting at the web server level, at the Django level, and then we're gonna push that down to the database, we're gonna just make all that async; all of a sudden, all the pressure's now on the database, which can be a problem. But if you're talking to external APIs, you're now pushing the pressure onto the internet, which is way more scalable.

00:25:57 Yeah. And it's always going to scale more than you, right? You're not going to need that many client requests, probably.

00:26:03 Yeah. Also, David Smith out there says 3.2 will be a great release, you guys have done a great job getting many patches in. Oh, yeah, for sure.

-00:26:09 Thank you, Dave. Super, we have we're working really hard it was we had too long a list and we got you know, one, we had to bump for Django 4.0. But the rest we got in so we were very Okay, so three Dotto. async and await This is a big deal. Yeah. And so those are the big features for 3.0. And then what's coming in 3.2, it's got various other bits, bits and bobs, you can customize the primary key. So traditionally, they've just been auto filled, which is in 32, well, eventually, you know, you get 22 billion of those or something you can run out. So you can now customize that for big and over the next couple of releases, you know, major releases, we will make the default begin, because that's probably what it should be in 64, because then you're never going to run out of primary keys. But that's something that the big sites run into that, especially if you start creating, I don't know, event, an event log, yeah, you know, a site can generate a lot of events, and they can add up quickly, you know, so, again, functional index is in the in the ORM. So you can create an index on an expression like that these were greater than or that the sum of this was and then you can query on those fullspeed. Because their index, that's a really big feature,
+00:26:09 Thank you, Dave. Super, we're working really hard. We had too long a list and, you know, one we had to bump to Django 4.0, but the rest we got in, so we were very... okay. So, 3.0: async and await, this is a big deal. Yeah. And so those are the big features for 3.0. And then what's coming in 3.2? It's got various other bits and bobs. You can customize the primary key. So traditionally, they've just been auto fields, which is an int32; well, eventually, you know, you get 2 billion of those or something and you can run out. So you can now customize that to a big auto field, and over the next couple of releases, you know, major releases, we will make the default a BigAutoField, because that's probably what it should be, an int64, because then you're never going to run out of primary keys. But that's something that the big sites run into, especially if you start creating, I don't know, an event log; yeah, you know, a site can generate a lot of events, and they can add up quickly, you know. So, again, functional indexes are in the ORM. So you can create an index on an expression, like that these were greater than, or that the sum of this was, and then you can query on those at full speed, because they're indexed. That's a really big feature,

00:27:16 Oh, really. So you can do a query like, the sum of the orders of this customer is greater than $100.

00:27:22 And that's a relationship. Yeah. And then you can create an index on that value. And you've been able to do that in your SQL for, you know, any amount of time, but that's now exposed at the ORM level. And that's, that's awesome.
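(A sketch of those two 3.2 features on a hypothetical model. The sum-of-orders example from the conversation is paraphrased loosely; the canonical documented form is an expression index like the one below.)

```python
# models.py - Django 3.2's customizable primary keys and
# functional (expression) indexes, on a made-up model.
from django.db import models
from django.db.models.functions import Lower

class Customer(models.Model):
    # A 64-bit key, so a busy table never exhausts its ids. 3.2 also lets
    # you set this project-wide via DEFAULT_AUTO_FIELD in settings.py.
    id = models.BigAutoField(primary_key=True)
    name = models.CharField(max_length=200)

    class Meta:
        indexes = [
            # New in 3.2: index an expression; queries on it then run at full speed.
            models.Index(Lower("name"), name="customer_name_lower_idx"),
        ]
```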
-00:27:32 can just go that for people who are doing reporting or or that kind of thing. Yeah, that's massive. This portion of talk by phenom a sponsored by linode. Simplify your infrastructure and cut your cloud bills in half with linode. Linux virtual machines, develop, deploy and scale your modern applications faster and easier. Whether you're developing a personal project or managing large workloads, you deserve simple, affordable and accessible cloud computing solutions. As listeners of talk Python to me, you'll get a $100 free credit, you can find all the details at talk python.fm slash linode. linode has data centers around the world with the same simple and consistent pricing regardless of location, just choose the data center that's nearest to your users, you also receive 20 473 65 human support with no tears or handoffs, regardless of your plan size. You can choose shared and dedicated compute instances. Or you can use your $100 in credit on s3, compatible object storage, managed Kubernetes clusters. And more. If it runs on Linux, it runs on the node, visit talk python.fm slash linode or click the link in your show notes and click that create free account button to get started.
+00:27:32 Which can just be gold for people who are doing reporting or that kind of thing. Yeah, that's massive. This portion of talk Python to me is sponsored by Linode. Simplify your infrastructure and cut your cloud bills in half with Linode's Linux virtual machines. Develop, deploy and scale your modern applications faster and easier. Whether you're developing a personal project or managing large workloads, you deserve simple, affordable and accessible cloud computing solutions. As listeners of talk Python to me, you'll get a $100 free credit. You can find all the details at 'talkpython.fm/linode'. Linode has data centers around the world with the same simple and consistent pricing regardless of location. Just choose the data center that's nearest to your users. You also receive 24/7/365 human support with no tiers or handoffs, regardless of your plan size. You can choose shared and dedicated compute instances, or you can use your $100 in credit on S3-compatible object storage, managed Kubernetes clusters, and more. If it runs on Linux, it runs on Linode. Visit 'talkpython.fm/linode' or click the link in your show notes and click that create free account button to get started.

00:28:45 And then we've got, I don't know, a new Memcached back end for the cache. We've got, you know, updating the API for using the admin, a nice decorator API for creating various admin customizations. We've got themes in the admin; going to ship a dark theme is a big one for

-00:29:00 April see right away and is to because yet the Django admin could use a refresh and there's been all sorts of third party ways to customize it but now be built in
+00:29:00 people to see right away in 3.2, because, yeah, the Django admin could use a refresh, and there's been all sorts of third party ways to customize it, but now it'll be built in
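(The decorator API being referred to is presumably the @admin.register / @admin.display family added in 3.2; a hypothetical admin using it, continuing the made-up Customer model above, might read:)

```python
# admin.py - sketch of the 3.2 admin decorator API.
from django.contrib import admin

from .models import Customer  # hypothetical app layout

@admin.register(Customer)
class CustomerAdmin(admin.ModelAdmin):
    list_display = ["name", "short_name"]

    # New in 3.2: configure list-display callables with a decorator
    # instead of assigning attributes onto the function afterwards.
    @admin.display(description="Short name", ordering="name")
    def short_name(self, obj):
        return obj.name[:10]
```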
-00:29:08 there's all sorts in the in the blog post for the arthritis. I described it as a mess Berg new features in that's exactly what it is it but what what's nice is it's not there aren't apart from async, which is coming. There aren't any new major features in Django, right? It's 15 years old. It's rich and mature and features largely feature complete. But each major release each eight months isn't it always amazes me when we're drawing to get up the final release notes, how much depth and extra like substance there is in the features that have managed to add and that have been contributed over that eight month beer as fantastic.
+00:29:08 there's all sorts in the blog post for the alpha release. I described it as a smorgasbord of new features, and that's exactly what it is. But what's nice is, there aren't, apart from async, which is coming, there aren't any new major features in Django, right? It's 15 years old. It's rich and mature and largely feature complete. But each major release, each eight months, it always amazes me when we're drawing together the final release notes, how much depth and extra, like, substance there is in the features that we've managed to add and that have been contributed over that eight month period. It's fantastic.

00:29:41 So I feel like you're capturing much of the modern Python awesomeness. One other area that's, I think, interesting, and I have no idea what your plans are, but maybe you could just give us your thoughts, is type hints, type annotations. Yeah.

@@ -172,7 +172,7 @@
00:29:57 Carl about that a little bit.

-00:29:58 Yeah. I mean, that's difficult for Because Django I mean, if you take your it's super dynamic, and there's an amazing project, don't Django stubs where they've got stub files, and they've got your nice on a super job. And it really does work well. So if you load up VS code or Python these days, you get very good autocomplete around things like Django model field definitions. And that's all powered by Django. Yeah. Which probably is powered by type hints underneath, right? Yeah, yeah, it totally is. a year or so ago, there was some discussion about whether we would make those inline hints in Django itself. And the technical board at the time said, No, we need the typing technology to evolve a little bit further in, in Python itself. Before Django can jump on a jump on and say, Look, we endorsed this particular technology, there is how there's various ways of doing it, and various type hinting type checkers and various and so so we favored my PI, which is obviously the endorsed one. But what about the others? We can't support them all. And they might change in pay rate and all the different various initiatives. Yeah, we can't take it out once it's in the thing we have. The reason why people love Django is because it's super stable. You know, you write you write a site five years later, you don't really have to do very much to keep it going. And so the technical board at the time said, No, we're not going to bring those type hints into Django, but that will be reviewed. You know, we'll look at it again in a little while time. One thing I will say about typing in all the peps, it says, type into the remain optional. They're not meant to be compulsory, even by convention. I feel maintaining Django a certain pressure to have them I do worry that that optionality of type hints is perhaps undermined a little bit in the day to day development. I'm not sure. I also code in Swift, right, which is only statically. typed
+00:29:58 Yeah. I mean, that's difficult for Django, because, I mean, Django is super dynamic, and there's an amazing project, django-stubs, where they've got stub files, and they've done a super job. And it really does work well. So if you load up VS Code or PyCharm these days, you get very good autocomplete around things like Django model field definitions. And that's all powered by django-stubs. Yeah. Which probably is powered by type hints underneath, right? Yeah, yeah, it totally is. A year or so ago, there was some discussion about whether we would make those inline hints in Django itself. And the technical board at the time said, no, we need the typing technology to evolve a little bit further in Python itself before Django can jump on and say, look, we endorse this particular technology. There's various ways of doing it, and various type hinting type checkers, and so we favored mypy, which is obviously the endorsed one. But what about the others? We can't support them all. And they might change, at a rapid rate, with all the different various initiatives. Yeah, we can't take it out once it's in, is the thing. The reason why people love Django is because it's super stable. You know, you write a site, and five years later you don't really have to do very much to keep it going. And so the technical board at the time said, no, we're not going to bring those type hints into Django, but that will be reviewed. You know, we'll look at it again in a little while's time. One thing I will say about typing: in all the PEPs, it says type hints are to remain optional. They're not meant to be compulsory, even by convention. I feel, maintaining Django, a certain pressure to have them. I do worry that that optionality of type hints is perhaps undermined a little bit in the day to day development. I'm not sure. I also code in Swift, right, which is only statically typed

00:31:38 Swift is pretty strong in its syntax, like, you can't even have null, even if they could be, right, if it's not allowed to be optional, explicitly. There's a lot of interesting stuff happening in the Swift type system.

@@ -182,17 +182,17 @@
00:32:51 Yeah. And Django is very much in that boat, where it was never written with type hints in mind. Yeah, of course it wasn't. And if you look at django-stubs, it's awesome, but a lot of things are like str or Any. And it's like, that's horrible, right? You don't want to write that every single time. That's an issue. And then another thing I see, which I'm not sure about: sometimes they make a type, like the Django ModelAdmin, they make it a generic that takes the model type, which tells you what it's the admin of. And I look at that, and I went a little bit, like, back to C++ templates or something, right? Yeah, yeah. And I see where they're going with that. It's amazing what they've been able to do there. But I remember coding Objective-C, UITableView stuff. And what you would always do is you'd get your row class in from the UITableView delegate, where you're getting a list of cells, right? So like a list of email messages, and you'd immediately cast it to the type you wanted, so that you knew what you were dealing with. Now, in a way, when I get something back from a model admin,
-00:33:46 I'm quite happy to write colon model name. Yeah, to tell the the editor what it's meant to be to tell the type checker what it's meant to be. And then for the rest of the method, I get the autocomplete, I get the type checking, and you probably don't put any more type hints because it's just now float. And it's Yeah, I agree. It's so good at General, like allowing you to generate code incredibly fast if the tool supports it. One quick comment or thought that before we move on this having typed around the ORM is really interesting, because all the big rmws seem to do the same basic flow, you know, I'm thinking, you know, Django ORM, SQL alchemy, the even no SQL ones, like Mongo engine, which I use, they all have a class, which has the columns, defined as descriptors. So at definition time they create the tables are the collections, and then at runtime, they become the scalar versions of the thing. They say they are right integer column is now actually an integer. So what I've done when I define the model class is say like, name, colon, stir equals string column. And so the Python thinks it's all the types are actually what I said, the primary types, like in the model, and then the underlying ORM can do what it needs to do. But you know, the rest of my code is like, Oh, that's a blue column right there or flowfield That's been really helpful. I don't know. Yeah, I haven't run into any problems doing that. But it's been pretty helpful.
+00:33:46 I'm quite happy to write colon model name. Yeah, to tell the editor what it's meant to be, to tell the type checker what it's meant to be. And then for the rest of the method, I get the autocomplete, I get the type checking, and you probably don't put any more type hints because it just flows. And it's, yeah, I agree, it's so good in general, like allowing you to generate code incredibly fast if the tool supports it. One quick comment or thought before we move on: this having types around the ORM is really interesting, because all the big ORMs seem to do the same basic flow. You know, I'm thinking, you know, Django ORM, SQLAlchemy, even the NoSQL ones like MongoEngine, which I use. They all have a class which has the columns defined as descriptors. So at definition time they create the tables or the collections, and then at runtime, they become the scalar versions of the thing they say they are, right, an integer column is now actually an integer. So what I've done when I define the model class is say, like, name colon str equals StringField. And so Python thinks all the types are actually what I said, the primary types, like in the model, and then the underlying ORM can do what it needs to do. But you know, the rest of my code is like, oh, that's a bool column right there, or a FloatField. That's been really helpful. I don't know. Yeah, I haven't run into any problems doing that. But it's been pretty helpful.
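(A compressed sketch of the annotation trick Michael describes, using MongoEngine since that's the ORM he names; the model and field names are invented, and the same pattern applies to other descriptor-style ORMs.)

```python
# models.py - descriptor columns annotated with the scalar type they
# produce at runtime, so editors and type checkers see plain str/float.
import mongoengine

class Order(mongoengine.Document):
    name: str = mongoengine.StringField(required=True)
    total: float = mongoengine.FloatField(default=0.0)

order = Order(name="book", total=19.99)
new_total = order.total * 1.1  # type checker treats this as float, not FloatField
```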
-00:35:05 Yeah, I mean, it's the thing that that brings money to it. Well, there's data classes, right, which they bought in three, seven or three, eight. Yeah. And then pay downtick is exciting thing, which is identical. Fantastic. Yeah, you define a model, call it a model. With exactly this. It's like field name, type, with a annotation, string, or int or whatever. And that works really well. And I can imagine us generating Django ORM models from that kind of thing. But then you end up needing options, like, Is it nullable? Or is it Yeah, you know, is it required? Is it what validators does this field have? And then it starts when you start to add validation, it starts to look almost like what we've got now. I mean, pie, Dante's got one good advantage in raw serialization speed, it's very fast. So that's something we can learn. But the short answer is, yeah, I don't know. But it's exciting times, isn't it? Very exciting times.
+00:35:05 Yeah, I mean, it's the thing that brings me on to it. Well, there's data classes, right, which they brought in in 3.7 or 3.8. Yeah. And then Pydantic is the exciting thing. Pydantic, fantastic. Yeah, you define a model, call it a model, with exactly this: it's, like, field name, type, with an annotation, string or int or whatever. And that works really well. And I can imagine us generating Django ORM models from that kind of thing. But then you end up needing options, like, is it nullable? Or, yeah, you know, is it required? What validators does this field have? And then, when you start to add validation, it starts to look almost like what we've got now. I mean, Pydantic's got one good advantage in raw serialization speed, it's very fast. So that's something we can learn. But the short answer is, yeah, I don't know. But it's exciting times, isn't it? Very exciting times.
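(For contrast, the Pydantic style Carlton is describing; a minimal sketch with an invented model.)

```python
# A Pydantic model: plain "field: type" annotations drive validation.
from typing import Optional

from pydantic import BaseModel

class Movie(BaseModel):
    title: str
    year: int
    rating: Optional[float] = None  # optional field, defaults to None

movie = Movie(title="Her", year="2013")  # "2013" is coerced to an int on construction
print(movie.year + 1)                    # 2014
```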
-00:35:55 All right. Let's talk really quickly about maintaining your your content or things that you generate, like you guys all write, articles, books, do online stuff, even podcasts. So three, two is out what broke like real quickly? Well, maybe you want to just talk about what you've been doing to, you know,
+00:35:55 All right. Let's talk really quickly about maintaining your content, the things that you generate; like, you guys all write articles, books, do online stuff, even podcasts. So 3.2 is out; what broke, like, real quickly? Well, maybe you want to just talk about what you've been doing to, you know,

-00:36:13 yeah, there's a really up to date Django books, because the release cycle of every nine months doesn't overlap with the traditional publisher cycle. So for me, I mean, I'm on version, I think, five for most of the books. So yeah, I was on 110, or 111. So I've, it's sort of like writing the book again, every time I basically go through from scratch. And because all the code is linked to the text, because I'm self published, I can do all that I have my flow down. So really, it's I have a kind of a list of new features, I know that are in there. So I'll play around with them to make sure that things don't break. And then I'm constantly emailing with readers. So I have feedback on kind of what works and what doesn't. So half the new features are fixing things. So it's smooth, and then half just making the text. So at this point, I feel pretty good about the flow of all three. And people say what about this, and maybe I can explain a little bit better. I added, I mean, so for three one, I added in the beginners book, I really wanted to have proper deployments, but not go as deep as I do in the profession. So I did introduce environment variables, and showed, I think, a pretty elegant way environment variables and to have some lockdown for deployed site. Whereas for beginners, it was just sort of like the local version, which is pretty thin, was what you deployed, because it was more about getting something up. But I was able to introduce environment variables, we could talk about that more, there's a third party packages. Great. So so that's always the tension for me is showing and telling, right, like I will, I will explain everything, but I don't want to overwhelm people. And so that's part of the thing I just think about for three one,
+00:36:13 yeah, there's really no up to date Django books, because the release cycle of every nine months doesn't overlap with the traditional publisher cycle. So for me, I mean, I'm on version, I think, five for most of the books. So yeah, I started on 1.10, or 1.11. It's sort of like writing the book again every time; I basically go through from scratch. And because all the code is linked to the text, and because I'm self published, I can do all that; I have my flow down. So really, I have kind of a list of new features I know are in there, so I'll play around with them to make sure that things don't break. And then I'm constantly emailing with readers, so I have feedback on kind of what works and what doesn't. So half of it is fixing things, so it's smooth, and then half is just improving the text. So at this point, I feel pretty good about the flow of all three. And people say, what about this, and maybe I can explain a little bit better. I mean, so for 3.1, in the beginners book, I really wanted to have proper deployments, but not go as deep as I do in the professionals book. So I did introduce environment variables, and showed, I think, a pretty elegant way to use environment variables and to have some lockdown for the deployed site. Whereas for beginners before, it was just sort of like, the local version, which is pretty thin, was what you deployed, because it was more about getting something up. But I was able to introduce environment variables; we could talk about that more, there's third party packages. Great. So that's always the tension for me, showing and telling, right? Like, I will explain everything, but I don't want to overwhelm people. And so that's part of the thing I just think about for 3.1,

00:37:43 I think also, with Django being so stable, at least you can say, all this stuff is still the same, there's just new features we haven't mentioned, or a new way, maybe it's better, as opposed to, this does not work anymore.

-00:37:54 Well, it's the difference between what trips, the professional programmer, and someone who's new. So for example, in three, one, the pathless was added when you create a new Django project, the settings.py file, which is the default settings, the way the routes are done is a little bit different. A minute StackOverflow thing for someone who's used to using Django or breaking stuff, but that can completely derail a beginner, there's usually a couple things like that, you know, so in that case, like I have a dedicated blog post, because I knew that was coming. And just this morning, I got more questions around it. So it's that sort of the the thing that funny is, how do I, you know, be compassionate to the true beginners, but
+00:37:54 Well, it's the difference between what trips up the professional programmer and someone who's new. So for example, in 3.1, pathlib was added: when you create a new Django project, in the settings.py file, which is the default settings, the way the root paths are done is a little bit different. A minor StackOverflow thing for someone who's used to using Django or breaking stuff, but that can completely derail a beginner. There's usually a couple things like that, you know. So in that case, like, I have a dedicated blog post, because I knew that was coming, and just this morning I got more questions around it. So it's that sort of thing that, funnily, is, how do I, you know, be compassionate to the true beginners, but

00:38:31 we're more
So there's two separate things. The first one, I think we can generalize, and Django has some good notes on that. The second one is sort of the deep end of opinion,
I said, okay, well, on one end, we have PaaS like Heroku and other platform as a service options. What is the other end? Is the other end Linux virtual machines, or is the other end Kubernetes clusters? It's Kubernetes clusters; it is well beyond virtual machines, right? So like, to spin up a virtual machine is not a problem. The problem is that you get a bare OS, and then you've got to do all the apt-get installs, turn it into something you can deploy on, right? And then you do that once. And then the second problem is that six months later, you need to upgrade it, and you've got to replace that VM. So that's very difficult. And I think that's what leads people into this kind of world of containers. And then with containers, it's like, whoa, I need an orchestration platform too, and there are people who make a career doing that. Yeah. You can't expect to do that sensibly.
But what I wanted to point out is there's this great article by the Bloomberg tech folks talking about configuring uWSGI for production. And man, do they have a bunch of good little tips, like, well, here's all these things. Who knew that if you turn on enable-threads versus not, or if you set single-interpreter mode to true, you automatically get better performance, because otherwise it's configured to potentially run, like, Python3.7 and 3.8 at the same time? And you're never going to do that in one process, most likely. There's just all these really fantastic settings, like vacuum and whatnot to clean up sockets. So people, if they're running uWSGI in production, they should absolutely check that out. It's a really good one; I reworked a bunch of the way I was doing things after reading that
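For anyone who wants to experiment with those flags, here is a minimal uwsgi.ini sketch along those lines; the module path and worker count are placeholders, and the Bloomberg article covers many more options:

    [uwsgi]
    # hypothetical project; point this at your own WSGI callable
    module = myproject.wsgi:application
    master = true
    processes = 4
    # allow threads started by the application to run
    enable-threads = true
    # skip multiple-interpreter support you'll never use in one process
    single-interpreter = true
    # remove sockets and pid files on exit
    vacuum = true
    # make SIGTERM shut down cleanly instead of reloading
    die-on-term = true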
And they're kind of your responsibility. And then it will run the cluster service on top of it, the Kubernetes on top of it, and then you deploy your containers in there. This is where something like Cloud Run is a bit easier, because you don't have that layer of I'm provisioning the underlying instances myself, and you have to choose the size of them. And you can scale them up easily enough. But you know, it's, yes, it's non trivial to set up a cluster on AWS. I also feel like you're sort of committed to their hosted services. And if for some reason you don't want to use them, the step is not a tiny bit more, it's a much bigger step to, like, try to rework that. Yeah, the example is, you can run your own Postgres on EC2 instances, and you know, you can provision the disks and you can handle that, rather than RDS. And for me, I'm like, okay, if you've got a very specialist use case, then yeah, do that. But RDS is great. Just use RDS. Just use the hosted service, because, again, you're saving money. And yes, it's a bit more expensive per hour, but not as expensive as your time, your life force. Yeah.
Yeah, no, definitely, especially if it's a blog or a content site, like a shop catalog where it's all read only. And you know, maybe you're using the admin to manage the content on it. If it's only one person using the admin, you're not going to have concurrent writes, which is the thing to watch with SQLite. And so it's never going to be an issue, and it is fast. It's a read-only workload. It's fast. It's fast enough.
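As a sketch of what that small-site setup can look like (paths and the timeout value are placeholders), the 'timeout' option is the knob that tells SQLite to wait on a locked database instead of erroring immediately:

    # settings.py - SQLite for a read-mostly site
    from pathlib import Path

    BASE_DIR = Path(__file__).resolve().parent.parent

    DATABASES = {
        "default": {
            "ENGINE": "django.db.backends.sqlite3",
            "NAME": BASE_DIR / "db.sqlite3",
            # passed through to sqlite3.connect(); wait up to 20s
            # rather than raising "database is locked" right away
            "OPTIONS": {"timeout": 20},
        }
    }

Backups stay as simple as copying that one file every now and then.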
It's timely, because I've had an email thread with a reader who has something in mind, and he's like, I modeled it all, the schema, all out in SQL. And SQL is easy to learn the basics of, and really hard to scale. And while Django lets you write raw SQL, you really, really, really should resist doing that, unless you're a way better programmer than I am. Yeah. So whereas before, you would have to learn tons of code, same thing with deployment, you can get a lot of the way there by trusting someone who says, until you need it, don't bother. Same thing with database stuff. Okay, you know, do some basic SQL, understand a little bit of relations, but that's the power of the Django ORM, that it will handle so much of this for you. And if you think you need to do custom SQL, unless, you know, if you have any doubts, you shouldn't do it.
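To make that concrete with a toy example (the model here is hypothetical), you describe the schema once in Python and let the ORM write the SQL for the common cases:

    # in an installed app's models.py
    from django.db import models

    class Book(models.Model):
        title = models.CharField(max_length=200)
        published = models.DateField()

    # the ORM generates the SELECT/WHERE/ORDER BY for you:
    recent = Book.objects.filter(published__year=2021).order_by("-published")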
And then you can put a little couple of tags in, and then, okay, you get metrics down for that bit. So you know, perhaps you've got a slow view. So one thing you can do is, on your nginx logs, you can log the upstream response time. So you've handed off to Gunicorn or to uWSGI, and nginx will then log how long those responses took. And then you can say, well, actually, the responses to that particular request are taking a long time. And then on that request, you can go into that view, and you can add a little, you know, a little bit of instrumentation, and then you can start getting metrics for what that view is doing. And yeah, yeah, you know,
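A minimal sketch of that last instrumentation step, using the prometheus_client package; the metric and view names are invented:

    from prometheus_client import Histogram

    # Hypothetical metric for one suspect view
    CHECKOUT_SECONDS = Histogram(
        "checkout_view_seconds", "Time spent rendering the checkout view"
    )

    @CHECKOUT_SECONDS.time()
    def checkout(request):
        # ... the slow view body being measured ...
        ...

Once that's exported, the per-view numbers can sit alongside the nginx upstream timings on a Grafana dashboard.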
Ah, +01:03:13 All right. So before I let you out of here, final two questions really quick, we'll go the first notable PyPI package out there that you are like, Oh, my gosh, I ran across this the other day? I can't believe it. Ah, -01:03:24 that's a good question. I'll go back to an old school on which I think is bleach. Because I've been thinking about a course on forms. And bleach is is a paint, you know, Python one, not gender specific. But you pretty much always want to have that added to this validates like user input to make sure there's no like cross site scripting type stuff. Yeah. So that's okay. Top. I'm not a new one. But I think, you know, along with white noise, you just got to use it. Yeah. +01:03:24 that's a good question. I'll go back to an old school on which I think is Bleach. Because I've been thinking about a course on forms. And bleach is is a Python one, not gender specific. But you pretty much always want to have that added to this validates like user input to make sure there's no like cross site scripting type stuff. Yeah. So that's okay. Top. I'm not a new one. But I think, you know, along with white noise, you just got to use it. Yeah. -01:03:48 Awesome, Carlton. Yeah, no, well, the one that's just really captured my imagination recently is rich, which is the heritage score, the library for creating basically nice terminal output console output, but it does everything even has like tables and also stuff. Yeah, it's really neat. But like, it's got this inspect functionality where you can get in here, you're sitting there and you're in the shell, and you're like, I want to see it and you print it, and you get it and you you look at the dict and it's all it's not and then you go inspect poo and rich formats, this thing where it's like, oh, yeah, I can look here, and I can see exactly what's going on. And it's just a map, which is 20,000. +01:03:48 Awesome, Carlton. Yeah, no, well, the one that's just really captured my imagination recently is Rich, which is the heritage score, the library for creating basically nice terminal output console output, but it does everything even has like tables and also stuff. Yeah, it's really neat. But like, it's got this inspect functionality where you can get in here, you're sitting there and you're in the shell, and you're like, I want to see it and you print it, and you get it and you you look at the dict and it's all it's not and then you go inspect poor and rich formats, this thing where it's like, oh, yeah, I can look here, and I can see exactly what's going on. And it's just a map, which is 20,000. 01:04:25 That's what I love about coding, right? I'd never heard of this before. And it's clearly a very established thing. @@ -360,7 +360,7 @@ 01:04:33 it's taken off quite Yeah, it's about a year old. It's taken off quite steeply in its adoption, one that I ran across really recently. Just throw it out there because it's a Django semi related one is disk cache. Have you guys heard of that? No. So it's a really interesting a caching plugin that will instead of using memory for caching, it will store it on to local disk because usually have way more harddrive space than you have memory in the cloud. And it plugs into Django to like a standard for the cache there. -01:04:59 Okay, point I mean, I'm a big fan of file cache. Yeah, in general, because like memory is expensive, right? So memory caches is the best way. And Redis and memcached are good ways of doing that all in memory. 
But then you think I just want to generate some HTML once and then just pipe it off the hard disk. Exactly. To me, this is a kind of the equivalent of sequel lite versus a real database. It's like you don't need Redis or memcached, or something like completely set up just like until it gets beyond whatever. +01:04:59 Okay, point I mean, I'm a big fan of file cache. Yeah, in general, because like memory is expensive, right? So memory caches is the best way. And Redis and memcached are good ways of doing that all in memory. But then you think I just want to generate some HTML once and then just pipe it off the hard disk. Exactly. To me, this is a kind of the equivalent of SQL lite versus a real database. It's like you don't need Redis or memcached, or something like completely set up just like until it gets beyond whatever. 01:05:25 It looks pretty interesting. @@ -368,7 +368,7 @@ 01:05:39 Yeah, absolutely. Thanks. Yeah. That's very cool. All right. And then final question. If you're gonna write some Python code, what editor Do you use -01:05:45 vs. code? Same for both of us? Yeah. +01:05:45 VS code? Same for both of us? Yeah. 01:05:49 Well, I have to, yes, VS code. Very nice. I use it a lot. Also, though, I still like BB edit. @@ -376,9 +376,9 @@ 01:05:59 I tell you what, right. So I'm using VS code. And I'm like, but I need to do some transformations. Or I need to, you know, do a multi file search from the diff here and vs codes, all that stuff. But it's a bit you know, this is built in JavaScript was BB it's a native Mac App. It's got a logo. -01:06:13 Yeah, absolutely. I use PI charm. Yeah, I'm a big fan. local company here in Portland panic came out with so called Nova. I think it's Nova. It's pretty interesting as well. Yeah. It looks just beautiful. transmit. I love transmit. Yeah, I use it all the time. All the time. For s3 stuff. We're talking about fantastical. AI. Gentlemen, thank you so much for being on the show. Thank you for having us. Yeah, it's been great. And thanks for all the work on Django and I'll catch up with y'all soon. +01:06:13 Yeah, absolutely. I use PyCharm. Yeah, I'm a big fan. local company here in Portland panic came out with so called Nova. I think it's Nova. It's pretty interesting as well. Yeah. It looks just beautiful. transmit. I love transmit. Yeah, I use it all the time. All the time. For s3 stuff. We're talking about fantastical. AI. Gentlemen, thank you so much for being on the show. Thank you for having us. Yeah, it's been great. And thanks for all the work on Django and I'll catch up with y'all soon. 01:06:42 All right. Thanks for having us. -01:06:44 This has been another episode of talk Python to me. Our guests on this episode we're will Vincent and Carlton Gibson. And it's even brought to you by square and linode. With square your web app can easily take payments seamlessly accept debit and credit cards as well as digital wallet payments. Get started building your own online payment form in three steps with squares Python SDK at talk python.fm slash square. Simplify your infrastructure and cut your cost bills in half with linode. Linux virtual machines develop, deploy and scale your modern applications faster and easier. Visit talk python.fm slash linode and click the Create free account button to get started. level up your Python we have one of the largest catalogs of Python video courses over at talk Python. Our content ranges from true beginners to deeply advanced topics like memory and async. 
And best of all, there's not a subscription in sight. Check it out for yourself at training dot talk python.fm Be sure to subscribe to the show, open your favorite podcast app and search for Python. We should be right at the top. You can also find the iTunes feed at slash iTunes, the Google Play feed at slash play and the direct RSS feed at slash RSS on talk python.fm. We're live streaming most of our recordings these days. If you want to be part of the show and have your comments featured on the air, be sure to subscribe to our YouTube channel at talk python.fm slash YouTube. This is your host Michael Kennedy. Thanks so much for listening. I really appreciate it. Now get out there and write some Python code +01:06:44 This has been another episode of talk Python to me. Our guests on this episode we're will Vincent and Carlton Gibson. And it's even brought to you by Square and Linode. With square your web app can easily take payments seamlessly accept debit and credit cards as well as digital wallet payments. Get started building your own online payment form in three steps with Squares Python SDK at 'talkpython.fm/square'. Simplify your infrastructure and cut your cost bills in half with Linode. Linux virtual machines develop, deploy and scale your modern applications faster and easier. Visit 'talkpython.fm/linode' and click the Create free account button to get started. level up your Python we have one of the largest catalogs of Python video courses over at talk Python. Our content ranges from true beginners to deeply advanced topics like memory and async. And best of all, there's not a subscription in sight. Check it out for yourself at training.talk python.fm Be sure to subscribe to the show, open your favorite podcast app and search for Python. We should be right at the top. You can also find the iTunes feed at /iTunes, the Google Play feed at /play and the direct RSS feed at /RSS on talkpython.fm. We're live streaming most of our recordings these days. If you want to be part of the show and have your comments featured on the air, be sure to subscribe to our YouTube channel at talk python.fm/YouTube. This is your host Michael Kennedy. Thanks so much for listening. I really appreciate it. Now get out there and write some Python code diff --git a/transcripts/302-data-engineering.txt b/transcripts/302-data-engineering.txt index 771be781..20fb873f 100644 --- a/transcripts/302-data-engineering.txt +++ b/transcripts/302-data-engineering.txt @@ -1,14 +1,14 @@ 00:00:00 I'm sure you're familiar with data science. But what about data engineering? Are these the same thing? Or how are they related? data engineering is dedicated to overcoming data processing bottlenecks, data cleanup, data flow and data handling problems for applications that utilize a lot of data. On this episode, we welcome back Tobias Macy, give us a 30,000 foot view of the data engineering landscape in 2021. This is talk by me Episode 302, recorded January 29 2021. -00:00:41 Welcome to talk Python, a weekly podcast on Python, the language, the libraries, the ecosystem, and the personalities. This is your host, Michael Kennedy. Follow me on Twitter where I'm at m Kennedy, and keep up with the show and listen to past episodes at talk python.fm and follow the show on Twitter via at talk Python. This episode is brought to you by data dog and retool, please check out what they're offering during their segments. It really helps support the show. Device ready to kick it off. 
+00:00:41 Welcome to talk Python, a weekly podcast on Python, the language, the libraries, the ecosystem, and the personalities. This is your host, Michael Kennedy. Follow me on Twitter where I'm @mkennedy, and keep up with the show and listen to past episodes at 'talkpython.fm' and follow the show on Twitter via @talkpython. This episode is brought to you by 'Data Dog' and 'Retool', please check out what they're offering during their segments. It really helps support the show. Tobias ready to kick it off. 00:01:07 Yeah, sounds good. Thanks for having me on, Mike. Yeah. Great to have you here. Good to have you back. I was recently looking at my podcast page here. And it says you were on the show. 68 which a lot of fun. That was, uh, when Chris Patti was with you as well, around podcasts in it. And but boy, that was 2016. -00:01:25 It has been a while we've been at this a while. I mean, ironically, we started within a week of each other. But yeah, it's we're still going both of us. It's definitely been fun journey and a lot of a lot of great sort of unexpected benefits and great people that I've been able to meet as a result of it. So definitely glad to be able to be on the journey with you. Yeah, same here. podcasting, open doors, like nothing else. It's crazy people who wouldn't normally want to talk to you like, Hey, you want to be on the show? Yeah, let's spend an hour together all of a sudden, right? It's, it's fantastic. What's new since 2016? What do you been up to? Definitely a number of things. I mean, one being that I actually ended up going solo as the host. So I've been running the podcast and it show by myself. I don't remember exactly when it happened. But I think probably sometime around 2017. I know around the same time that I was on your show you were on mine. So we kind of flip flopped, and then you've been on the show again, since then talking about your experience working with MongoDB and Python. Yeah, you know, beyond that, I also ended up starting a second podcast. So I've got podcast in it, which focuses on Python and its community. So a lot of stuff about DevOps, data science, machine learning, web development, you name it, anything that people are doing with Python I've had them on. But I've also started a second show focused on data engineering. So going beyond just the constraints of Python into this separate niche, so more languages, but more tightly focused problem domain. And so I've been enjoying learning a lot more about the area of data engineering. And so it's actually been a good companion to the to where there's a lot of data science that happens in Python, so I'm able to cover that side of things on podcast.in it and then data engineering is all of the prep work that makes data scientists lives easier. And so just learning a lot about the technologies and challenges that happen on that side of things. +00:01:25 It has been a while we've been at this a while. I mean, ironically, we started within a week of each other. But yeah, it's we're still going both of us. It's definitely been fun journey and a lot of a lot of great sort of unexpected benefits and great people that I've been able to meet as a result of it. So definitely glad to be able to be on the journey with you. Yeah, same here. podcasting, open doors, like nothing else. It's crazy people who wouldn't normally want to talk to you like, Hey, you want to be on the show? Yeah, let's spend an hour together all of a sudden, right? It's, it's fantastic. What's new since 2016? What do you been up to? 
Definitely a number of things. I mean, one being that I actually ended up going solo as the host. So I've been running the Podcast.__init__ show by myself. I don't remember exactly when it happened, but I think probably sometime around 2017. I know around the same time that I was on your show, you were on mine. So we kind of flip flopped, and then you've been on the show again since then, talking about your experience working with MongoDB and Python. Yeah, you know, beyond that, I also ended up starting a second podcast. So I've got Podcast.__init__, which focuses on Python and its community. So a lot of stuff about DevOps, data science, machine learning, web development, you name it, anything that people are doing with Python, I've had them on. But I've also started a second show focused on data engineering. So going beyond just the constraints of Python into this separate niche, so more languages, but a more tightly focused problem domain. And so I've been enjoying learning a lot more about the area of data engineering. And so it's actually been a good companion to the two, where there's a lot of data science that happens in Python, so I'm able to cover that side of things on Podcast.__init__, and then data engineering is all of the prep work that makes data scientists' lives easier. And so just learning a lot about the technologies and challenges that happen on that side of things.
So data engineering, data science, have also worked a little bit with a couple of businesses just helping them understand sort of what are the challenges and what's the potential in the data marketplace and data ecosystem to be able to go beyond just having an application and then being able to use the information and the data that they gather from that to be able to build More interesting insights into their business, but also products for their customers. Oh, yeah, +00:03:34 Yes, yes. So yeah, I mean, I run the podcast as a side, just sort of hobby. And from my day to day, I actually work full time at MIT in the open learning department and help run the platform engineering and data engineering team. They're so responsible for making sure that all that all the cloud environments are set up and secured and servers are up and running and applications stay available. And working through building out a data platform to provide a lot of means for analytics and gaining insights into the learning habits and the behaviors that global learners have and how they interact with all of the different platforms that we run. That's fantastic. Yeah, it's definitely a great place to work. And happy to be there for a number of years now. And then, you know, I run the podcasts. So those go out every week. So a lot of stuff that happens behind the scenes there. And then I also do some consulting, where lately it's been more of the advisory type where it used to be I'd be hands on keyboard, but I've been able to level up beyond that. And so I've been working with a couple of venture capital firms to help them understand the data ecosystem. So data engineering, data science, have also worked a little bit with a couple of businesses just helping them understand sort of what are the challenges and what's the potential in the data marketplace and data ecosystem to be able to go beyond just having an application and then being able to use the information and the data that they gather from that to be able to build more interesting insights into their business, but also products for their customers. Oh, yeah, 00:05:05 that sounds really fun. I mean, work, MIT sounds amazing. And then those advisory roles are really neat, because you kind of get a take, especially as a podcaster, you get this broad view, because he talked to so many people. And you know, they've got different situations in different contexts. And so you can say, all right, look, here's kind of what I see, you seem to fit into this relevant. So this might be the right path. @@ -16,11 +16,11 @@ 00:06:15 which is the perfect match for the high level view. Right? Exactly. Nice. All right, well, let's jump into our main topic. And we touched on it a little bit, but I know what data science is, I think, and there's a really interesting interview I did with Emily and Jacqueline, I don't remember both their last names recently about about building a career in data science. And they talked about basically three areas of data science that you might be in, like production and machine learning, versus making predictions and so on. And data engineering, it feels like it's kind of in that data science realm. But it's not exactly that. Like it could kind of be databases and other stuff, too, right? Like, what is this data engineering thing? Maybe compare contrast against data sciences, people probably know that pretty well. -00:06:57 Yeah. So it's one of those kind of all encompassing terms that, you know, the role depends on the organization that you're in. 
So in some places, data engineer might just be the person who used to be the DBA, or the database administrator. In other places, they might be responsible for cloud infrastructure. And another place, they might be responsible for maintaining streaming systems. One way that I've seen it broken down as kind of two sort of broad classifications of data engineering is, there's the sequel focused data engineer, where they might have a background as a database administrator. And so they do a lot of work in managing the data warehouse, they work with SQL oriented tools, where there are a lot of them coming out now where you can actually use SQL for being able to pull data from source systems into the data warehouse, and then provide, you know, build transformations to provide to analysts and data scientists. And then there is the more engineering oriented data engineer, which is somebody who writes a lot of software, they're building complex infrastructure. And architecture is using things like Kafka or Flink or Spark, they're working with the database, they're working with data orchestration tools, like airflow or Daxter or prefect, they might be using bask and so they're much more focused on actually writing software and delivering code as the output of their efforts. Right, okay. But the shared context across what however you define data engineering, the shared aspect of it is that they're all working to bring data from multiple locations into a place that is accessible for various end users, where the end users might be analysts or data scientists or the business intelligence tools. And they're tasked with making sure that those workflows are repeatable and maintainable and that the data is clean and organized. So that it's useful because you know, everybody knows the whole garbage in garbage out principle. Yeah, if you're a data scientist, and you don't have all the context of where the data is coming from, you just have a small, narrow scope of what you need to work with. You're kind of struggling with that garbage in garbage out principle. And so the data engineers job is to get rid of all the garbage and give you something clean that you can work from, I think that's really a tricky problem in the data science side of things. You take your data, you run it through a model or through some analysis graphene layer, and it gives you a picture like, well, that's the answer. Maybe, maybe it is right. Did you give it the right input? And did you train the models in the right data? Who knows? Right, right. That's, you know, definitely a big challenge. And that's one of the reasons why data engineering has become so multifaceted is because what you're doing with the data informs the ways that you prepare the data, you know, you need to make sure that you have a lot of the contextual information as well to make sure that the data scientists and data analysts are able to answer the questions accurately because data in isolation, if you just give somebody the number five, it's completely meaningless. But if you tell them that a customer ordered five of this unit, well then now you can actually do something with it. So the the Context helps to provide the information about that isolated number and understanding where it came from and why it's important. +00:06:57 Yeah. So it's one of those kind of all encompassing terms that, you know, the role depends on the organization that you're in. So in some places, data engineer might just be the person who used to be the DBA, or the database administrator. 
In other places, they might be responsible for cloud infrastructure. And in another place, they might be responsible for maintaining streaming systems. One way that I've seen it broken down is into two sort of broad classifications of data engineering: there's the SQL focused data engineer, where they might have a background as a database administrator. And so they do a lot of work in managing the data warehouse, they work with SQL oriented tools, and there are a lot of them coming out now where you can actually use SQL for being able to pull data from source systems into the data warehouse, and then, you know, build transformations to provide to analysts and data scientists. And then there is the more engineering oriented data engineer, which is somebody who writes a lot of software. They're building complex infrastructures and architectures using things like Kafka or Flink or Spark, they're working with the database, they're working with data orchestration tools like 'Airflow' or 'Dagster' or 'Prefect', they might be using 'Dask', and so they're much more focused on actually writing software and delivering code as the output of their efforts. Right, okay. But the shared context across however you define data engineering, the shared aspect of it, is that they're all working to bring data from multiple locations into a place that is accessible for various end users, where the end users might be analysts or data scientists or the business intelligence tools. And they're tasked with making sure that those workflows are repeatable and maintainable, and that the data is clean and organized so that it's useful, because, you know, everybody knows the whole garbage in, garbage out principle. Yeah, if you're a data scientist, and you don't have all the context of where the data is coming from, you just have a small, narrow scope of what you need to work with, you're kind of struggling with that garbage in, garbage out principle. And so the data engineer's job is to get rid of all the garbage and give you something clean that you can work from. I think that's really a tricky problem on the data science side of things. You take your data, you run it through a model or through some analysis or graphing layer, and it gives you a picture, like, well, that's the answer. Maybe, maybe it is right. Did you give it the right input? And did you train the models on the right data? Who knows? Right, right. That's, you know, definitely a big challenge. And that's one of the reasons why data engineering has become so multifaceted, is because what you're doing with the data informs the ways that you prepare the data. You know, you need to make sure that you have a lot of the contextual information as well, to make sure that the data scientists and data analysts are able to answer the questions accurately, because data in isolation, if you just give somebody the number five, it's completely meaningless. But if you tell them that a customer ordered five of this unit, well, then now you can actually do something with it. So the context helps to provide the information about that isolated number and understanding where it came from and why it's important.
This is part of that data cleanup, maybe, and taking disparate sources and unifying them under one canonical model or representation. And then ETL, kind of like, we get something terrible, like FTP uploads of CSV files, and we've got to turn those into databases, like overnight jobs, right? Or things like that, which probably still exist. They existed not too long ago.
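As a toy sketch of that classic overnight ETL job, with the file names and schema invented and using only the standard library:

    import csv
    import sqlite3

    # Hypothetical nightly job: load an uploaded CSV into a reporting table
    conn = sqlite3.connect("reporting.db")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (id INTEGER, sku TEXT, qty INTEGER)"
    )
    with open("orders.csv", newline="") as f:
        rows = [
            (int(r["id"]), r["sku"].strip(), int(r["qty"]))  # light cleanup on the way in
            for r in csv.DictReader(f)
        ]
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()
    conn.close()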
And then in sort of the next iteration of data engineering and data management was the sort of, quote unquote, big data craze where Google released their paper about MapReduce. And so Hadoop came out as a open source option for that. And so everybody said, oh, I've got to get Yeah, MapReduce was gonna take over the world, right? Like, that was the only way you could do anything. Big Data, then you had to MapReduce it. And then maybe it had to do with one of these large scale databases, right, spark or Cassandra? Or who knows something like that? Yeah. I mean, SPARC and Cassandra came after Hadoop. So I mean, Hadoop was your option in the, you know, early 2000s. And so everybody said, Oh, big data is the answer. If I just throw big data at everything, it'll solve all my problems. And so people built these massive data lakes using Hadoop and built these MapReduce jobs, and then realized that what are we actually doing with all this data, it's costing us more money than it's worth, MapReduce jobs are difficult to scale, they're, you know, difficult to understand the order of dependencies. And so that's when things like spark came out to use the data that you're already collecting, but be able to parallelize the operations and run it a little faster. And so, you know, that was sort of the era of batch oriented workflows. And then with the advent of things like Spark streaming and Kafka, and you know, there are a whole number of other tools out there now, like Flink and pulsar, the sort of real time revolution is where we're at now, where it's not enough to be able to understand what happened The next day, you have to understand what's happening, you know, within five minutes, and so, there are principles like Change Data Capture, where every time I write a new record into a database, it goes into Kafka queue, which then gets replicated out to an Elasticsearch cluster, and to my data warehouse. And so within five minutes, my business intelligence dashboard is updated with the fact that customer a bought product B, rather than having to wait in 24 hours to get that insight, +00:10:37 Yeah. Every sort of legacy technology that you think has gone away, because you're not working with it anymore, is still in existence somewhere, which is why we still have 'Cobalt'. Exactly. Oh, my gosh, I've got some crazy, crazy cobalt stories for you that probably shouldn't go out public. Ask me over the next conference The next time we get to travel somewhere, you know? Alright. Sounds good. For sure. So let's talk about trends. I made that joke, right? Like, well, maybe it used to be CSV files, or text files, and FTP, and then a job that would put that into a SQL database or some kind of relational database. What is it now, it's got to be better than that, right? I mean, again, depends where you are. I mean, CSV files are still a thing. You know, it may not be FTP anymore, it's probably going to be living in object storage, like s3, or Google Cloud Storage. But you know, you're still working with individual files. And some places, a lot of it is coming from API's or databases, where you might need to pull all of the information from Salesforce to get your CRM data. Or you might be pulling data out of Google Analytics by their API, you know, a lot, there are a lot of evolutionary trends that have happened sort of first big movement in data engineering, beyond just the sort of, well, there have been a few generations. 
So the first generation was the data warehouse, where you took a database appliance, whether that was Oracle, or Microsoft SQL Server or Postgres, you put all of your data into it. And then you had to do a lot of work to model it so that you could answer questions about that data. So in an application database, you're liable to just overwrite a record when something changes, where in a data warehouse, you want that historical information about what changed and the evolution of that data. What about, like, normalization? In operational databases, it's all about one source of truth, we better not have any duplication, it's fine if there's four joins to get there. Whereas in warehousing, it's maybe better to have that duplication. So you can run different types of reports real quickly and easily. Exactly. Yeah. I mean, you still need to have one source of truth, but you will model the tables differently than in an application database. So there are things like the 'Star schema', or the 'Snowflake schema', that became popular in the initial phase of data warehousing. So Ralph Kimball is famous for building out the sort of Star schema approach with facts and dimensions. Yeah, maybe describe that a little bit for people, because maybe they don't know these terms. Sure. So facts are things like, you know, a fact is Tobias Macey works at MIT. And then a dimension might be he was hired in 2016, or whatever year it was. And another dimension of it is, you know, his work anniversary is X date. And so the way that you model it makes it so a fact is something that's immutable. And then a dimension are things that might evolve over time. And then in sort of the next iteration of data engineering and data management was the sort of, quote unquote, big data craze where Google released their paper about MapReduce. And so Hadoop came out as an open source option for that. And so everybody said, oh, I've got to get Yeah, MapReduce was gonna take over the world, right? Like, that was the only way you could do anything. Big Data, then you had to MapReduce it. And then maybe it had to do with one of these large scale databases, right, Spark or Cassandra? Or who knows something like that? Yeah. I mean, Spark and 'Cassandra' came after Hadoop. So I mean, Hadoop was your option in the, you know, early 2000s. And so everybody said, Oh, big data is the answer. If I just throw big data at everything, it'll solve all my problems. And so people built these massive data lakes using Hadoop and built these MapReduce jobs, and then realized that what are we actually doing with all this data, it's costing us more money than it's worth, MapReduce jobs are difficult to scale, they're, you know, difficult to understand the order of dependencies. And so that's when things like Spark came out to use the data that you're already collecting, but be able to parallelize the operations and run it a little faster. And so, you know, that was sort of the era of batch oriented workflows.
And then with the advent of things like Spark streaming and Kafka, and you know, there are a whole number of other tools out there now, like Flink and Pulsar, the sort of real time revolution is where we're at now, where it's not enough to be able to understand what happened the next day, you have to understand what's happening, you know, within five minutes, and so, there are principles like Change Data Capture, where every time I write a new record into a database, it goes into a Kafka queue, which then gets replicated out to an Elasticsearch cluster, and to my data warehouse. And so within five minutes, my business intelligence dashboard is updated with the fact that customer A bought product B, rather than having to wait 24 hours to get that insight, 00:15:15 I think that makes tons of sense. So instead of going, like, we're just gonna pile the data into this, you know, some sort of data lake type thing, then we'll grab it, and we'll do our reports, nightly, or hourly or whatever, you just keep pushing it down the road as it comes in or as it's generated. Right, right. @@ -28,7 +28,7 @@ 00:16:10 Yeah, there's whole platforms that are just around to just do data streaming for you, right, there's like, sort of manage that and keep that alive. And with the popularization of web hooks, right? It's easy to say if something changes here, you know, notify this other thing, and that thing can call other things. And it seems like it's coming along. Yeah, -00:16:28 yeah, one of the interesting aspects to have a lot of the work that's been going into the data engineering space is that you're starting to see some of the architectural patterns and technologies move back into the application development domain where a lot of applications, particularly if you're working with micro services, will use something like a Kafka or a pulsar queue as the communication layer for being able to propagate information across all the different decoupled applications. And that's the same technology and same architectural approaches that are being used for these real time data pipelines. Yeah. And aren't queues amazing for adding scale systems, right? It's gonna take too long throw in a queue and let thing crank on over 30 seconds. It'll be good. Absolutely. I mean, celery is, you know that the same idea is just a smaller scale. And so, you know, rabbit mq, it's more ephemeral. Whereas when you're putting it into these durable queues, you can do more with the information where you can rewind time to be able to say, Okay, I changed my logic, I now want to reprice, reprocess all of these records from the past three months. Whereas if you had that on rabbit mq, all those records are gone unless you wrote them out somewhere else. This portion of talk Python, to me is brought to you by data dog. Are you having trouble visualizing latency and CPU or memory bottlenecks in your app, not sure where the issue is coming from or how to solve it. Data dog seamlessly correlates logs and traces at the level of individual requests, allowing you to quickly troubleshoot your Python application. Plus, their continuous profiler allows you to find the most resource consuming parts of your production code all the time at any scale with minimal overhead. be the hero that got that app back on track at your company. Get started today with a free trial at talk python.fm slash data dog, or just click the link in your podcast player shownotes.
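A rough sketch of the change data capture flow described above, assuming the third-party kafka-python client; the topic name and the downstream targets are hypothetical stand-ins:

    # CDC consumer sketch: row changes land on a Kafka topic and get fanned out
    # to downstream stores within seconds instead of a nightly batch.
    import json
    from kafka import KafkaConsumer  # pip install kafka-python

    consumer = KafkaConsumer(
        "orders.changes",                  # hypothetical CDC topic
        bootstrap_servers="localhost:9092",
        value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
    )

    for message in consumer:
        change = message.value             # e.g. {"op": "insert", "row": {...}}
        # Real code would write to Elasticsearch and the warehouse here; these
        # are stubbed with prints so only the shape of the flow is claimed.
        print("replicate to search cluster:", change)
        print("replicate to warehouse:", change)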
Get the insight you've been missing with data dog a +00:16:28 yeah, one of the interesting aspects of a lot of the work that's been going into the data engineering space is that you're starting to see some of the architectural patterns and technologies move back into the application development domain where a lot of applications, particularly if you're working with micro services, will use something like a Kafka or a Pulsar queue as the communication layer for being able to propagate information across all the different decoupled applications. And that's the same technology and same architectural approaches that are being used for these real time data pipelines. Yeah. And aren't queues amazing for adding scale to systems, right? It's gonna take too long? Throw it in a queue and let a thing crank on it over 30 seconds. It'll be good. Absolutely. I mean, Celery is, you know, the same idea, just at a smaller scale. And so, you know, RabbitMQ, it's more ephemeral. Whereas when you're putting it into these durable queues, you can do more with the information where you can rewind time to be able to say, Okay, I changed my logic, I now want to reprocess all of these records from the past three months. Whereas if you had that on 'RabbitMQ', all those records are gone unless you wrote them out somewhere else. This portion of talk Python to me is brought to you by Datadog. Are you having trouble visualizing latency and CPU or memory bottlenecks in your app, not sure where the issue is coming from or how to solve it? Datadog seamlessly correlates logs and traces at the level of individual requests, allowing you to quickly troubleshoot your Python application. Plus, their continuous profiler allows you to find the most resource consuming parts of your production code all the time at any scale with minimal overhead. Be the hero that got that app back on track at your company. Get started today with a free trial at 'talkpython.fm/datadog', or just click the link in your podcast player show notes. Get the insight you've been missing with Datadog a 00:18:16 couple comments from the live stream. Defra says Airflow, Apache Airflow is really cool, for sure we're going to talk about that. But I did want to ask you about the cloud. Stefan says I'm skeptic, a little bit skeptical about the privacy and security on the cloud. So kind of want to use the known server more often. So maybe that's a trend that you could speak to that you've seen with folks you've interviewed, this kind of data is really sensitive sometimes. And people are very protective of it or whatever. Right? @@ -40,9 +40,9 @@ 00:21:36 Yeah, one of the next things I wanted to ask you about is languages. So you're probably familiar with this, this chart here? Right? Which if not, people are not watching the stream. This is the StackOverflow trend showing Python, just trouncing the other languages, including Java. But I know Java had been maybe one of the main ones, that probably has to do with Spark and whatnot. And to some degree, what do you see as Python's role relative to other technologies here? -00:22:04 So Python has definitely been growing a lot in the data engineering space, largely because of the fact that it's so popular in data science. And so there are data scientists who have been moving further down the stack into data engineering as a requirement of their job. And so they are bringing Python into those layers of the stack.
It's also being used as just a unifying language so that data engineers and data scientists can work on the same code bases. As you mentioned, Java has been popular for a long time in the data ecosystem, because of things like Hadoop and Spark. And looking at the trend graph, I'd be interested to see what what it looks like if you actually combine the popularity of Java and Scala because Scala right become the strong contender in that space as well, because of things like spark and Flink that have native support for Scala, it's a bit more of an esoteric language, but it's used a lot in data processing. But Python has definitely gained a lot of ground. And also because of tools like airflow, which was kind of the first generation tool built for data engineers, by data engineers to be able to manage these dependency graphs of operations so that you can have these pipelines to say, you know, I need to pull data out of Salesforce and then landed into s3. And then I need to have another job that takes that data out of s3 and puts it into the database. And then also that same s3 data needs to go into an analytics job, then once those two jobs are complete, I need to kick off another job that then runs a SQL query against the data warehouse to be able to provide some aggregate information to my sales and marketing team to say, this is what you know, your customer engagement is looking like, or whatever it might be. Yeah, and that was all written in Python. And also, just because of the massive ecosystem of libraries that Python has for being able to interconnect across all these different systems. And data engineering, at a certain level is really just a systems integration task where you need to be able to have information flowing across all of these different layers and all these different systems and get good control over it. Some of the interesting tools that have come out as a sort of generational improvement over airflow are Dexter and prefect. I've actually been using Baxter for my own work at MIT and been enjoying that tool. I'm always happy to dig into that let's sort of focus on those things. And what are the themes I wanted to cover is maybe the five most important packages or libraries for data engineering, and you kind of hit the first one that will group together as a trifecta, right? So right airflow, Daxter and prefect. You want to maybe tell us about those three of them which one you prefer. So I personally use Daxter. I like a lot of the abstractions and the interface design that they provide, but they're all three grouped into a category of tools called sort of workflow management or data orchestration. And so the responsibility there is that you need to have a way to build these pipelines build these DAGs are directed acyclic graphs of operations where the vertices of the graph The data and the nodes are the jobs of the operations being performed on them. And so you need to be able to build up this dependency chain because you need to get information out of a source system, you need to get it into a target system, you might need to perform some transformations either on route or after it's been landed. You know, one of the common trends that's happening is it used to be extract, transform, and then load because you needed to have all of the information in that specialized schema for the data warehouse that we were mentioning earlier. Right, right. 
All the relational database database actually had to have these columns in this, it can't be long characters got to via var, var car 10 or whatever, right. +00:22:04 So Python has definitely been growing a lot in the data engineering space, largely because of the fact that it's so popular in data science. And so there are data scientists who have been moving further down the stack into data engineering as a requirement of their job. And so they are bringing Python into those layers of the stack. It's also being used as just a unifying language so that data engineers and data scientists can work on the same code bases. As you mentioned, Java has been popular for a long time in the data ecosystem, because of things like Hadoop and Spark. And looking at the trend graph, I'd be interested to see what it looks like if you actually combine the popularity of Java and Scala, because Scala has become a strong contender in that space as well, because of things like Spark and Flink that have native support for Scala, it's a bit more of an esoteric language, but it's used a lot in data processing. But Python has definitely gained a lot of ground. And also because of tools like Airflow, which was kind of the first generation tool built for data engineers, by data engineers to be able to manage these dependency graphs of operations so that you can have these pipelines to say, you know, I need to pull data out of Salesforce and then land it into S3. And then I need to have another job that takes that data out of S3 and puts it into the database. And then also that same S3 data needs to go into an analytics job, then once those two jobs are complete, I need to kick off another job that then runs a SQL query against the data warehouse to be able to provide some aggregate information to my sales and marketing team to say, this is what you know, your customer engagement is looking like, or whatever it might be. Yeah, and that was all written in Python. And also, just because of the massive ecosystem of libraries that Python has for being able to interconnect across all these different systems. And data engineering, at a certain level is really just a systems integration task where you need to be able to have information flowing across all of these different layers and all these different systems and get good control over it. Some of the interesting tools that have come out as a sort of generational improvement over Airflow are Dagster and Prefect. I've actually been using Dagster for my own work at MIT and been enjoying that tool. I'm always happy to dig into that let's sort of focus on those things. And one of the themes I wanted to cover is maybe the five most important packages or libraries for data engineering, and you kind of hit the first one that we'll group together as a trifecta, right? So right, Airflow, Dagster and Prefect. You want to maybe tell us about those three, and which one you prefer? So I personally use Dagster. I like a lot of the abstractions and the interface design that they provide, but they're all three grouped into a category of tools called sort of workflow management or data orchestration. And so the responsibility there is that you need to have a way to build these pipelines, build these DAGs, or directed acyclic graphs of operations, where the vertices of the graph are the data and the nodes are the jobs or the operations being performed on them.
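As a minimal sketch of that DAG idea, here is roughly what it looks like in Dagster's op and job API; the op names and bodies are invented stand-ins for the Salesforce-to-S3-to-warehouse jobs just described:

    # Dagster sketch: ops are the nodes, the data dependencies form the DAG.
    from dagster import job, op

    @op
    def extract_orders():
        # Stand-in for pulling records out of a source system like Salesforce.
        return [{"id": 1, "amount": 42.5}, {"id": 2, "amount": -1.0}]

    @op
    def clean(records):
        # Stand-in for a transformation performed en route.
        return [r for r in records if r["amount"] > 0]

    @op
    def load(records):
        # Stand-in for landing the data in object storage or the warehouse.
        print(f"loaded {len(records)} records")

    @job
    def orders_pipeline():
        load(clean(extract_orders()))

    if __name__ == "__main__":
        orders_pipeline.execute_in_process()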
And so you need to be able to build up this dependency chain because you need to get information out of a source system, you need to get it into a target system, you might need to perform some transformations either en route or after it's been landed. You know, one of the common trends that's happening is it used to be extract, transform, and then load because you needed to have all of the information in that specialized schema for the data warehouse that we were mentioning earlier. Right, right. All the relational databases actually had to have these columns, in this type, it can't be long characters, it's got to be a varchar(10) or whatever, right. -00:25:38 And then, with the advent of the cloud data warehouses that have been happening in the past few years that was kicked off by redshift from Amazon, and then carried on by things like Google BigQuery snowflake that a lot of people will probably be aware of, you know, there are a number of other systems and platforms out there, presto, out of Facebook, that is now an open source project actually renamed to trino. Those systems are allowing people to be very SQL oriented, but because of the fact that they're scalable, and they provide more flexible data models, the trend has gone to extract, load, and then transform, because you can just replicate the schema as is into these destination systems. And then you can perform all of your transformations in SQL. And so that brings us into another tool that is in the Python ecosystem that's been gaining a lot of ground called DBT, or data build tool. And so this is a tool that actually brings data analysts and improves their skill set makes them more self sufficient within the organization, and provides a lot of threads a great framework for them to operate in an engineering mindset where it helps to build up a specialized dag within the context of the data warehouse to take those source data sets that are landed into the data warehouse from the extract and load jobs and build these transformations. So you might have the user table from your application database and the Orders table. And then you also have the Salesforce information that's landed in a separate table. And you want to be able to combine all of those to be able to understand your customer order customer buying patterns. And so you use sequel to build either a view or build a new table out of that source information in the data warehouse, and DBT will handle that workflow. It also has support for being able to build unit tests in SQL into your workflow. Oh, how interesting. Yeah, that's something that you hadn't really heard very much of 10 years ago, testing in databases is usually how do I get the database out of the picture? So I can test without depending upon it, or something like that. That was the story. Yeah, that's another real growing trend is the overall aspect of data quality and confidence in your data flows. So things like in Daxter, and prefect and airflow, they have support for being able to unit test your pipelines, which is another great aspect of the Python ecosystem, as you can just write pi test code to ensure that all the operations on your data match your expectations, and you don't have regressions and bugs. Right.
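Here is the extract-load-transform ordering as a small sketch, with sqlite3 standing in for a cloud warehouse; the tables and the cleanup rules are invented for illustration:

    # ELT sketch: land the source data as-is, then reshape it with SQL in place,
    # which is the layer a DBT model would own.
    import sqlite3

    con = sqlite3.connect(":memory:")
    # Load step: replicate the source schema as-is, everything arrives as text.
    con.execute("CREATE TABLE raw_orders (customer TEXT, amount TEXT, ordered_at TEXT)")
    con.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)",
                    [("ada", "19.99", "2021-01-19"), ("bob", "5.00", "2021-01-20")])

    # Transform step: a typed, analysis-ready table built with plain SQL.
    con.executescript("""
        CREATE TABLE orders AS
        SELECT customer, CAST(amount AS REAL) AS amount, DATE(ordered_at) AS ordered_at
        FROM raw_orders;
    """)
    print(con.execute("SELECT customer, SUM(amount) FROM orders GROUP BY customer").fetchall())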
+00:25:38 And then, with the advent of the cloud data warehouses that have been happening in the past few years that was kicked off by Redshift from Amazon, and then carried on by things like Google BigQuery, Snowflake, that a lot of people will probably be aware of, you know, there are a number of other systems and platforms out there, Presto, out of Facebook, that is now an open source project actually renamed to 'Trino'. Those systems are allowing people to be very SQL oriented, but because of the fact that they're scalable, and they provide more flexible data models, the trend has gone to extract, load, and then transform, because you can just replicate the schema as is into these destination systems. And then you can perform all of your transformations in SQL. And so that brings us into another tool that is in the Python ecosystem that's been gaining a lot of ground called DBT, or data build tool. And so this is a tool that actually brings data analysts and improves their skill set, makes them more self sufficient within the organization, and provides a great framework for them to operate in an engineering mindset, where it helps to build up a specialized DAG within the context of the data warehouse to take those source data sets that are landed into the data warehouse from the extract and load jobs and build these transformations. So you might have the user table from your application database and the Orders table. And then you also have the Salesforce information that's landed in a separate table. And you want to be able to combine all of those to be able to understand your customer buying patterns. And so you use SQL to build either a view or build a new table out of that source information in the data warehouse, and DBT will handle that workflow. It also has support for being able to build unit tests in SQL into your workflow. Oh, how interesting. Yeah, that's something that you hadn't really heard very much of 10 years ago, testing in databases is usually how do I get the database out of the picture? So I can test without depending upon it, or something like that. That was the story. Yeah, that's another real growing trend is the overall aspect of data quality and confidence in your data flows. So things like in Dagster, and Prefect and Airflow, they have support for being able to unit test your pipelines, which is another great aspect of the Python ecosystem, as you can just write 'pytest' code to ensure that all the operations on your data match your expectations, and you don't have regressions and bugs. Right. 00:28:05 Right. Absolutely. @@ -50,53 +50,53 @@ 00:30:10 Yep. Actions have been taken. And -00:30:14 this portion of talk Python to me is brought to you by retool, do you really need a full dev team to build that simple internal app at your company? I'm talking about those Back Office apps. The tool your customer service team uses to access your database, that s3 uploader you built last year for the marketing team, the quick admin panel that lets you monitor key KPIs, or maybe even the tool your data science team hacked together so they could provide custom ad spend insights. Literally, every type of business relies on these internal tools. But not many engineers love building these tools, let alone get excited about maintaining or supporting them over time. They eventually fall into the please don't touch it. It's working category of apps. And here's where retool comes in. Companies like doordash brex.
plaid and even Amazon use retool to build internal tools superfast ideas them almost all internal tools look the same forms over data. They're made up of tables, dropdowns, buttons, text input, and so on. free tool gives you a point click and drag and drop interface that makes it super simple to build internal UI like this in hours not days. retool can connect to any database or API want to pull data from Postgres. Just write a SQL query and drag the table onto your canvas. search across those fields. Add a search input bar and update your query. Save it share it super easy. We tool is built by engineers explicitly for engineers. It can be set up to run on prem in about 15 minutes using Docker or Kubernetes or Heroku. Get started with retools today. Just visit talk python.fm slash retool or click the retool link in your podcast player show notes. +00:30:14 this portion of talk Python to me is brought to you by 'Retool'. Do you really need a full dev team to build that simple internal app at your company? I'm talking about those Back Office apps. The tool your customer service team uses to access your database, that S3 uploader you built last year for the marketing team, the quick admin panel that lets you monitor key KPIs, or maybe even the tool your data science team hacked together so they could provide custom ad spend insights. Literally, every type of business relies on these internal tools. But not many engineers love building these tools, let alone get excited about maintaining or supporting them over time. They eventually fall into the 'please don't touch it, it's working' category of apps. And here's where Retool comes in. Companies like DoorDash, Brex, Plaid and even Amazon use Retool to build internal tools super fast. The idea is that almost all internal tools look the same, forms over data. They're made up of tables, dropdowns, buttons, text input, and so on. Retool gives you a point, click, and drag and drop interface that makes it super simple to build internal UIs like this in hours, not days. Retool can connect to any database or API. Want to pull data from Postgres? Just write a SQL query and drag the table onto your canvas. Search across those fields? Add a search input bar and update your query. Save it, share it, super easy. Retool is built by engineers explicitly for engineers. It can be set up to run on prem in about 15 minutes using Docker or Kubernetes or Heroku. Get started with Retool today. Just visit 'talkpython.fm/retool' or click the Retool link in your podcast player show notes. 00:31:51 They'll be jumping back really quick to that language trends question real quick. So Anthony Lister asks if R is still widely used as sort of a strong competitor, let's say to Python, and what's your thoughts these days, I can honestly hear a little bit less of it in my world for some reason. Yeah. So there are definitely a lot of languages, R is definitely one of them that's still popular in the data space, I don't really see R in the data engineering context, it's definitely still used for a lot of statistical modeling, machine learning data science workloads. There's a lot of great interoperability between R and Python. Now, especially with the Arrow project, which is an in memory columnar representation that provides an interoperable, it provides an in memory space where you can actually exchange data between R and Python and Java without having to do any IO copying between them. So it helps to reduce a lot of the impedance mismatch between those languages.
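The Arrow project just mentioned can be sketched in a few lines with the pyarrow package; the columns are made up, and the same in-memory table is what the R or Java Arrow bindings would see without any copying:

    # Arrow sketch: one in-memory columnar table shared across runtimes.
    import pyarrow as pa

    table = pa.table({"customer": ["ada", "bob"], "amount": [19.99, 5.00]})

    # Hand the same buffers to pandas (this call assumes pandas is installed);
    # R and Java reach them through their own Arrow bindings rather than
    # reserializing the data.
    df = table.to_pandas()
    print(df["amount"].sum())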
Another language that's been gaining a lot of ground in the data ecosystem is Julia. And they're actually under the NumFOCUS organization that supports a lot of the Python data ecosystem. Yeah, so Julia has been gaining a lot of ground, but Python, just because of its broad use is still very popular. And there's an anecdote that I've heard a number of times, I don't remember where I first came across it that Python isn't the best language for anything, but it's the second best language for everything. -00:33:12 Yeah, that's a good quote, I think it does put a lot of perspective on it. I feel like it's just so approachable, right? Exactly. And there's a lot of these languages that might make slightly more sense for certain use case like AR and statistics. But you better not want to have to, you know, build some other thing that reaches outside of what's easily possible, right? Like, right, you want to make that an API now? Well, all of a sudden, it's not so easy or whatever, right? Something along those lines. Exactly. Alright, next in our list here is dask. +00:33:12 Yeah, that's a good quote, I think it does put a lot of perspective on it. I feel like it's just so approachable, right? Exactly. And there's a lot of these languages that might make slightly more sense for certain use cases like R and statistics. But you better not want to have to, you know, build some other thing that reaches outside of what's easily possible, right? Like, right, you want to make that an API now? Well, all of a sudden, it's not so easy or whatever, right? Something along those lines. Exactly. Alright, next in our list here is Dask. -00:33:41 Yeah, so dask is a great tool, I kind of think about it as the Python version of Spark. There are a number of reasons that's not exactly accurate. But it's a tool that lets you parallelize your Python operations, scale it out into clusters. It also has a library called task dot distributed that's used a lot for just scaling out Python independent of actually building the directed acyclic graphs in desc. So one of the main ways that spark is used is as an ETL engine. So you can build these graphs of tasks in Spark, you can do the same thing with task, it was actually built originally more for the hard sciences and for scientific workloads. And not just for data science. Yeah, but dask is actually also used as a foundational layer for a number of the data orchestration tools out there. So dask is the foundational layer for prefect, you can use it as an execution substrate for the Daxter library, the Dexter framework and also supports, it's also supported in airflow as a execution layer. Then there are also a number of people who are using it as a replacement for things like celery is just a means of running asynchronous tasks outside of the bounds of a request response cycle. So it's just growing a lot in the data ecosystem, both for data engineering and data science. And so just provides that unified layer of being able to build your data engineering workflows, and then hand that directly off into machine learning so that you don't have to jump between different systems. You can do it all in one layer. +00:33:41 Yeah, so Dask is a great tool, I kind of think about it as the Python version of Spark. There are a number of reasons that's not exactly accurate. But it's a tool that lets you parallelize your Python operations, scale it out into clusters.
It also has a library called 'dask.distributed' that's used a lot for just scaling out Python independent of actually building the directed acyclic graphs in Dask. So one of the main ways that Spark is used is as an ETL engine. So you can build these graphs of tasks in Spark, you can do the same thing with Dask, it was actually built originally more for the hard sciences and for scientific workloads. And not just for data science. Yeah, but Dask is actually also used as a foundational layer for a number of the data orchestration tools out there. So Dask is the foundational layer for Prefect, you can use it as an execution substrate for the Dagster framework, and it's also supported in Airflow as an execution layer. Then there are also a number of people who are using it as a replacement for things like Celery, as just a means of running asynchronous tasks outside of the bounds of a request response cycle. So it's just growing a lot in the data ecosystem, both for data engineering and data science. And so just provides that unified layer of being able to build your data engineering workflows, and then hand that directly off into machine learning so that you don't have to jump between different systems. You can do it all in one layer. -00:35:12 Yeah, that's super neat and dask, I never really appreciated it sort of it's different levels at which you can use it, I guess I should say, you know, when I thought about it, okay, well, this is like parallel computing, for pandas, or NumPy, or something like that, right. But it's also it works well on just your single laptop, right? It'll let you run multi core +00:35:12 Yeah, that's super neat and Dask, I never really appreciated sort of its different levels at which you can use it, I guess I should say, you know, when I thought about it, okay, well, this is like parallel computing, for Pandas, or NumPy, or something like that, right. But it's also it works well on just your single laptop, right? It'll let you run multi core -00:35:31 stuff locally, because Python doesn't always do that super well. And it'll even think it'll even do caching and stuff. So it can actually work with more data than you have Ram. Right? It's hard with just straight NumPy. But then, of course, you can point it at a cluster and go crazy. Exactly, yeah. And because of the fact that it has those transparent API layers for being able to swap out the upstream pandas with the dask, pandas library and NumPy. It's easy to go from working on your laptop to just changing an import statement. And now you're scaling out across a cluster of hundreds of machines? Yeah, that's pretty awesome. Actually, maybe that had some things as well to do with the batch to real time, right? If you've got to run it in one on one core on one machine, it's a batch job, if you can run it on the entire cluster at you know, that's sitting around idle while then all of a sudden, it's real time, right? Yeah, there's a lot of interesting real time stuff. There's a interesting project, sort of a side note here called wallaroo. that's built for building stateful Stream Processing jobs using Python. And interestingly, it's actually implemented in a language called pony. But how many? Yeah, hey, it's an interesting project, you know, levels up your ability to scale out the speed of execution, and the sort of just being able to build these complex pipelines, real time jobs, without having to build all of the foundational layers of it. Yeah.
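A small sketch of the Dask workflow being described; the CSV file pattern is a placeholder, and the same code can target a cluster by connecting a dask.distributed client first:

    # Dask sketch: the pandas-shaped API, evaluated lazily and in parallel.
    import dask.dataframe as dd

    # One logical dataframe over many files, processed out-of-core.
    df = dd.read_csv("orders-*.csv")                   # placeholder file pattern
    by_customer = df.groupby("customer").amount.sum()  # builds a task graph only

    # .compute() executes the graph, using multiple cores on a single laptop;
    # with a dask.distributed Client connected, the same call runs on a cluster.
    print(by_customer.compute())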
+00:35:31 stuff locally, because Python doesn't always do that super well. And I think it'll even do caching and stuff. So it can actually work with more data than you have RAM. Right? It's hard with just straight NumPy. But then, of course, you can point it at a cluster and go crazy. Exactly, yeah. And because of the fact that it has those transparent API layers for being able to swap out the upstream pandas with the Dask pandas library, and NumPy, it's easy to go from working on your laptop to just changing an import statement. And now you're scaling out across a cluster of hundreds of machines? Yeah, that's pretty awesome. Actually, maybe that had something as well to do with the batch to real time, right? If you've got to run it on one core on one machine, it's a batch job, if you can run it on the entire cluster that's, you know, sitting around idle, well then all of a sudden, it's real time, right? Yeah, there's a lot of interesting real time stuff. There's an interesting project, sort of a side note here, called 'Wallaroo' that's built for building stateful stream processing jobs using Python. And interestingly, it's actually implemented in a language called Pony. But how many? Yeah, hey, it's an interesting project, you know, levels up your ability to scale out the speed of execution, and the sort of just being able to build these complex pipelines, real time jobs, without having to build all of the foundational layers of it. Yeah. 00:36:54 Okay. And I see I have not heard of this one. That sounds fun. -00:36:57 Yeah, it's not as widely known. I interviewed the creator of it on the data engineering podcast A while back, but it's a tool that comes up every now and then interesting approach to it. Yeah. Right. In that stream processing real time world, right. The next one that you put on our list here is Milton and altano altana. I gotta say it, right. Yeah. Yeah. So that one is an interesting project. It came from the gait lab, folks, it's still supported by them. And in its earliest stage, they actually wanted it to be the full end to end solution for data analytics for startups. So Mel Tano is actually an acronym for if I can remember correctly, model, extract, load, transform, analyze, notebook and orchestrate. Okay. That's quite a wild one to put into. Yeah, some of you can say, well, exactly. And you know, about a year, year and a half ago, now, they actually decided that they were being a little too ambitious and trying to boil the ocean and scoped it down to doing the extract and load portions of the workflow really well, because it's a very underserved market, where you would think that, given the amount of data we're all working with, point to point data integration, and extract and load would be a solved problem, easy to do. But there's a lot of nuance to it. And there isn't really one easy thing to say, yes, that's the tool you want to use all the time. And so there are some paid options out there that are good. Mel Tano is aiming to be the default open source answer for data integration. And so it's building on top of the singer specification, which is sort of an ecosystem of libraries that was built by a company called stitch data. But the idea is that you have the what they call taps and targets where a tap will tap into a source system, pull data out of it, and then the targets will load that data into a target system.
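Roughly what the taps half of that looks like on the wire, as a hedged sketch of the Singer specification: a tap is just a program that prints SCHEMA, RECORD, and STATE messages as JSON lines, and a target is another program reading them on stdin; the stream and fields here are invented:

    # Singer tap sketch: emit SCHEMA, RECORD, and STATE messages on stdout,
    # to be piped into a target, e.g.:  python my_tap.py | target-jsonl
    import json
    import sys

    def emit(message):
        sys.stdout.write(json.dumps(message) + "\n")

    emit({"type": "SCHEMA", "stream": "users",
          "schema": {"properties": {"id": {"type": "integer"},
                                    "name": {"type": "string"}}},
          "key_properties": ["id"]})
    emit({"type": "RECORD", "stream": "users", "record": {"id": 1, "name": "ada"}})
    # STATE is the bookmark that makes incremental extract-and-load runs possible.
    emit({"type": "STATE", "value": {"users": {"last_seen_id": 1}}})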
And they have this interoperable specification that's JSON based, so that you can just wire together any two taps and targets to be able to pull data from a source into a destination system with nice, yeah, it's definitely a well designed specification, a lot of people like it, there are some issues with the way that the ecosystem was sort of created and fostered. So there's a lot of uncertainty or like variability in terms of the quality of the implementations of these tabs and targets. And there was never really one cohesive answer to this is how you run these in a production context, partially because stitch data was the answer to that. So they wanted you to buy into this open source ecosystem, so that you would then use them as the actual execution layer. And so Mel Tano is working to build an open source option for you to be able to wire together these tabs and targets and be able to just have an easy out of the box data integration solution. So yeah, it's a small team from Git lab, but there's a large and growing community helping to support it and they've actually been doing a lot to help push forward the state of the art for the singer ecosystem, building things like a starter template for people building taps and targets so that there's a common baseline of quality built into these different elements. Without having to wonder about, you know, is this tab going to support all the features of the specification that I need? Nice. Is this actually from Git lab? Yeah. So it's sponsored by Git lab. It's the source code is within the Git lab organization on Git lab Comm. But it's definitely a very community driven project. Yeah. Stefan is a quite excited about the open source. And open source choice. Yeah, well, I think there's two things one open source is amazing. But two, you get this Paradox of Choice, right? It's like, well, it's great. You can have anything, but there's, there's so many things. And I'm new to Plato. I do, right. And so yeah, matana was trying to be the answer to you know, you just melt on o in it, you have a project, you say I want these sorts of sources and destinations. And then it will help you handle things like making sure that the jobs run on a schedule handling, tracking the state of the operations, because you can do either full extracts and loads every time or you can do incremental because you don't necessarily want to dump a 4 million line source table every single time it runs, you just want to pull the 15 lines that changed since the last operation. So it will help track that state for you. Oh, that's cool. And try to be real efficient, and exactly what it needs. And it builds in some of the monitoring information that you want to be able to see as far as like execution time performance of these jobs. In it actually out of the box, we'll use airflow as the orchestration engine for being able to manage these schedules. But everything is pluggable. So if you wanted to write your own implementation that will use Daxter as the orchestrator instead, then they'll do that there's actually a ticket in their tracker for doing that work, though. It's very pluggable, very flexible, but gives you a lot of out of the box answers to being able to just get something up and running quickly. And it looks like you can build custom loaders and custom extractors. So if you've got some internal API, that's who knows, maybe it's a soap XML endpoint or some random thing, right? You could do that. +00:36:57 Yeah, it's not as widely known. 
I interviewed the creator of it on the data engineering podcast a while back, but it's a tool that comes up every now and then, interesting approach to it. Yeah. Right. In that stream processing real time world, right. The next one that you put on our list here is Meltano. Meltano, I gotta say it, right? Yeah. Yeah. So that one is an interesting project. It came from the GitLab folks, it's still supported by them. And in its earliest stage, they actually wanted it to be the full end to end solution for data analytics for startups. So Meltano is actually an acronym for, if I can remember correctly, model, extract, load, transform, analyze, notebook and orchestrate. Okay. That's quite a wild one to put into. Yeah, some of you can say, well, exactly. And you know, about a year, year and a half ago, now, they actually decided that they were being a little too ambitious and trying to boil the ocean and scoped it down to doing the extract and load portions of the workflow really well, because it's a very underserved market, where you would think that, given the amount of data we're all working with, point to point data integration, and extract and load would be a solved problem, easy to do. But there's a lot of nuance to it. And there isn't really one easy thing to say, yes, that's the tool you want to use all the time. And so there are some paid options out there that are good. Meltano is aiming to be the default open source answer for data integration. And so it's building on top of the Singer specification, which is sort of an ecosystem of libraries that was built by a company called Stitch Data. But the idea is that you have the, what they call, taps and targets, where a tap will tap into a source system, pull data out of it, and then the targets will load that data into a target system. And they have this interoperable specification that's JSON based, so that you can just wire together any two taps and targets to be able to pull data from a source into a destination system. Nice. Yeah, it's definitely a well designed specification, a lot of people like it, there are some issues with the way that the ecosystem was sort of created and fostered. So there's a lot of uncertainty or like variability in terms of the quality of the implementations of these taps and targets. And there was never really one cohesive answer to this is how you run these in a production context, partially because Stitch Data was the answer to that. So they wanted you to buy into this open source ecosystem, so that you would then use them as the actual execution layer. And so Meltano is working to build an open source option for you to be able to wire together these taps and targets and be able to just have an easy out of the box data integration solution. So yeah, it's a small team from GitLab, but there's a large and growing community helping to support it and they've actually been doing a lot to help push forward the state of the art for the Singer ecosystem, building things like a starter template for people building taps and targets so that there's a common baseline of quality built into these different elements. Without having to wonder about, you know, is this tap going to support all the features of the specification that I need? Nice. Is this actually from GitLab? Yeah. So it's sponsored by GitLab. The source code is within the GitLab organization on 'Gitlab.com'. But it's definitely a very community driven project. Yeah. Stefan is quite excited about the open source, and open source choice.
Yeah, well, I think there's two things, one, open source is amazing. But two, you get this Paradox of Choice, right? It's like, well, it's great. You can have anything, but there's, there's so many things. And I'm new to Plato. I do, right. And so yeah, Meltano was trying to be the answer to, you know, you just 'meltano init', you have a project, you say I want these sorts of sources and destinations. And then it will help you handle things like making sure that the jobs run on a schedule, handling tracking the state of the operations, because you can do either full extracts and loads every time or you can do incremental, because you don't necessarily want to dump a 4 million line source table every single time it runs, you just want to pull the 15 lines that changed since the last operation. So it will help track that state for you. Oh, that's cool. And try to be real efficient, and pull exactly what it needs. And it builds in some of the monitoring information that you want to be able to see as far as like execution time performance of these jobs. And it actually, out of the box, will use Airflow as the orchestration engine for being able to manage these schedules. But everything is pluggable. So if you wanted to write your own implementation that will use 'Dagster' as the orchestrator instead, then they'll do that, there's actually a ticket in their tracker for doing that work, though. It's very pluggable, very flexible, but gives you a lot of out of the box answers to being able to just get something up and running quickly. And it looks like you can build custom loaders and custom extractors. So if you've got some internal API, that's who knows, maybe it's a SOAP XML endpoint or some random thing, right? You could do that. -00:41:51 Exactly. Yeah. And they actually lean on DBT and other tools that we were just talking about as the transformation layer. So they hook directly into that so that you can very easily do the extract and load and then jump into DBT for doing the transformations. Yeah. Now you didn't put this one on the list, but I do want to ask you about it. What's the story of something like Zapier in this hole to get notified about these changes pushed up here? It feels like if you are trying to wire things together, I've seen more than one Python developer reach for Zapier. Yeah. So Zapier is definitely a great platform, particularly for doing these event based workflows. You can use it as a data engineering tool if you want, but it's not really what it's designed for.
So that's another movement that's been happening in the data engineering ecosystem, where early on, a lot of the people coming to it were systems administrators, database administrators, maybe data scientists who had a lot of the domain knowledge, but not as much of the engineering expertise to be able to build these workflows in a highly engineered highly repeatable way. And the past few years has been seeing a lot of movement of moving to data Ops, and ml ops to make sure that all of these workflows are well engineered, well managed, you know, version controlled, tested. And so having this DevOps oriented approach to data integration is what Madonna was focusing on saying, all of your configuration, all of your workflows, it lives in git, you can run it through your ci CD pipeline to make sure that it's tested. And then when you deliver it, you know that you can trust that it's going to do what you want it to do, rather than I just pushed this config from my laptop, and hopefully it doesn't blow up. Right? It also sounds like there's a lot of interplay between these things like Mel Tano, might be leveraging airflow, and DBT. And maybe you want to test this through ci with great expectations before it goes through at CD side, like continuous deployment. Seems like there's just a lot of inner flow here. Definitely. And there have been a few times where I've been talking to people and they've asked me to kind of categorize different tools or like draw nice lines about what are the dividing layers of the different of the data stack? And it's not an easy answer, because so many of these tools fit into a lot of different boxes. So you know, spark is a streaming engine, but it's also an ELT tool. And, you know, Daxter is a data orchestration tool, but it can also be used for managing delivery of you can write it to do arbitrary tasks. So you can build up these chains of tasks. So if you wanted to use it for a CI CD, you could write what it's built for. But you know, and then different databases have been growing a lot of different capabilities where, you know, it used to be you had your SQL database, or you had your document database, or you had your graph database. And then you have things like Rango dB, which can be a graph database, and the document database and a SQL database all on the same engine. So there's a lot of multimodal databases, it's all of the SQL and all the no SQL all in one. Right. And you know, JSON is being pushed into relational databases and data warehouses. So it's, there's a lot of crossover between the different aspects of the data stack. +00:41:51 Exactly. Yeah. And they actually lean on DBT and other tools that we were just talking about as the transformation layer. So they hook directly into that so that you can very easily do the extract and load and then jump into DBT for doing the transformations. Yeah. Now you didn't put this one on the list, but I do want to ask you about it. What's the story of something like Zapier in this whole get notified about these changes, push it up here world? It feels like if you are trying to wire things together, I've seen more than one Python developer reach for Zapier. Yeah. So Zapier is definitely a great platform, particularly for doing these event based workflows. You can use it as a data engineering tool if you want, but it's not really what it's designed for.
It's more just for business automation aspects, or maybe automation of my application did this thing and now I want to have it replicate some of that state out to a third party system. Zapier isn't really meant for the sort of full scale data engineering workflows, maintaining visibility, it's more just for this event I/O kind of thing. Yeah. So here on the Meltano site, it says pipelines are code, ready to be version controlled, and containerized and deployed continuously. The CI/CD side sounds pretty interesting, right? Especially with these workflows that might have in flight changes. How does that work? You know, it's basically the point with Meltano is that everything is versioned in git. So that's another movement that's been happening in the data engineering ecosystem, where early on, a lot of the people coming to it were systems administrators, database administrators, maybe data scientists who had a lot of the domain knowledge, but not as much of the engineering expertise to be able to build these workflows in a highly engineered, highly repeatable way. And the past few years has been seeing a lot of movement of moving to DataOps and MLOps to make sure that all of these workflows are well engineered, well managed, you know, version controlled, tested. And so having this DevOps oriented approach to data integration is what Meltano was focusing on saying, all of your configuration, all of your workflows, it lives in git, you can run it through your CI/CD pipeline to make sure that it's tested. And then when you deliver it, you know that you can trust that it's going to do what you want it to do, rather than I just pushed this config from my laptop, and hopefully it doesn't blow up. Right? It also sounds like there's a lot of interplay between these things, like Meltano might be leveraging Airflow and DBT. And maybe you want to test this through CI with Great Expectations before it goes to the CD side, like continuous deployment. Seems like there's just a lot of interplay here. Definitely. And there have been a few times where I've been talking to people and they've asked me to kind of categorize different tools or like draw nice lines about what are the dividing layers of the different of the data stack? And it's not an easy answer, because so many of these tools fit into a lot of different boxes. So you know, Spark is a streaming engine, but it's also an ELT tool. And, you know, 'Dagster' is a data orchestration tool, but it can also be used for managing delivery of, you can write it to do arbitrary tasks. So you can build up these chains of tasks. So if you wanted to use it for CI/CD, you could, but it's not what it's built for. But you know, and then different databases have been growing a lot of different capabilities where, you know, it used to be you had your SQL database, or you had your document database, or you had your graph database. And then you have things like 'ArangoDB', which can be a graph database, and a document database and a SQL database all on the same engine. So there's a lot of multimodal databases, it's all of the SQL and all the NoSQL all in one. Right. And you know, JSON is being pushed into relational databases and data warehouses. So it's, there's a lot of crossover between the different aspects of the data stack. 00:45:43 Yeah, there probably is more of that.
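The unit testing idea is easy to picture as a pytest sketch; the transformation function here is invented purely for illustration:

    # test_transform.py -- run with: pytest test_transform.py
    def normalize_order(raw):
        # The kind of small transformation step a pipeline is built from.
        return {"customer": raw["customer"].strip().lower(),
                "amount": float(raw["amount"])}

    def test_normalize_order_casing_and_types():
        row = normalize_order({"customer": " Ada ", "amount": "19.99"})
        assert row == {"customer": "ada", "amount": 19.99}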
I would say in this, like data warehousing stuff, you know, in an operational database, it doesn't necessarily make a ton of sense to jam JSON blobs all over the place, you might as well just make tables and columns. Yep, no, it makes some sense, but not that much. But in this space, you might get a bunch of things, you don't really know what their shape is, or exactly, you're not ready to process it, you just want to save it and then try to deal with it later. So do you see more of those kind of JSON columns or more NoSQL stuff? -00:46:09 Absolutely. Basically, any data warehouse worth its salt these days has to have some sort of support for nested data, a lot of that, too comes out of the outgrowth of, you know, we had the first generation data warehouses, they did their thing, but they were difficult to scale. And they were very expensive. And you had to buy these beefy machines so that you were planning for the maximum capacity that you're going to have. And then came things like Hadoop where you said, Oh, you can scale out as much as you want, just add more machines, they're all commodity. And so that brought in the the area of the era of the data lake. And then things like s3 became inexpensive enough that you could put all of your data storage in s3, but then still use the rest of the Hadoop ecosystem for doing MapReduce jobs on that. And then that became the next generation data lake. And then things like presto came along, to be able to build a data warehouse interface on top of this distributed data and these various data sources. And then you had the, you know, dedicated data warehouses built for the cloud, where they were designed to be able to ingest data from s3, where you might have a lot of unstructured information. And then you can clean it up using things like DBT to build these transformations that have these nicely structured tables built off of this, you know, nested or messy data that you're pulling in from various data sources. +00:46:09 Absolutely. Basically, any data warehouse worth its salt these days has to have some sort of support for nested data, a lot of that, too, comes out of the outgrowth of, you know, we had the first generation data warehouses, they did their thing, but they were difficult to scale. And they were very expensive. And you had to buy these beefy machines so that you were planning for the maximum capacity that you're going to have. And then came things like Hadoop where you said, Oh, you can scale out as much as you want, just add more machines, they're all commodity. And so that brought in the era of the data lake. And then things like S3 became inexpensive enough that you could put all of your data storage in S3, but then still use the rest of the Hadoop ecosystem for doing MapReduce jobs on that. And then that became the next generation data lake. And then things like Presto came along, to be able to build a data warehouse interface on top of this distributed data and these various data sources. And then you had the, you know, dedicated data warehouses built for the cloud, where they were designed to be able to ingest data from S3, where you might have a lot of unstructured information. And then you can clean it up using things like DBT to build these transformations that have these nicely structured tables built off of this, you know, nested or messy data that you're pulling in from various data sources. 00:47:23 Yeah, interesting.
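The nested data support being described can be sketched with SQLite's JSON functions standing in for a warehouse's equivalent; the event payload is invented, and this assumes a Python build whose SQLite includes the JSON1 functions (true for most modern builds):

    # Nested data sketch: land messy JSON as-is, reach into it with SQL later.
    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE raw_events (payload TEXT)")
    con.execute("INSERT INTO raw_events VALUES (?)",
                ('{"customer": {"id": 7}, "items": [{"sku": "b-12"}]}',))

    # json_extract pulls structured fields out of the nested blob.
    print(con.execute(
        "SELECT json_extract(payload, '$.customer.id') FROM raw_events").fetchall())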
What do you see in the story of versioning the data itself? I'm thinking, I've got this huge pile of data I've built up, and we're using it to drive these pipelines. But it seems like the kind of data that could change, or I brought in a new source now that we've switched credit card providers, or we're now screen scraping extra data. Do you see anything interesting happening there?

-00:47:45 Yeah, so there's definitely a lot of interesting stuff happening in the data versioning space. So I mean, one tool that was kind of early to the party is a platform called pachyderm, they're designed as a end to end solution built on top of Kubernetes for being able to do data science, and data engineering, and data versioning. So your code and your data all gets versions together, there's a system called Lake Fs, that is, was released recently that provides a git like workflow on top of your data that lives in s3. And so they act as a proxy to s3. But it lets you branch your data to say I want to bring in this new data source. And as long as everything is using like Fs as the interface, then your main branch won't see any of this new data source until you are happy with it. And then you can commit it and merge it back into the main branch, and then it becomes live. And so this is a way to be able to experiment with different processing workflows to say I want to try out this new transformation job or this new batch job, or I want to bring in this new data source, but I'm not quite confident about it yet. And so it brings in this versioning workflow. There's another system combination of tools called iceberg, which is a table format for use in these large scale data lakes data warehouses that hooks into things like spark and presto. And there's another company project called danesi, that is inspired by Git for being able to do this same type of branching and merging workflow for bringing in new data sources, or changing table schemas and things like that.

+00:47:45 Yeah, so there's definitely a lot of interesting stuff happening in the data versioning space. So I mean, one tool that was kind of early to the party is a platform called 'Pachyderm'. They're designed as an end to end solution built on top of Kubernetes for being able to do data science, and data engineering, and data versioning, so your code and your data all get versioned together. There's a system called 'LakeFS' that was released recently that provides a git like workflow on top of your data that lives in S3. And so they act as a proxy to S3, but it lets you branch your data to say, I want to bring in this new data source. And as long as everything is using LakeFS as the interface, then your main branch won't see any of this new data source until you are happy with it. And then you can commit it and merge it back into the main branch, and then it becomes live. And so this is a way to be able to experiment with different processing workflows, to say, I want to try out this new transformation job or this new batch job, or I want to bring in this new data source, but I'm not quite confident about it yet. And so it brings in this versioning workflow. There's another combination of tools called Iceberg, which is a table format for use in these large scale data lakes and data warehouses that hooks into things like Spark and Presto.
And there's a companion project called Nessie that is inspired by Git, for being able to do this same type of branching and merging workflow for bringing in new data sources, or changing table schemas and things like that.

00:49:13 These all sound like such fun tools to learn, and they're all solving

-00:49:16 painful problems, right. And then another one, actually, from the Python ecosystem is dBc, or data version control that's built for machine learning and data science workflows, that actually integrates with your source code management so that you get commit and git push, you know, there's some additional commands, but they're modeled after git, where you commit your code, and then you also push your data and it lives in s3, and it will version the data assets so that as you make different versions of your experiment with different versions of your data, it all lives together so that it's repeatable and easier for multiple data scientists or data engineers to be able to collaborate on it. Well,

+00:49:16 painful problems, right. And then another one, actually, from the Python ecosystem is DVC, or Data Version Control, that's built for machine learning and data science workflows. It actually integrates with your source code management so that you git commit and git push. You know, there are some additional commands, but they're modeled after git, where you commit your code, and then you also push your data and it lives in S3. And it will version the data assets so that as you experiment with different versions of your data, it all lives together, so that it's repeatable and easier for multiple data scientists or data engineers to be able to collaborate on it. Well,

-00:49:53 yeah, the versioning the version control story around data has always been interesting, right? It's it's Super tricky. On one hand, your schemas might have to evolve over time. Like if you've got a sequel alchemy model trying to talk to a database. It really hates it if there's a mismatch at all right? And so you want those things to go the database schema maybe to change along with your code with like, migrations or something, but then the data itself. Yeah, that's tricky.

+00:49:53 yeah, the version control story around data has always been interesting, right? It's super tricky. On one hand, your schemas might have to evolve over time. Like if you've got a SQLAlchemy model trying to talk to a database, it really hates it if there's a mismatch at all, right? And so you want those things to go together, the database schema maybe changing along with your code with, like, migrations or something. But then the data itself, yeah, that's tricky.

-00:50:19 Yeah. And so there's actually a tool called Avro and another one called parquet. Well, they're tools. They're data serialization formats. And everyone particular has a concept of schema evolution for, you know, what are compatible evolutions of a given schema. So each record in an Avro file has the schema co located with it. So it's kind of like a binary version of JSON, but the schema is embedded with it. Oh, okay. That's interesting. Yeah. So if you say, I want to change the type of this column from an int to a float, then you know, maybe that's a supported conversion. And so it will let you change the schema or add columns.
But if you try to change the schema and a mean, in a method that is not backwards compatible, it will actually throw an error I see like a float to an end might drop data, but into a float probably wouldn't. Exactly. So it will let you evolve your schemas and parquet is actually built to be interoperable with Avro for being able to handle those schema evolutions as well, where Avro is a row or record oriented format. And parquet is column oriented, which is more powerful for being able to do aggregate analytics. And it's more efficient so that you're not pulling all of the data for every row, you're just pulling all of the data for a given column. So it's also more compressible. Yeah, I think I need to do more thinking to really fully grok, the column oriented data stores. Yeah, it's a different way of thinking. Yeah, the column oriented aspect is also a major revolution in how data warehousing has come about where, you know, the first generation was all built on the same databases that we were using for our application. So it was all row row oriented. And that was one of the inherent limits to how well they could scale their compute. Whereas all of the modern cloud data warehouses or all the modern, even non cloud data warehouses are column oriented. And so if you have, you know, one column that is street addresses, and another column that's integers, and another column that is, you know, vesicare, 15, all of those are the same data type. And so they can compress them down a lot more than if you have one row that is a street address, and a text field and an integer and a float and a JSON array. If you try to compress all of those together, they're not compatible data types. And so you have a lot more inefficiency in terms of how well you can compress it. And then also, as you're scanning, you know, a lot of analytics jobs are operating more on aggregates of information than on individual records. And so if you want to say, I want to find out what is the most common street name across all the street addresses that I have in my database, all I have to do is pull all the information out of that street address column, it's all co located on disk, so it's a faster seek time, and it's all compressed the same. And that way, you don't have to read all of the values for all of the rows to get all of the street addresses, which is what you would do in a relational database,

+00:50:19 Yeah. And so there's actually a tool called 'Avro' and another one called 'Parquet'. Well, they're tools; they're data serialization formats. And Avro in particular has a concept of schema evolution for, you know, what are compatible evolutions of a given schema. So each record in an Avro file has the schema co located with it. So it's kind of like a binary version of JSON, but the schema is embedded with it. Oh, okay. That's interesting. Yeah. So if you say, I want to change the type of this column from an int to a float, then, you know, maybe that's a supported conversion. And so it will let you change the schema or add columns. But if you try to change the schema in a manner that is not backwards compatible, it will actually throw an error. I see, like a float to an int might drop data, but an int to a float probably wouldn't. Exactly. So it will let you evolve your schemas. And Parquet is actually built to be interoperable with Avro for being able to handle those schema evolutions as well, where Avro is a row or record oriented format.
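As a concrete sketch of that schema evolution idea, here is a small example added for illustration, assuming the fastavro package; it is not from the show and the 'Payment' record is made up. The reader supplies a newer schema that widens the int to a double, which Avro's resolution rules allow:

    import fastavro

    schema_v1 = {
        "name": "Payment",
        "type": "record",
        "fields": [{"name": "amount", "type": "int"}],
    }

    # Each Avro file carries its schema along with the records.
    with open("payments.avro", "wb") as out:
        fastavro.writer(out, schema_v1, [{"amount": 100}, {"amount": 250}])

    # A backwards compatible evolution: int -> double widening is legal,
    # so files written with the old schema stay readable.
    schema_v2 = {
        "name": "Payment",
        "type": "record",
        "fields": [{"name": "amount", "type": "double"}],
    }

    with open("payments.avro", "rb") as src:
        for record in fastavro.reader(src, reader_schema=schema_v2):
            print(record)  # {'amount': 100.0} then {'amount': 250.0}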
And Parquet is column oriented, which is more powerful for being able to do aggregate analytics. And it's more efficient, so that you're not pulling all of the data for every row, you're just pulling all of the data for a given column. So it's also more compressible. Yeah, I think I need to do more thinking to really fully grok the column oriented data stores. Yeah, it's a different way of thinking. Yeah, the column oriented aspect is also a major revolution in how data warehousing has come about, where, you know, the first generation was all built on the same databases that we were using for our applications. So it was all row oriented. And that was one of the inherent limits to how well they could scale their compute. Whereas all of the modern cloud data warehouses, or all the modern, even non cloud data warehouses, are column oriented. And so if you have, you know, one column that is street addresses, and another column that's integers, and another column that is, you know, varchar(15), all of those are the same data type. And so they can compress them down a lot more than if you have one row that is a street address and a text field and an integer and a float and a JSON array. If you try to compress all of those together, they're not compatible data types, and so you have a lot more inefficiency in terms of how well you can compress it. And then also, as you're scanning, you know, a lot of analytics jobs are operating more on aggregates of information than on individual records. And so if you want to say, I want to find out what is the most common street name across all the street addresses that I have in my database, all I have to do is pull all the information out of that street address column. It's all co located on disk, so it's a faster seek time, and it's all compressed the same. And that way, you don't have to read all of the values for all of the rows to get all of the street addresses, which is what you would do in a relational database,

-00:53:02 right? Because probably those are co located on disk by row. Whereas if you're going to ask so all about the streets across everyone, then it's better to put all the streets and then all the cities or whatever, right, exactly. Interesting. Cool. I

+00:53:02 right? Because probably those are co located on disk by row. Whereas if you're going to ask all about the streets across everyone, then it's better to put all the streets and then all the cities or whatever. Right, exactly. Interesting. Cool. I

-00:53:16 think I actually understand a little bit better now. Thanks. The final one that you put on the list that just maybe to put a pin in it as a very, very popular pandas. I've never I never cease to be amazed with what you can do with pandas. Yeah, so I mean, pandas. It's one of the most flexible tools in the Python toolbox. I've used it in web development contexts. I've used it for data engineering, or used it for data analysis. And it's definitely the Swiss Army Knife of data. So it's absolutely one of the more critical tools in the toolbox of anybody who's working with data, regardless of the context. And so it's absolutely no surprise that data engineers reach for it a lot as well. So pandas is supported natively, and things like Daxter, where it will, you know, give you a lot of rich metadata information about the column layouts and data distributions. But yeah, it's just absolutely indispensable. You know, it's been covered enough times in both your show and mine. We don't need to go too deep into it.
But yeah, working with data, absolutely. get at least a little bit familiar with pandas. Well, just to give people a sense, like one of the things I learned yesterday, I think it was Chris Moffitt was showing off some things with pandas. And he's like, oh, over on this Wikipedia page, three fourths of the way down, there's a table. The table has a header that has a name. And you could just say, load HTML, give me the table called this as a DataFrame. From screen scraping as part of the page. It's amazing. Yeah, another interesting aspect of the pandas ecosystem is the pandas extension arrays library that lets you create plug ins for pandas to support custom data types. So I know that they have support for things like geo JSON, and IP addresses so that you can do more interesting things out of the box in terms of aggregates and group bys and things like that. So you know, if you have the IP address, pandas extension, then you can say gives me all of the rows that are grouped by this network prefix and things like that. Whereas just pandas out of the box will just treat it as an object. And so you have to do a lot more additional coding around it. And it's not as efficient. So there's an interesting interest. Yeah, that's it. That's as close to the pandas as well. Nice. One quick question. And then I think we should probably wrap this up. Stefan threw out some stuff about graph databases, particularly graph qL, or that's actually the API. Right? It's efficient. But what about its maturity? Like, what do you think about some of these new API endpoints?

+00:53:16 think I actually understand a little bit better now. Thanks. The final one that you put on the list, just maybe to put a pin in it, is the very, very popular pandas. I never cease to be amazed with what you can do with pandas. Yeah, so I mean, pandas, it's one of the most flexible tools in the Python toolbox. I've used it in web development contexts, I've used it for data engineering, I've used it for data analysis. And it's definitely the Swiss Army Knife of data. So it's absolutely one of the more critical tools in the toolbox of anybody who's working with data, regardless of the context. And so it's absolutely no surprise that data engineers reach for it a lot as well. So pandas is supported natively in things like 'Dagster', where it will, you know, give you a lot of rich metadata information about the column layouts and data distributions. But yeah, it's just absolutely indispensable. You know, it's been covered enough times in both your show and mine, we don't need to go too deep into it. But yeah, if you're working with data, absolutely get at least a little bit familiar with pandas. Well, just to give people a sense, like one of the things I learned yesterday, I think it was Chris Moffitt, was showing off some things with pandas. And he's like, oh, over on this Wikipedia page, three fourths of the way down, there's a table. The table has a header that has a name. And you could just say, load the HTML, give me the table called this as a DataFrame, from screen scraping that part of the page. It's amazing. Yeah, another interesting aspect of the pandas ecosystem is the pandas extension arrays library that lets you create plug ins for pandas to support custom data types. So I know that they have support for things like GeoJSON, and IP addresses, so that you can do more interesting things out of the box in terms of aggregates and group bys and things like that.
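The table-from-Wikipedia trick mentioned above looks roughly like this. This sketch is added for illustration rather than taken from the show; the URL and match text are made up, and pd.read_html needs lxml (or another HTML parser) installed:

    import pandas as pd

    # read_html returns every <table> on the page as a DataFrame;
    # match narrows it to tables whose text matches the given string.
    tables = pd.read_html(
        "https://en.wikipedia.org/wiki/List_of_countries_by_population",
        match="Population",
    )
    df = tables[0]
    print(df.head())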
So you know, if you have the IP address pandas extension, then you can say, give me all of the rows grouped by this network prefix, and things like that. Whereas pandas out of the box will just treat it as an object, and so you have to do a lot more additional coding around it, and it's not as efficient. So there's some interesting stuff there. Yeah, that's a close companion to pandas as well. Nice. One quick question, and then I think we should probably wrap this up. Stefan threw out some stuff about graph databases, and particularly 'GraphQL', though that's actually the API, right? It's efficient, but what about its maturity? Like, what do you think about some of these new API endpoints?

-00:55:33 graph? qL is definitely gaining a lot of popularity. I mean, so as you mentioned, there's sometimes a little bit of confusion about they both have the word graph in the name. So graph qL, and graph dB. I've read it too quickly. Like, oh, yeah, no, like Neo for j, wait, no, it has nothing to do with that. Right. So you know, graph qL is definitely popular API design. interesting side note is that the guy who created Daxter is also one of the CO creators of graph QL. and daxter has a really nice web UI that comes out of the box that has a graph qL API to it, so that you can do things like trigger jobs or introspect information about the running system. Another interesting use case, or use of graph qL is there's a database engine called D graph that uses graph qL, as its query language, so it's native. It's a native graph, storage engine, it's scalable, horizontally distributable. And so you can actually model your data as a graph, and then query it using graph QL. So not only seeing a lot of interesting use cases within the data ecosystem as well, yeah. For the right type of data, a graph database seems like it would really light up the the speed of accessing, sir, absolutely, yeah. So the funny thing is, you have this concept of a relational database, but it's actually not very good at storing information about relationships. It is the joins make them so slow. And so Exactly. So lazy loading or whatever. Yeah, right. So graph databases are entirely optimized for storing information about relationships so that you can do things like network traversals, or understanding within this structure of relations, you know, things like social networks are kind of the natural example of a graph problem where I want to understand what are the degrees of separation between these people? So you know, the Six Degrees of Kevin Bacon kind of thing? Yeah. Yeah, seems like you could also model a lot of interesting things. Like, I don't know how real it is. But you know, the bananas are at the back, or the milk is at the back of the store. So you have to walk them all the way through the store. And you can find those kind of traversing those like,

+00:55:33 GraphQL is definitely gaining a lot of popularity. I mean, as you mentioned, there's sometimes a little bit of confusion because they both have the word graph in the name, GraphQL and graph DBs. I've read it too quickly, like, oh yeah, like Neo4j... wait, no, it has nothing to do with that. Right. So you know, GraphQL is definitely a popular API design. An interesting side note is that the guy who created 'Dagster' is also one of the co-creators of GraphQL, and 'Dagster' has a really nice web UI that comes out of the box that has a GraphQL API to it, so that you can do things like trigger jobs or introspect information about the running system.
Another interesting use of GraphQL is there's a database engine called Dgraph that uses GraphQL as its query language, so it's native. It's a native graph storage engine, it's scalable, horizontally distributable. And so you can actually model your data as a graph, and then query it using GraphQL. So I'm seeing a lot of interesting uses of it within the data ecosystem as well, yeah. For the right type of data, a graph database seems like it would really light up the speed of access. Sure, absolutely, yeah. So the funny thing is, you have this concept of a relational database, but it's actually not very good at storing information about relationships. The joins make them so slow. Exactly. So lazy loading or whatever. Yeah, right. So graph databases are entirely optimized for storing information about relationships, so that you can do things like network traversals, or understanding within this structure of relations. You know, things like social networks are kind of the natural example of a graph problem, where I want to understand what are the degrees of separation between these people. So you know, the Six Degrees of Kevin Bacon kind of thing? Yeah. Yeah, seems like you could also model a lot of interesting things. Like, I don't know how real it is, but you know, the bananas are at the back, or the milk is at the back of the store, so you have to walk all the way through the store. And you can find those kinds of things by traversing those, like,

00:57:26 relations, you know, the Traveling Salesman Problem, stuff like that. Yeah, yeah, exactly. All right. Well, so many tools, way more than five that we actually made our way through, but very, very interesting, because I think there's just so much out there. And it sounds like a really fun place to work, like a

-00:57:41 technical space to work. Absolutely. You know, a lot of these ideas also seem like they're probably really ripe for people who have programming skills and software engineering, mindsets, like C, ICD testing, and so on, absolutely come in and say I could make a huge impact we have this organization has tons of data, if people work with the data, but not in this formalized way. If people are interested in getting started with this kind of work, what would you recommend, there's actually one resource I'll recommend, I'll see if I can pick up the link after the show. There's a gentleman called named Jesse Davidson, who wrote a really great resource. That's a short ebook of kind of, you know, you think you might want to be a data engineer, here's a good way to understand if that's actually what you want to do. So I'll share that. But more broadly, if you're interested in data engineering, you know, the first step is, you know, just kind of start to take a look at it, you know, you probably have data problems in your applications that you're working with that maybe you're just using a sequence of celery jobs and hoping that they complete in the right order, you know, maybe take a look at something like Baxter a prefect to build a more structured graph of execution. If you don't want to go for a full fledged framework like that. There are also tools like bonobo that are just command line oriented, that help you build up that same structured graph of execution.
So definitely to start to take a look and try and understand like, what are the data flows in your system, if you think about it more than just flows of logic and think about it and flows of data, then it starts to become a more natural space to solve it with some of these different tools and practices. So getting familiar with thinking about it in that way. Another really great book, if you're definitely interested in data engineering, and want to kind of get deep behind the scenes is designing data intensive applications. I read that book recently and learned a whole lot more than I thought I would about just the entire space of building applications oriented around data. So great resource there. Nice. We'll put those in the show notes. Yeah. And also just kind of raise your hand say to your management or your team to say, hey, it looks like we have some data problems. I'm interested in digging into it. And chances are they'll Welcome to help, you know, lots of great resources out there. If you want to get if you want to learn more about it, you know, shameless plug the data engineering podcast is one of them. Why should I be to help answer questions, I mean, basically just start to dig into the space. Take a look at some of the tools and frameworks and just try to implement them in your day to day work. A lot of data engineers come from software engineering backgrounds, a lot of data engineers might come from database administrator positions, because they're familiar with the problem domain of the data. And then it's a matter of learning the actual engineering aspects of it. A lot of people come from data analyst or data scientist backgrounds, where they actually decide that they enjoy working more with getting the data clean, and well managed than doing the actual analysis on it. So there's not really any one concrete background to come from, it's more just a matter of being interested in making the data reproducible, helping make it valuable, interesting note is that if you look at some of the statistics around it, there are actually more data engineering positions open, at least in the US than there are data scientist positions, because of the fact that is such a necessary step in the overall lifecycle of data. Yeah, How interesting. And probably traditionally, those might have been just merged together into one group, right in the category of data science, but now getting a little more fine grained. Exactly. And, you know, with the advent of data Ops, and ml Ops, a lot of organizations are understanding that this is actually a first class consideration that they need dedicated people to be able to help build. And it's not just something that they can throw on the plate of the person who's doing the data science. Yeah, certainly, if you can help organizations go from batch to real time or maybe shaky results, because the shaky input to solid results because a solid input, like those are extremely marketable skills. That's exactly All right. Well, Tobias, thanks so much for covering that. Before we get out of here, though. final two questions. So if you're going to write some Python code, what editor Do you use these days? So I've been using Emacs for a number of years now I've tried out things like pi charm and VS code here and there, but it just never feels quite right. Just because my fingers have gotten so used to Emacs. You just want to have an entire operating system as your editor, not just a software. It has that ml background with Lisp as its language, right. 
And then notable pipe UI package or packages. Yeah, we kind of touched on some right. Yeah, exactly. I mean, a lot of them in the list here. I'll just mention again, Daxter, DBT. And great expectations. Yeah, very nice. All right, but a call to action. You know, people feel excited this what what should they do? Listen to the data engineering podcast, listen to podcast thought in it if you want to understand a little bit more about the whole ecosystem, because since I do spend so much time in the data engineering space, I sometimes have crossover where if there's a data engineering tool that's implemented in Python, I'll have them on podcast on it, just to make sure that I can get everybody out there. And yeah, feel free to send questions my way all the information about the podcast in the show notes. And yeah, just be curious.

+00:57:41 technical space to work. Absolutely. You know, a lot of these ideas also seem like they're probably really ripe for people who have programming skills and software engineering mindsets, like CI/CD, testing, and so on, to absolutely come in and say, I could make a huge impact. This organization has tons of data, and people work with the data, but not in this formalized way. If people are interested in getting started with this kind of work, what would you recommend? There's actually one resource I'll recommend, I'll see if I can pick up the link after the show. There's a gentleman named Jesse Davidson, who wrote a really great resource. That's a short ebook of kind of, you know, you think you might want to be a data engineer, here's a good way to understand if that's actually what you want to do. So I'll share that. But more broadly, if you're interested in data engineering, you know, the first step is just to kind of start to take a look at it. You know, you probably have data problems in the applications that you're working with, where maybe you're just using a sequence of Celery jobs and hoping that they complete in the right order. You know, maybe take a look at something like Dagster or Prefect to build a more structured graph of execution. If you don't want to go for a full fledged framework like that, there are also tools like 'Bonobo' that are just command line oriented, that help you build up that same structured graph of execution. So definitely start to take a look and try and understand, like, what are the data flows in your system. If you think about it as more than just flows of logic, and think about it in flows of data, then it starts to become a more natural space to solve with some of these different tools and practices. So get familiar with thinking about it in that way. Another really great book, if you're definitely interested in data engineering and want to kind of get deep behind the scenes, is 'Designing Data Intensive Applications'. I read that book recently and learned a whole lot more than I thought I would about just the entire space of building applications oriented around data. So great resource there. Nice. We'll put those in the show notes. Yeah. And also just kind of raise your hand to your management or your team and say, hey, it looks like we have some data problems, I'm interested in digging into it. And chances are they'll welcome the help. You know, there are lots of great resources out there if you want to learn more about it. You know, shameless plug, the data engineering podcast is one of them.
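For a taste of what a structured graph of execution looks like in code, here is a minimal sketch added for illustration, written against Dagster's newer op and job API (an assumption about the version; the ops themselves are made up, not from the show):

    from dagster import job, op

    @op
    def extract():
        return [1, 2, 3]

    @op
    def transform(records):
        return [r * 2 for r in records]

    @op
    def load(records):
        print(f"loaded {len(records)} records")

    @job
    def etl():
        # Wiring outputs to inputs is what declares the execution graph.
        load(transform(extract()))

    if __name__ == "__main__":
        etl.execute_in_process()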
And I'm always happy to help answer questions. I mean, basically just start to dig into the space. Take a look at some of the tools and frameworks and just try to implement them in your day to day work. A lot of data engineers come from software engineering backgrounds. A lot of data engineers might come from database administrator positions, because they're familiar with the problem domain of the data, and then it's a matter of learning the actual engineering aspects of it. A lot of people come from data analyst or data scientist backgrounds, where they actually decide that they enjoy working more on getting the data clean and well managed than doing the actual analysis on it. So there's not really any one concrete background to come from. It's more just a matter of being interested in making the data reproducible, helping make it valuable. An interesting note is that if you look at some of the statistics around it, there are actually more data engineering positions open, at least in the US, than there are data scientist positions, because of the fact that it's such a necessary step in the overall lifecycle of data. Yeah, how interesting. And probably traditionally, those might have been just merged together into one group, right, in the category of data science, but now it's getting a little more fine grained. Exactly. And, you know, with the advent of DataOps and MLOps, a lot of organizations are understanding that this is actually a first class consideration, that they need dedicated people to be able to help build it. And it's not just something that they can throw on the plate of the person who's doing the data science. Yeah, certainly. If you can help organizations go from batch to real time, or maybe shaky results because of shaky input to solid results because of solid input, those are extremely marketable skills. That's exactly right. All right. Well, Tobias, thanks so much for covering that. Before we get out of here, though, final two questions. So if you're going to write some Python code, what editor do you use these days? So I've been using Emacs for a number of years now. I've tried out things like PyCharm and VS Code here and there, but it just never feels quite right, just because my fingers have gotten so used to Emacs. You just want to have an entire operating system as your editor, not just software. It has that ML background with Lisp as its language, right. And then notable PyPI package or packages? Yeah, we kind of touched on some, right. Yeah, exactly. I mean, a lot of them are in the list here. I'll just mention again, 'Dagster', DBT, and Great Expectations. Yeah, very nice. All right, final call to action. You know, people feel excited by this, what should they do? Listen to the data engineering podcast, listen to Podcast.__init__ if you want to understand a little bit more about the whole ecosystem, because since I do spend so much time in the data engineering space, I sometimes have crossover, where if there's a data engineering tool that's implemented in Python, I'll have them on Podcast.__init__, just to make sure that I can get everybody out there. And yeah, feel free to send questions my way; all the information about the podcast is in the show notes. And yeah, just be curious.

01:02:38 Yeah, absolutely. Well, like I said, it looks like a really interesting and growing space that's got a lot of low hanging fruit. So it sounds like a lot of fun. Absolutely. Yeah. All right. Well, thanks for being here. And thanks, everyone, for listening.

01:02:49 Thanks for having me.
-01:02:51 This has been another episode of talk Python. To me. Our guest in this episode was Tobias Macy. And it's been brought to you by data dog and retool. Data dog gives you visibility into the whole system running your code, visit talk python.fm slash data dog and see what you've been missing. But throw in a free t shirt with your free trial. supercharge your developers and power users. But then build and maintain their internal tools quickly and easily with retool, just visit talk python.fm slash retool and get started today, I want to level up your Python. We have one of the largest catalogs of Python video courses over at talk Python. Our content ranges from true beginners to deeply advanced topics like memory and async. And best of all, there's not a subscription in sight. Check it out for yourself at training talk python.fm Be sure to subscribe to the show, open your favorite podcast app and search for Python. We should be right at the top. You can also find the iTunes feed at slash iTunes, the Google Play feed at slash play and the direct RSS feed at slash RSS on talk python.fm. We're live streaming most of our recordings these days. If you want to be part of the show and have your comments featured on the air, be sure to subscribe to our YouTube channel at talk python.fm slash YouTube. This is your host Michael Kennedy. Thanks so much for listening. I really appreciate it. Now get out there and write some Python code

+01:02:51 This has been another episode of talk Python to me. Our guest in this episode was Tobias Macey. And it's been brought to you by 'Datadog' and Retool. 'Datadog' gives you visibility into the whole system running your code; visit 'talkpython.fm/datadog' and see what you've been missing. They'll even throw in a free t-shirt with your free trial. Supercharge your developers and power users, and let them build and maintain their internal tools quickly and easily with Retool; just visit 'talkpython.fm/retool' and get started today. Want to level up your Python? We have one of the largest catalogs of Python video courses over at talk Python. Our content ranges from true beginners to deeply advanced topics like memory and async. And best of all, there's not a subscription in sight. Check it out for yourself at 'training.talkpython.fm'. Be sure to subscribe to the show: open your favorite podcast app and search for Python. We should be right at the top. You can also find the iTunes feed at /iTunes, the Google Play feed at /play and the direct RSS feed at /RSS on talk python.fm. We're live streaming most of our recordings these days. If you want to be part of the show and have your comments featured on the air, be sure to subscribe to our YouTube channel at talkpython.fm/youtube. This is your host Michael Kennedy. Thanks so much for listening. I really appreciate it. Now get out there and write some Python code

diff --git a/transcripts/315.txt b/transcripts/315.txt new file mode 100644 index 00000000..69ad28d4 --- /dev/null +++ b/transcripts/315.txt @@ -0,0 +1,69 @@

+00:00 Have you heard that FastAPI is awesome? We have Michael Herman back on the show to help us make it even more awesome with his FastAPI awesome list. He's categorized many extensions and other libraries working with FastAPI to help you be even more efficient with this framework. This is talk Python to me, Episode 315, recorded April 22 2021.

+00:35 Welcome to talk Python to me, a weekly podcast on Python, the language, the libraries, the ecosystem, and the personalities.
This is your host, Michael Kennedy. Follow me on Twitter, where I'm '@mkennedy', and keep up with the show and listen to past episodes at talk python.fm. And follow the show on Twitter via @talkpython. Talk Python to me is partially supported by our training courses. At talk Python, we run a bunch of web apps and web APIs. These power the training courses, as well as the mobile apps on iOS and Android. If I had to build these from scratch again today, there's no doubt which framework I would use. It's FastAPI. To me, FastAPI is the embodiment of modern Python and modern APIs. You have beautiful usage of type annotations, you have model binding and validation with Pydantic, and you have first class async and await support. If you're building or rebuilding a web app, you owe it to yourself to check out one of our newest courses, Modern APIs with FastAPI, over at talk Python training. This is the first course in a series of FastAPI courses we're building, and you can get it for just $39. It'll take you from interested to production with FastAPI. To learn more and get started today, just visit 'talkpython.fm/fastapi', or email us at 'sales@talkpython.fm'.

+01:48 Michael, welcome back to talk Python to me. Thank you. Yeah, it's really great to have you back. You've been on before, and there's a bunch of stuff that we're going to talk about that you've been up to and we've done. So you were back on episode 206, when you talked about running Django in production, and you've had a couple of large sites that you've been running, and the most recent one is probably 'testdriven.io', where you do some articles, some tutorials and some courses, right? Yeah. So life was obviously much different back when I was last on the podcast. What was that, February of 2019? But so now, I'm still working on 'testdriven.io'. So there I do training courses for mid to senior level developers that are looking to learn test driven development, microservices, and AWS infrastructure and whatnot. Nice. It's mostly Python and JavaScript stuff. Is that right? Yeah, yeah. Well, maybe some of both, yeah, a bit. I also get a lot into, like, container orchestration and whatnot, mainly Kubernetes, a little bit of AWS ECS as well. Awesome. And so what do you do now? So yeah, I am running 'testdriven.io' on the side, but my full time main gig is Monitaur. And, as the banner there says, we're doing machine learning assurance. Really, machine learning governance is probably more like it; if we had to redo the website, that's probably what we would use, machine learning governance rather than assurance. But basically, we're helping to ensure that your AI is doing what it should be doing in production, so that the inferences or predictions your models are serving up are, like, within certain bounds, essentially, right? Like if the input data changes, and you don't change your models, maybe they're not meant to understand that type of data. And they're just, you know, the thing is, you're always going to get an answer, right? Yeah, it just might be wrong, or based on invalid data, but it looks like a valid answer, like, how do you know, right? Yeah, it's kind of like in Excel, whenever you get, you know, anything besides a NaN or whatever, you don't know if that's, like, good or bad or not. And so, yeah, we're looking for, like, feature drift.
We're looking for model drift, we're looking for bias, you know, that sort of thing. Also, we take all the inferences that go through a model and log them, and then we're also versioning the model as well, so you can recreate the model and run counterfactual type tests and whatnot. Interesting. Okay. Is that mostly Python stuff over there? What's the tech? Yeah, so it's a Django app that's like the backend API, using Flask as sort of a middle layer to transform data between the API and the front end, and the front end is in 'Vue'. And it's primarily AWS. And so we're using Terraform as infrastructure as code to simplify the maintaining of the infrastructure. That's cool. I feel like Terraform has definitely taken off. I hear a lot of people saying that they're using it these days. Yeah. Is that like Ansible or Chef, or is it like a competitor to that, right? Yeah, I think those are a little bit different. I would say it's more of a competitor to, like, CloudFormation templates. And I think there's a new sort of infrastructure as code from AWS called CDK. That's a little bit more declarative, sort of how Terraform is, and I haven't had a chance to look at that, but I think that's

05:00 like the new hotness these days. And then there's one more called, like, Pulumi, I believe is the name, but I haven't used that one. Yeah, cool. If people want to learn more about what's going on with Monitaur, they can check out Episode 261, which aired back in April 2020, with one of your co founders, Andrew Clark. Is he still working with you? Yeah, definitely. Yeah. Cool. So we dive into all that sort of stuff over there, which is cool. So really, the main thing that we're talking about is FastAPI, right? Like, that's why we're both here. We're both fans of FastAPI on multiple levels. And yeah, so we're gonna talk about basically all the extensions in the ecosystem around FastAPI, but make the case for us for FastAPI itself. Like, I feel like there was a 'let a thousand flowers bloom' type of thing going on once async and await came out in Python, right? We had the Django and Flask old standbys, and then when things switched with type annotations, and with async and await, those frameworks couldn't move super quick to adapt to those. So things like 'FastAPI' and 'Sanic', and 'Japronto', and 'Starlette' and 'API Star', all these things just sort of came into existence. And you know, FastAPI is certainly among that crowd, right? Yeah, 'FastAPI' definitely leverages Starlette, and so yeah, you get the whole async await syntax. I would say that I like FastAPI for other things, though. Like, I don't really take advantage of async and await. I have one production app, and that's running on FastAPI, and I honestly couldn't care less about async await. But yeah, it's more about, like, the developer experience that I get. And also, I really like Pydantic. I would say that, yeah, if you're really into Pydantic, and that's, like, what you use for serialization, deserialization, whatnot, then I would definitely check out FastAPI. Absolutely. You probably don't realize it, because it's not very obvious; unless you've scoured the website, you might not know this. I just interviewed Samuel Colvin, who's the creator and maintainer of Pydantic, about Pydantic and all the cool stuff it does. So... Oh, cool. People who are listening to this may have just listened to that episode. And FastAPI is a framework that absolutely takes Pydantic and puts it on the boundary. Right?
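To picture what Pydantic on the boundary means, here is a minimal sketch added for illustration; it is not from the show, and the Item model and route are invented:

    from fastapi import FastAPI
    from pydantic import BaseModel

    app = FastAPI()

    class Item(BaseModel):
        name: str
        price: float

    @app.post("/items")
    async def create_item(item: Item):
        # FastAPI parses and validates the JSON body into an Item, or
        # automatically answers with a 422 describing what was wrong.
        return {"name": item.name, "price_with_tax": item.price * 1.1}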
Like, when data is exchanged with FastAPI, the default way to do that is to somehow involve Pydantic models, right? Yeah. And I mean, like, if you think back to Flask, Flask is, like, just a wrapper on top of Werkzeug, and I'm not pronouncing that right, but it works with Click, and a couple others that are escaping me right now. FastAPI is really just a wrapper on Starlette and, like, a couple others. Yeah, yeah, it's just a wrapper on top of Pydantic, OpenAPI, JSON Schema, that sort of thing. Yeah, a few more, like, sort of modern tools like that. And so what makes FastAPI cool is that it makes it easy to, like, hook into those too, so you can leverage that and build plugins and whatnot. Yeah. So you can say things like, here's my API endpoint that I expect somebody to send a JSON document to, oh, and the argument to the endpoint is just a Pydantic model. Yeah, right. And then FastAPI will either successfully convert that model over, or send something like a 400 or a 422 back to say, bad data, we can't convert it to what we're expecting. And on top of that, FastAPI then automatically generates the OpenAPI stuff, which you hinted at, based on those models, right? Yeah. Yeah. So you get sort of that runtime type checking that you're talking about, and so you're really abstracting that all out to Pydantic. And so, like, you don't have to maintain a test suite around that either. And it handles all of the error handling around that; it sends back a nice, human and computer readable response. Yeah, one of the things I like is if you've got, like, a nested object, maybe I've got a Pydantic model that has a list of little baby Pydantic models, and then there's an error, the error would say the third thing in the list is where the problem is, not just, there is some invalid data. But on this field, in the list of things you sent me, the third one, that's where that type conversion error is. It's, like, really good about allowing you to drill into where the data is wrong. Yeah. Okay, cool. So we've got the async and await stuff, which you said you don't use a lot, but if you need it, it's there. It's nice to have it, right? We're going to talk about some of the places that, like, plug into that in a moment. Pydantic, I think, is a big one. The OpenAPI stuff is a big one. And to me, also, deployment is just simple, right? As long as you've got 'Uvicorn', or 'Gunicorn' plus Uvicorn workers, you're kind of good to go, right? There's not a lot of other stuff you've got to do on the server; just fire this thing up with 'Gunicorn' and maybe put nginx in front of it. Yeah, yeah, definitely. I like how it doesn't have a development server, either, so you have to use Uvicorn in development too. I think that helps beginners that are new to sort of web development conceptually understand that, hey, there's a difference here between this development server and this production server. Whereas with Flask and with Django, where they give you, like, a development WSGI server right out of the box, I think it's confusing, because a lot of people are like, well, why do I need that?
10:00 Why can't I use this instead of Gunicorn? Exactly. And they always come with this warning: this is not a production server, please don't use this. And you're like, okay, well, yeah. What if I want to test it, like, for performance? Is it kind of like what I would expect? Is it really different? Like, there are all these things. Just make it run what you're going to run in production anyway, right? Yeah, yeah, definitely. Yeah. Pretty cool. Okay. Well, I think that's a pretty solid case for FastAPI. I do think its biggest parallel competitor type thing is probably Flask. You know, a lot of times people are using Django when they're trying to do a little bit more than maybe what they're doing in Flask. Flask has many, many plugins as well. So I don't know, what do you think about this? Do you see these working together? Would you use one instead of the other? It's interesting, because I think last week, you know, Flask announced their 2.0. Yes, they did. And the 2.0 is going to have some async and await support. And that'll be interesting, because, like, a huge part of the power behind Flask is the ecosystem. You have just, yeah, thousands of plugins. Some might argue that there are too many plugins, but regardless, there are a lot of different plugins there. And so are they all going to start migrating over and supporting async and await? What are they going to do? So I think that'll be interesting. But I would say there are obviously a lot of comparison type articles between Flask and FastAPI. I don't know, like, I try not to get into the either-or type thinking, and try and think of it as both. Like, they're both tools. I use all three: I like FastAPI for certain things, I use Django for other things, I use Flask for other things. And so it's just all about having the right tool for the job. Yeah, yeah. I would say that FastAPI is more similar to Flask than it obviously is to Django, and so it's probably going to be compared to Flask. But I still think that there are, like, certain reasons that I would probably use Flask over FastAPI. Sure. That may well be because one of the plugins that you're talking about exactly nailed the use case. Yeah. And I'm also going to have David and Phil on to talk about the Flask 2.0 release pretty soon. So... Oh, cool. I think there's been this sort of leapfrog thing that's probably going to put Flask back level with, if not ahead of, FastAPI on some of the cool features; it'll bring a lot of those features over, I would expect. It's also worth pointing out that, like, in Flask, you can still do cool stuff with, say, Pydantic, right? You just need one extra line of code. The very first line of your API method could just be 'model = Thing(**request.get_json())', or whatever the call is to get the form data that's been posted over, get the JSON data that's been posted over, and then just run with it the same. Yeah, yeah, there's a nice, like, Flask-Pydantic plugin as well. So yeah, you can definitely use Pydantic inside of Flask. And then there's also, like, 'Flask-RESTX', which will give you nice 'OpenAPI/Swagger' type support as well, if you want that out of the box. Interesting, okay. And a lot of this is awareness, right? Like knowing, oh, there's this thing that I can go get, right? Knowing that I could get Flask-Pydantic, or the OpenAPI stuff, and so on. And that brings us to our main topic here. How do you discover these things and know that they're out there? Right? It turns out, there's actually a bunch of extension type libraries for FastAPI. I don't know that there's an official plugin model for FastAPI, but there are certainly things built to make FastAPI better and add functionality to it, right.
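Here is a rough sketch of that one-extra-line Flask-plus-Pydantic trick from a moment ago, added for illustration rather than taken from the show; the Item model is invented, and it is written against Pydantic 1.x as it existed at the time:

    from flask import Flask, jsonify, request
    from pydantic import BaseModel, ValidationError

    app = Flask(__name__)

    class Item(BaseModel):
        name: str
        price: float

    @app.route("/items", methods=["POST"])
    def create_item():
        try:
            # The one extra line: validate the incoming JSON by hand.
            item = Item(**request.get_json())
        except ValidationError as e:
            return jsonify(e.errors()), 422
        return jsonify(item.dict())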
And so I learned about a lot of them from your awesome FastAPI list. But maybe just for a minute, for people who are not aware of this whole trend of awesome lists, maybe just talk about awesome lists for us a little bit. Like, what's the story? Where did these things come from? Yeah, I can't remember exactly when they started popping up. I feel like maybe, like, five years ago, something like that. But I mean, an awesome list is, I guess, just literally a list of awesome things. In theory, it's contributed to and maintained by the community. I mean, oftentimes what really happens is, like, the main author becomes sort of a dictator around it, which I think is totally fine. But I think there are just thousands of different awesome lists out there. And so there's even, like, a meta awesome list, an awesome list of awesome lists.

13:55 So it gets a little crazy. It does get super, super meta, and it gets very specific as well. The one that people probably know best in the Python space is 'awesome-python.com', which is really general, right? I mean, there's Flask stuff and FastAPI stuff; there are zillions of other areas. And then there are all these offshoots, like yours around FastAPI, and some for Flask, and so on. Cool. So when did you create your awesome FastAPI list? It can't be more than two years ago, right? Because based on FastAPI's age, it must be like a year old. I would say, I can't remember when we launched the FastAPI course on testdriven, but it's like right around the same time. I think the list probably actually came out before that, because the first blog post on FastAPI came out in January of 2020. So it was definitely after that. Yeah, really cool. Well, and I think, you know, with FastAPI being around two years old, it takes a little while for these extra libraries to build up around it, right? And I definitely think it's gaining momentum. For the first time, it showed up on, I think it was, the PSF JetBrains

15:00 survey for 2020, and went straight to number three among the most popular web frameworks. It was really, yeah, quite interesting how quickly it became popular. To me, it's because it brings together these little pieces, each one of which is kind of a neat, new, modern Python idea, like type hints, like Pydantic, like async and await, and just brings it all together in one go. Yeah, this has all the things that I care about. This is cool. Alright, so your awesome list is broken down into a bunch of different categories, like third party extensions, and then resources, including, say, a podcast episode like this one, and so on, hosting, and so on. So what I thought it'd be fun to do is let's just maybe go through some of those sections, and highlight a couple of the tools or extensions or whatever that are really neat. Yeah, that sounds great. How about FastAPI Admin for the first one? One of the reasons why I did create this was, like, mainly for these third party extensions, for listing these. So I think, kind of like 'djangopackages.org' really aggregates them really well, you know, obviously for Django. I don't think there is one anymore for Flask; the main Flask docs used to have a list, but it just got out of control. Yeah, but I think one of the powerful things is definitely the ecosystem and, like, these extensions, so I thought, like, someone's got to start this. So... Fantastic. Alright, so the first one, I think, is pretty interesting, is this FastAPI Admin. Tell us about this? Yeah.
So if you... I mean, if you're familiar with Django, like, one of the powerful things that you get from Django right out of the box is a nice CRUD admin, where you can interact with your models, your database models, in sort of a CRUD GUI-like fashion. And so you can add data and have sort of the CRUD functionality right out of the box, so you don't have to jump into SQL. So yeah, this just mimics that same sort of behavior. Very cool, very cool GUI. That's one of the main features of Django that I see people using it for. Yeah. So the idea is basically, if I want to create users or mess with other tables, like, say they have products and categories, it's just like a grid type thing to add new ones, edit existing ones, right? Yeah, yeah. It's very straightforward. But yeah, there's not really too much to say about it; it's not some super sexy interface or anything like that. But it just saves a lot of time. Yeah, like one of the main features of the newer Django 3.2, maybe, was that there's now templating for its admin stuff to make it look all super cool. Yeah. But to be honest, I think one of the really important things is to make it easy for people to get started. And you know, for me, if I had to build a little admin back end thing like that, it's like, alright, well, there goes half a day to add it. But if you're starting new, and you're like, okay, I got this page showing, but now I need to edit them... oh no, this is gonna be such a pain, right? Having something like this you can just plug in would be really helpful. You might say, oh, actually, maybe I will choose FastAPI over Django REST framework, for example, potentially, because it might have an admin, right? Yeah. Like, one of the negatives about this, and this is one of the negatives with Django, is a lot of people use it for stuff that it's not intended to be used for, like they try to make a consumer facing version of it. I think that's partly the impetus for, like, the templates, to be able to do more stuff like that, which I think is not great; then you have to make everyone sort of an admin. I think that's a bad practice. Yeah, it might lead to some problems. Yeah, yeah.

18:17 Yeah, Joe, out there on the live stream, says, does this include identity? Like, can I get, I'm guessing, a restricted admin back end? Yeah, I'm not sure. We are going to talk about other things that do include identity. And I suspect, I haven't logged in, but there's probably some point where you write some bit of API endpoint... maybe, I know there's got to be a way where you restrict access to it. Yeah, it does have an admin secret here, and maybe that's it. I mean, I haven't used this particular one; I don't use Tortoise in production. Last I looked at this, it didn't work with any other ORMs besides Tortoise. But I know that it's built similar to, or designed like, the Django admin, and it doesn't have, like, a concept of permissions. So it just has, like, a super user, and if you have super user access, you get access to it. So there's not really any... I think there are probably maybe some extensions you can use to, like, limit access. But yeah, yeah, it looks like 'permission: True'; maybe there is, like, a concept of permissions here. So I'm thinking there are two places where you can put a little bit of protection here. One is where you set the admin URL; the URL could be a UUID of some insane length, right? So it's not very guessable. Yeah.
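A tiny sketch of that idea, added here for illustration rather than taken from the show: derive the admin mount point from a UUID instead of using the guessable default.

    import uuid

    # A hard-to-guess mount point for the admin, decided once at deploy time.
    ADMIN_PATH = "/admin-" + uuid.uuid4().hex
    print(ADMIN_PATH)  # e.g. /admin-6f1d2c0b9a8e4f35b2c7d0e1a9b8c7d6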
Yeah, you would definitely want to change that URL in production. You don't want it at '/admin'; people are just always poking for stuff like that. Yeah, I'm sure you've done it, and there are no doubt people out there in the audience who have done this, but if you haven't, it's shocking: if you go and just tail the log of your site, the main request log, you'll just see requests

00:20:00 for all sorts of weird, unrelated stuff, people just trying to guess to see if it exists. There are all sorts of requests for things like 'wp-admin' and '.php' pages on my site, and it's written in Python; it has no admin thing like that whatsoever. But people are just going: is this there? Can I get in and try a default password against a PHP admin backend? Or try that for a Joomla backend? And just start jamming on those things, right? So having that URL be something that's just not the default, while not a big piece of security, dissuades the bots, I think. Yeah. Keeps the honest people honest. Exactly. You want to hack, you've got to do it for real. Okay, speaking of which, let's talk about authentication. There's a whole section on different types of things here, right? Yeah. The one that I like best, or the one that I've used the most, I guess, is FastAPI Users. Okay. I think that's the most popular one. Yeah. So it uses JWT-based auth, and you can store your JWT wherever you want, but it also has a session capability built into it where you can store it in a cookie. Okay. A lot of times people are storing JWTs in local storage, which could open you up to XSS, cross-site scripting attacks, or... I get those mixed up. But yeah, I thought that was cool; you don't see that a lot in libraries that are JWT-based. Yeah, that's cool. It has either a cookie auth or a JWT auth backend, which is cool. It also supports different ORMs: if you like SQLAlchemy, you can do that; if you like MongoDB, you can do that; Tortoise, or ormar, which I haven't actually heard of before, but we'll probably talk about it in a minute anyway. And also OAuth2. So this is quite neat if you want to plug it in, and it looks like a bunch of people have contributed to it, so it looks pretty lively. Yeah, it's very popular.
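A rough sketch of that cookie-versus-local-storage point, using PyJWT directly rather than FastAPI Users' own API; the SECRET value and the login logic are placeholders:

    import jwt  # PyJWT
    from fastapi import Cookie, FastAPI, HTTPException, Response

    app = FastAPI()
    SECRET = "change-me"  # hypothetical signing key

    @app.post("/login")
    async def login(response: Response):
        token = jwt.encode({"sub": "user-123"}, SECRET, algorithm="HS256")
        # httponly means page JavaScript can never read the cookie, which is
        # what protects the token if an XSS bug slips into your frontend.
        response.set_cookie("auth", token, httponly=True, secure=True)
        return {"status": "logged in"}

    @app.get("/me")
    async def me(auth: str = Cookie(None)):
        if auth is None:
            raise HTTPException(status_code=401, detail="Not logged in")
        claims = jwt.decode(auth, SECRET, algorithms=["HS256"])
        return {"user": claims["sub"]}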
Yeah. I mean, one thing people might want to check out is the Snyk package advisor. Have you seen this? You can put all sorts of different packages in here. So if you like this one or that one, you could pull it up and have it tell you, maybe not always that quickly, the health of the package: how popular is it, how healthy is its maintenance, and the security and sustainability and so on. So if you've got two or three extensions that might be doing the same thing, it might be worth throwing them into the Snyk package advisor to get a sense that, say, this one seems a little stronger in terms of liveliness. Is the popularity score based on similar packages, or is it kind of global? Yeah, that's a good question, because I can't imagine that there would be a more popular auth library for FastAPI. I think it's global. I'm pretty sure it's a logarithmic, global type thing. Because, yeah, this is probably the most popular one, so what's going to beat it, right? But it's only 798 stars, relative to, like, Flask-User or something like that, right? Yeah, I'd have to look and actually see. It does show you similar packages sometimes; it'll say, here's some other stuff that's like it. But anyway, I think that might be something to bring together with awesome lists in general, right? It gives you a chance to check these out and see how they're doing. You've also got FastAPI Login, based on Flask-Login. They're almost too similar: the same two words, nearly in the same place. Yeah. And FastAPI Cloud Auth, which is kind of cool, so using Auth0 or AWS Cognito or things like that. If you're doing that anyway, that might be nice. Yeah. Auth0, they just got bought by somebody. I feel like Google or something like that? Facebook? Yeah, I feel like they bought something else. And then, you know, little fish, big fish, whale.

00:23:40 I think Okta just bought... yes, that's right, it was Okta. Yeah. Another one I wanted to throw out there while we're in this section, though I don't see it here, and I don't see anything here that does this. Possibly... let's look at FastAPI Security; maybe this one does. It's not super documented over there. I feel like it throws in some of the permission stuff that you would really want, plus, like, OpenAPI type stuff. For more of the general OWASP stuff, check out 'secure', and just, how did they get that as the PyPI package name, just the word 'secure'? Now, this one doesn't say that it plugs in with FastAPI, but it plugs in with Flask and Django and Pyramid and CherryPy and Responder and Starlette, and Starlette is the foundation, so I think it would actually work. And what it does is set all the default header behaviors: the frame options that say you can't embed this site into somebody else's site, the cross-site scripting protection, certain types of cache policies, and whatnot. Just by writing one or two lines of code against it, it'll automatically wrap every request with those. So that might also be interesting to think about in this regard. Yeah, those headers can get confusing,

00:25:00 especially if you're using a single page application and you're trying to use cookies or some sort of session-based auth or whatever at the same time. It can get pretty complex with those headers. So yeah, that seems like a cool package. Yeah. And if something new comes out that should be added, that you don't pay attention to, but you happen to upgrade your package, maybe it'll bring the new best practice along, right? Yeah. Okay, so that's authentication. Databases: you have it broken down into ORMs; query builders; ODM, which is for document databases, though at a quick scan it looks like document equals MongoDB for the moment here. Yeah. And then there's 'other', which is JSON exchange and whatnot. Let's talk ORMs first. Big, like you pointed out, big news for SQLAlchemy, right? Yeah, SQLAlchemy 1.4 was released last week or the week before, and the change actually broke FastAPI... or actually broke the Flask course, because it requires the database URL to have 'postgresql' as the scheme name rather than just 'postgres'. And so, you know, that broke the course.
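Back on 'secure' for a second: here's a hand-rolled sketch of the kind of thing such a package automates, setting security headers on every response via FastAPI's middleware hook. The particular header set here is illustrative, not the library's exact defaults:

    from fastapi import FastAPI, Request

    app = FastAPI()

    @app.middleware("http")
    async def add_security_headers(request: Request, call_next):
        response = await call_next(request)
        # Don't allow this site to be embedded in somebody else's site.
        response.headers["X-Frame-Options"] = "DENY"
        # Keep browsers from MIME-sniffing responses into the wrong type.
        response.headers["X-Content-Type-Options"] = "nosniff"
        # Only talk to this host over HTTPS for the next year.
        response.headers["Strict-Transport-Security"] = "max-age=31536000"
        return response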
So, a little annoying, but you know. Yeah, and it blows my mind too; it broke one of my FastAPI courses, because we were using async. We were using the new one, the async one, but then they said: well, you're using a driver that doesn't support async, so we're going to throw an exception instead of just working more slowly. And so now, again, you had to put a separate, different driver in there and so on. So yeah, we sometimes find out the hard way about these releases. So what's the big deal with SQLAlchemy? And then we've got FastAPI-SQLAlchemy in here. Yeah, let's see. I mean, I haven't updated this specifically for anything new with SQLAlchemy. So yeah, there's that FastAPI-SQLAlchemy; it'd be interesting to take a look at that one and see if there have been any updates since the new SQLAlchemy was released. It looks like no; looks like it hasn't been touched. Yeah. So maybe if you're looking to contribute, there's something to be done here. Because SQLAlchemy used to not support async. The big thing with SQLAlchemy 1.4 is there's stuff that changes the API to sort of move to something new, but the big one that's probably relevant here is that SQLAlchemy now, as of 1.4, supports async and await: like, async, await, execute this query type of thing, give me the objects back. And you might not care at all about that. But if you're using async and await views, or API endpoints, in FastAPI, and you want to use SQLAlchemy, well, there goes your async, right? It's gone, because one of the most important things to await on is the database. So now the new SQLAlchemy has that support, and I'm guessing this extension probably doesn't support it, right, because it just didn't exist. But yeah, cool. Yeah, I'm not exactly sure. I'm sure it won't work, because the sample right here is exactly the same code I had that broke, so it's not going to work. But it's also very possibly quickly and easily updatable. I don't know, but people can check that out. I guess another one you mentioned earlier is Tortoise ORM. I haven't really done anything with Tortoise. What's the story? Databases are supposed to be fast, and here's a turtle?

00:27:58 Yeah, I mean, naming is tough, right?

00:28:00 Exactly. I would say Tortoise is probably the most popular one, or just the one that I've seen the most use of, in terms of seeing a lot of FastAPI projects. So yeah, I think Tortoise is one that's leveraged a lot. Honestly, people probably use SQLAlchemy more than anything else, and they just don't deal with the async type stuff. But if you want async and await, then yeah, Tortoise definitely has quite a bit of support there. Yeah, you've got Pony ORM, Tortoise ORM, Peewee. I know Peewee has an async version. I know the Django people are working on an async ORM story, but it's not there yet; SQLAlchemy just got it. One of the things I like about this is they put up what seems like a pretty fair graph comparing all the different ORMs in terms of performance. Tortoise ORM comes up pretty nice on, like, single inserts and updates, but there are also places where it's slower than other stuff. And they're just like: we're going to put it all up there, and if you find it useful, here's what you get.
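For reference, the SQLAlchemy 1.4 async pattern they're describing looks roughly like this; the User model and connection URL are placeholders, and note the 'postgresql+asyncpg' scheme, since plain 'postgres://' is the spelling that broke:

    from sqlalchemy import Column, Integer, String, select
    from sqlalchemy.ext.asyncio import AsyncSession, create_async_engine
    from sqlalchemy.orm import declarative_base

    Base = declarative_base()

    class User(Base):
        __tablename__ = "users"
        id = Column(Integer, primary_key=True)
        name = Column(String(50))

    # An async-capable driver is required; asyncpg here, per the discussion.
    engine = create_async_engine("postgresql+asyncpg://user:pass@localhost/db")

    async def get_users():
        async with AsyncSession(engine) as session:
            # The query itself is awaited, so the event loop stays free.
            result = await session.execute(select(User))
            return result.scalars().all()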
And Tortoise is similar to the Django ORM in the sense that it uses an Active Record type approach, more similar to Peewee, whereas SQLAlchemy uses the data mapper approach, I believe. Yeah, the unit of work style. Yeah. And that's confusing for people, because you've got to create the session, then you do all this stuff, and then you call commit if you want to make changes. It's not that hard to deal with, but it is one of those things like: why? Where do I get a hold of the session? If I've got a hold of an object and I want to update it... I do find the Active Record stuff pretty handy for simple cases. Right. And that style comes from Django as well, so it'll feel real familiar. Yeah, pretty cool. Joe, also out there, asks: what's the most popular database to use with FastAPI? Postgres? Express.js plus Mongo, maybe, is like another example of a pairing. Yeah, I would say Postgres. Yeah, I think generally speaking, if you're talking Python, it's Postgres if you're relational, MongoDB if you're not. That seems to be the story. The important thing here is that you may care about async and await, and that limits the ways in which you can talk to these things, but both Postgres and

00:30:00 MongoDB have a really good async story, so I think that still holds for FastAPI. Does SQLite have any sort of, I don't know if you know this offhand, does SQLite have any sort of async support? It doesn't, I don't think; the real way you access it doesn't behave any differently. But there's a driver that allows it to integrate with an asyncio event loop, so it won't block the loop. Basically, it just means there's another thread waiting. You won't get true scalability, but at least if you're waiting on a query here and an external API there, they can both happen at the same time. But if you have a bunch of database calls, I think it just queues up. Yeah. Cool. Alright, so Tortoise ORM: useful, because down here somewhere we probably have an await, I'm guessing. Oh, here, you asked about SQLite; here's the SQLite library. aiosqlite is the one they were recommending. Yeah. But down here you can do, you know, await create, await filter-dot-first: exactly what you want for simple access to those APIs. Yeah. I mean, if you're just doing a quick select or delete or whatever, you're probably not going to get any sort of performance boost. But if, for whatever reason, you're doing a lot of different queries in a route handler that are not dependent on each other, or if you're doing more than one query that is very expensive, and also not dependent on each other, then you might get a performance boost. Basically, the worse your database is,

00:31:26 the more of a benefit you're going to get. Because I think, in a way, it's all about scaling, waiting, and latency. So the more latency your database has, because you're doing slow queries, or it's in the cloud and far away or whatever, the more you're going to benefit. Yeah. But if it's a one millisecond response time, who knows? All right. You also mentioned GINO. This is literally the first time I'm seeing it. What is this? Yeah, so GINO, from what I understand, is more of a full ecosystem. Okay, so 'GINO Is Not ORM': a lightweight asynchronous ORM. But yeah, I mean, it is an ORM.
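Back on Tortoise for a moment, here's a sketch of its Active Record style plus that independent-queries point: with async, two slow queries can overlap instead of running back to back. The model and fields are invented, and the Tortoise.init() setup is elided:

    import asyncio
    from tortoise import fields
    from tortoise.models import Model

    class User(Model):
        id = fields.IntField(pk=True)
        name = fields.CharField(max_length=50)

    async def handler():
        # Active Record style: query through the model itself, no session object.
        first_alice = User.filter(name="alice").first()
        total = User.all().count()
        # Tortoise querysets are awaitables, so two independent queries can be
        # awaited concurrently; the latency overlaps rather than adding up.
        alice, count = await asyncio.gather(first_alice, total)
        return {"alice": alice, "count": count}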
But it also, I believe, has a generator in here where you can scaffold out an entire FastAPI app, sort of based on your models. Yeah, interesting. Similar to how the Rails scaffold command works, if you're familiar with that: scaffold out a quick CRUD app. Okay, cool. And it's one of these recursive acronyms. Yeah, it is. It talks about having an async API on top of SQLAlchemy; I wonder how much importance that will have these days. But it sounds like it has other stuff; they talk about the whole community and so on. Very cool. What else is in here that we should talk about? There are a couple of other ORMs people can look at. Query builders: I've used 'databases' quite a bit. Okay. One difficult thing about 'databases' is Googling it. Trying to Google any sort of errors around 'databases'... you know what's going to come up. But yeah, I mean, was this originally done by Kenneth Reitz and then handed off, or is this different? You know, I'm not sure. I think it's by the same team that does Starlette. Yeah, Tom Christie; the name escaped me there. Yeah, definitely. Cool. He does a lot of stuff. Yeah, that guy's busy. I've had him on the show before; he's definitely a busy guy. That's awesome. Cool. So it's like a wrapper around SQLAlchemy core that will give you back proper queries and stuff like that. Yeah. Nice. Okay, ODMs. Just really quick mentions: I was recently talking about Beanie on Python Bytes, and Beanie is kind of cool. It's an async way to talk to MongoDB, and an ODM, object document mapper, because there are no relations, really. What's interesting about it is, one, it has an async option, and two, it's all about Pydantic. Normally you'd have, say, Django models or SQLAlchemy models; here the models are Pydantic models that go to and from the database, which opens an interesting possibility for integration back into the return types and stuff that you would have, say, for FastAPI. Anyway, if you're into that, it's potentially worth checking out. It's based on Motor, which is MongoDB's official async driver for Python. Yeah, the PR for that one came out, I think, last week, and I heard about it maybe the week before. So I don't know if Beanie is a new library, but it's quite new. Because I was talking to Roman, the guy who created it, and he did some stuff on it. I talked about how I thought it was really cool that it uses Pydantic models and types, but that it needed indexes. And he actually went and put a whole mechanism in for doing indexes. You've got to check it out; I guess that's just one example, but there's a whole way in which you can put indexes in your models and stuff. Yeah, MongoDB indexes. Perfect. So it's pretty cool that it has that stuff built in, and it looks like it's coming on strong, so maybe that's a good one. MongoEngine is actually what I use over at Talk Python, but I don't believe it's async, and I have suspicions it will never be async. It's super, super involved, and it doesn't seem to be changing real quickly, so I'm guessing that's kind of where it is. Have you checked out this Pydantic-SQLAlchemy? Pydantic to SQLAlchemy? I haven't, no. It's used to convert SQLAlchemy models to Pydantic models. Like I just talked about, there are benefits to having your database models be Pydantic.

00:35:01 Right, yeah.
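For readers following along, this is the conversion being described, roughly per pydantic-sqlalchemy's README; the User model here is a placeholder:

    from pydantic_sqlalchemy import sqlalchemy_to_pydantic
    from sqlalchemy import Column, Integer, String
    from sqlalchemy.orm import declarative_base

    Base = declarative_base()

    class User(Base):
        __tablename__ = "users"
        id = Column(Integer, primary_key=True)
        name = Column(String)

    # Built by metaprogramming at runtime, which is exactly why editors
    # can't autocomplete its fields (the complaint that comes up next).
    PydanticUser = sqlalchemy_to_pydantic(User)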
And maybe this is just a way to sort of map those over. That's kind of cool. Yeah, that is cool. When I looked at this earlier, there was one thing I wasn't psyched about, and I'd be interested to hear your thoughts on this, Michael. The way it does it is at runtime, right? So you take your SQLAlchemy model and you say: now I'm going to create a PydanticUser from my User, by passing it to a function that does, probably, magic metaprogramming, and boom, out comes this PydanticUser type. And that's cool. But one thing I don't like about that style is the editors can't be very smart about helping you, right? It doesn't know what the heck a PydanticUser can do or can be, or autocomplete, and all that sort of stuff. So I feel like a lot of these runtime converter things let the good editors, like PyCharm and VS Code, down. What do you think? Yeah, I mean, is there a way to look at what it'll convert that down to? It'd be great if you could get it to spit out the actual model as a code file: you run this one time and then you're good. Yeah, we're good. I just want a way to go from 20 SQLAlchemy models to 20 Pydantic models, save that code, and go with it, or something like that. That would be cool. But I don't know; it must be cached at some point. Yeah, that's interesting. I do think it's really powerful, though. It's just a little tricky to get help with what you're supposed to do, right? Because otherwise you have two different model concepts: you have your database model, which is SQLAlchemy, and then you have your other model, which is more like a schema. Yeah. But Pydantic calls them models. And so in my FastAPI code, I call them schemas, so I have a schemas.py file and a models.py file, which is the Flask... sorry, I keep saying Flask.

00:36:40 It's so hard, because they're the same size word, and they play a very similar role. Yeah.

00:36:47 At least they're sponsoring this.

00:36:49 Yeah, just having them combined together would make sense. But I get what you're saying: you don't know exactly what that is until runtime. Yeah. If I go to one of those and type dot, it gives me nothing, right? Yeah. Cool. So Joe, out there in the livestream, asks: what would you recommend as a DB driver, say psycopg2, for Postgres and FastAPI? You got a recommendation? Well, if you're using synchronous, you want to go psycopg2, obviously. But if you're using async, then it's going to be asyncpg. Yeah, so there's a separate async one. Cool. Alright, let's talk about... I shouldn't just bag on code generators. Let's talk about code generators.

00:37:40 Okay, because they are valuable. One of the things that's cool: here's a little site I built for a course, a weather site over at talkpython.fm, that lets you go and enter a city, and it'll tell you literally the real weather right now. Like, for now, in Portland, it's broken clouds, within a few moments, right? That's cool. And one of the things that's really nice with all these things is you can go to '/docs' and you get this cool documentation thing, and that's where your Pydantic model kicks in, for the JSON schema response and all those cool things. That's just FastAPI. That's amazing.
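That "'/docs' comes for free" point in one small sketch: declare a Pydantic response model and FastAPI generates the OpenAPI schema behind the interactive docs page. The weather shapes here are invented, not the actual course code:

    from fastapi import FastAPI
    from pydantic import BaseModel

    app = FastAPI()

    class Report(BaseModel):
        city: str
        description: str
        temperature: float

    # The response_model feeds the JSON schema shown at /docs automatically.
    @app.get("/api/weather/{city}", response_model=Report)
    async def weather(city: str):
        # A real app would call out to a weather service here.
        return Report(city=city, description="broken clouds", temperature=7.5)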
But the next thing is the FastAPI code generator, right? So this takes an OpenAPI file, and then it will scaffold out your FastAPI app based on that. And does it do all the Pydantic models and whatnot? I'm not sure if it does, or is it just the endpoints? It seems like it does; it says the response model is Pets, I think, so: from models import Pets. Yeah. So you point it at that documentation, that '/docs' thing I was just talking about, and you say: here's an API specification, I want a FastAPI server implementation of that, and boom, you get it. So you can sort of do document driven development. So then if you want to change your API, you change the OpenAPI doc. Exactly. Yeah, and you can see right here, it actually is generating the Pydantic models, and even the errors that it throws, and so on. This is super cool. It even has support for, say, optional strings and default values versus non-optional strings. I'm not sure I would use it, because I don't find myself in that situation a lot, but if you wanted to do it, it seems like it would be really cool to have. Alright, related to that is the other side of the story: I want to talk to an existing FastAPI service. This would probably work for any OpenAPI thing, I would guess, but it says it's for FastAPI: generate a mypy- and IDE-friendly API client from an OpenAPI spec. This looks cool. Let's see some examples down here. It supports both async and synchronous clients, and I'm guessing it's using the similar example model, because we have a Pet here as well. So it generates the model, the Pydantic model that you're going to exchange with the server, and all sorts of stuff. This is cool. So what does this generate, exactly? It's creating a client library to interact with the API. Yeah, so instead of using requests to call the API and just passing bare dictionaries, it will actually generate the Pydantic models, and it'll give you the API endpoints as functions instead of just 'request this URL'. Nice. Does that give you the Swagger documentation and all that too? I have no idea.

00:39:54 That would be cool. Yeah, super cool. That shows you how to basically take all that stuff and generate those, and it does say it

00:40:00 generates an IDE-friendly one, so it must export those and not do it at runtime. You can see right here, like 'from client.models import Pet': it generates the Python files and then you consume them, instead of doing it at runtime, so you get autocomplete and type checking and all that. Cool. Yeah. Teddy, out here on the livestream, asks: what use case would you use FastAPI for versus, say, Django REST Framework? It's kind of hard to compare the two. I mean, if you're already using Django, then yeah, obviously go Django REST Framework. But in terms of Django plus Django REST Framework versus FastAPI: as always, it depends. It just depends on what you're doing. It depends on the size of the API, on whether you need authentication, whether you want the CRUD admin type stuff, on what tools you're going to be using to consume the API. If you want something that's tried and true and battle tested, I would go with Django and Django REST Framework. If you want to play around with the new hotness and take advantage of async and await and all the cool stuff that's coming out, then maybe check out FastAPI.
Yeah, I would certainly say, if you already have Django and you're already using it, and you just want to plug into the same app and keep rolling, probably just Django REST Framework, right? Unless you really want to commit to having multiple apps that you run separately, put behind Nginx with URL routing or something like that. Also, how many of the FastAPI features are you going to use, right? If you don't care about async and await, and you don't care about the typing very much, and you don't care about the documentation, well, then it kind of comes down to that. But if all of those things are super important to you, maybe breaking that out matters. I don't know. That's kind of my thought as well. Yeah. Also, Robin Haase asks: best JS frontend framework for FastAPI? I mean, it's kind of open, right? But is there anything that stands out as better? No. I mean, it works well with Vue, React, Angular... Vue.js, because, yeah, because it's Vue.js.

00:41:51 Cool. All right. Another one here I want to talk about that's super neat is FastAPI Profiler. This looks really cool. So it's middleware. I mean, you're the Django expert, certainly of the two of us here. Tell me how it works. I know in Pyramid there's a debug toolbar you can turn on, and one of the parts of that is: show me the profile. When I request this page, where was my time spent? Show me my SQLAlchemy queries actually interlaced in there, and so on, which is super cool. Django has something like that, right? Yeah, Django has the Django Debug Toolbar. Yeah. Basically, that works with your server side templating to figure out, with that specific template, or with that specific route, how many different queries it took to load this view, right? Find out if you've got the N+1 problem, because you passed this thing in and then you're touching the lazy-loaded properties over and over and over, something like that, right? It's not going to detect N+1 issues for you, but you should be able to see them. Yeah. If you're like, why do I have 100 queries on this page instead of two? That would just jump out at you, right? Yeah. So this is, I'm guessing, like that. I wonder where the output prints out, because with FastAPI I usually don't have the server side template with it. Exactly. By default, does it just print out to the terminal? It's worth pointing out that FastAPI does support Jinja, but you've got to do a little extra work to make it do it. So it says this is really leveraging pyinstrument, from Joe Rickerby, and if you scroll down on his repo, there's a bunch of cool output. I think it might even open up... I'm not sure exactly where it goes; I think it saves it to a profile file. So you get this cool little view of the profile output here. There's even a terminal version, a rich, colored terminal version. But there's a bunch of cool graphs and stuff you can get. I think I saw somewhere here that you can open up some kind of flame graph as well. That's cool. Yeah, you can dig into it. So basically, anything you can do with pyinstrument... I think this is just a middleware wrapper that generates the pyinstrument stuff for you. Which is pretty cool. Yeah, a lot of this stuff is super powerful.
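Roughly what a profiling middleware like that does under the hood: wrap each request in a pyinstrument Profiler. This is a hand-rolled sketch, not FastAPI Profiler's actual code:

    from fastapi import FastAPI, Request
    from pyinstrument import Profiler

    app = FastAPI()

    @app.middleware("http")
    async def profile_request(request: Request, call_next):
        profiler = Profiler()
        profiler.start()
        response = await call_next(request)
        profiler.stop()
        # Dump the call tree for this one request to the terminal.
        print(profiler.output_text(unicode=True, color=True))
        return response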
Like, you could probably take some of that stuff and tie it into your integration type tests, to ensure that, hey, I built this route, and this view takes X amount of time. And, you know, now I want to throw a junior developer on my code, and I want to make sure there isn't a performance loss. You can use stuff like that to make sure the number of queries doesn't all of a sudden triple, that the load time doesn't all of a sudden triple, stuff like that. So tying things like that into your test suite is super powerful. There's a library called nplusone that does that very same thing; I do that with it. Oh, interesting. Very cool. Nice. I've heard of it; I haven't used it. Another one I think stands out as pretty interesting is FastAPI Mail. I mean, boring, but sending mail is one of those things that can take forever. Like, I had one of these admin parts of my site, and I had a way to send an email to all the people in a class. Go... 20 seconds, 30 seconds, timed out. It was going out to thousands of people, and the problem was, it had sent hundreds of emails and then it timed out. So how do I resend that without sending a duplicate to the first half, right?

00:45:00 Maintaining the state there. Yeah. And it's really hard to go back and figure out everyone who got it and who didn't. Anyway, it was a huge pain, and I was like, okay, sending a lot of emails kind of sucks. So this one, FastAPI Mail, is cool, because email is one of those things that's dreadfully slow, and this allows you to asynchronously send email messages in a super simple way. So you weren't using SendGrid or SES or anything? I was using SES at the time. Yeah, okay. But I was doing it one at a time instead of in bulk: this customized email to that person, this customized email to that person. And, I mean, it would have been fine if I'd even just set the timeout limit longer or whatever. What I ended up doing was putting it in a background queue and just letting it go. That's how it should have been all along; it shouldn't have been part of the request. But I learned the hard way the first time around. I use Celery for that same exact thing; it just goes right in the queue. Yeah, perfect. So this one is async. Like you said, doing bulk sending on some kind of background thing is exactly what you should do. But if you're trying to not block your processing and you want to send just one email, like, hey, reset my password, right? That could be a thing you just do right away. And so here's a way to await sending emails, which is kind of cool. Yeah. If you don't care too much about what happens to the email, like, if you're not too concerned whether it gets delivered, something like this is fine. If it's more like there's some sort of workflow based on what could happen there, success, failure, that sort of thing, if-this-then-that type stuff, then you probably want to look more at Celery, I would say. But I still think, if you want to send just a simple one-off email, maybe even to yourself or something like that... I do this with FastAPI Mail: I send myself notifications based on events. And I don't necessarily need them; you know, I'm not curing cancer here, so if I don't get one, I don't care. Yeah, I do the same thing.
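The "don't block the request on email" lesson, shown with FastAPI's built-in BackgroundTasks rather than FastAPI Mail itself; send_reset_email is a stand-in for whatever mail code you'd actually call:

    from fastapi import BackgroundTasks, FastAPI

    app = FastAPI()

    def send_reset_email(address: str) -> None:
        ...  # talk to SMTP, SES, SendGrid, etc. here; this part can be slow

    @app.post("/reset-password")
    async def reset_password(email: str, tasks: BackgroundTasks):
        # Queue the slow work; it runs after the response has been sent,
        # so the request itself returns immediately.
        tasks.add_task(send_reset_email, email)
        return {"status": "email on its way"}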
I'll just shoot myself messages like: oh, here's a thing you should probably know about that happened. Very cool. Yeah. I moved to SendGrid in the last month or two and really like it. That's so big. Yeah, oh, come back to normal size. So let's see, what else have we got here that's interesting? Then we've got the obligatory utils. The catch-all, right? Everything. Yeah, the well-named 'helpers.py'. Exactly.

00:47:13 So we've got Socket.IO stuff, plugins, pagination... there are a couple in here that I'll call quick attention to that I think are neat. This is not an endorsement or saying I would use them, but I can see a real interesting use case. First, the cache one. Maybe tell us about this cache one here. Yeah, so I haven't used this one exactly, but it looks like it's probably going to cache at the route handler level, and it looks like it's caching the response, based on the request parameters, inside of Redis. Yeah. You know what's interesting here, that it actually does, that I haven't seen in any of these examples yet, but that is certainly a FastAPI thing? It uses dependency injection for the cache. Yeah. Right, so they've got a view method here, or API endpoint, I guess, called hello, and one of the parameters is 'cache', of type Redis cache backend, with the default value Depends on the Redis cache, so it goes and finds the one instance of that and hands it off. And then it's just a standard Redis key-value thing: await get the thing; the thing's not there, await set the thing. Pretty standard, but it's kind of cool, just the simple integration. Yeah. Dependency injection, I think, is a pretty difficult concept if you look at it from a theoretical standpoint, but it's just: this object here is given this other object at runtime, essentially. It's a way to split apart your dependencies, and it makes testing a whole lot easier, so you don't have to have mocks all over the place. You can just switch out the Redis cache for an in-memory cache or something like that. Yeah, you can pass in anything that has a get and a set, and you're good, right? Yeah.

00:48:50 With dependency injection in FastAPI, you don't necessarily have to leverage it, but I think if you do, it takes a little... you have to be a little bit more of a seasoned developer, I'd say. Yeah, some of that goes beyond just picking up Flask. Yeah, I totally agree with that. And dependency injection is one of those things like: oh, this is really cool, and it makes things simpler... and why is it so complicated? What just happened? You know? I don't think that anymore. Although, when you start looking at Flask global objects, yeah, that's complex too. That's true. I'm thinking more of the static languages, like Java or C#, where you've got an interface for one of these and then a bunch of implementations registered, and then which one did it actually pick? What concrete type am I even working with? People can over-pattern it, I guess.
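A sketch of that dependency-injection point: the endpoint just asks for "a cache", and tests can override it. The cache class here is invented for illustration, not fastapi-cache's actual API:

    from fastapi import Depends, FastAPI

    class InMemoryCache:
        def __init__(self):
            self.data = {}

        async def get(self, key):
            return self.data.get(key)

        async def set(self, key, value):
            self.data[key] = value

    cache_instance = InMemoryCache()  # swap for a Redis-backed twin in production

    def get_cache():
        return cache_instance

    app = FastAPI()

    @app.get("/hello")
    async def hello(cache=Depends(get_cache)):
        greeting = await cache.get("greeting")
        if greeting is None:
            greeting = "Hello, world"
            await cache.set("greeting", greeting)
        return {"message": greeting}

    # In tests: app.dependency_overrides[get_cache] = lambda: InMemoryCache()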
Alright, speaking of not over-patterning, let's check this out. I was speaking at the San Francisco Python meetup last night, and someone asked me, hey, is there a way in FastAPI to just... no, this was my office hours, same day though, also a video meeting. Anyway, somebody asked: can you take a database and make it like an API? Yes: this FastAPI CRUD Router. It sounds amazing, right? Basically, what you do is

00:50:00 you tell it your models and your schemas, and then it just creates all these API endpoints to do RESTful things to that table through the model. So I have a GET on /potato, Potato being the model: GET /potato will list them all, POST a potato to /potato creates one, DELETE /potato gets rid of all of them, DELETE /potato/{id} deletes that one, and so on. It just turns this into basically a series of FastAPI endpoints that are all the RESTful behaviors against your schemas. What do you think, will you see 'delete all' very often? No. Little Bobby Tables. I wonder if GINO might be leveraging something similar to this? Yeah. To me, this feels overexposed. But if it's something simple, something internal, maybe it makes sense to just go: I need to do that, but from JavaScript, so let's turn it on. But like I said, I probably would not turn that on, because it scares me. Let's see, rate limiting. There are a couple of rate limiting things that are cool here: SlowApi, and then, what was the other one, FastAPI Limiter. Those are cool. And that's pretty much it for the standard extensions, right? Then we've got the documentation, and some external resources, like some of the things that you have over at testdriven, podcasts you mentioned, and somehow someone linked the Python Bytes episode I did with Sebastián, I believe. I'm starting to get to the point with this awesome list where I'm probably going to have to curate the tutorials a little more, because there are so many new tutorials. When I started this, there were like three or four tutorials, and now there are hundreds of them, just based on the popularity of the framework. So, getting a little smarter with curating the tutorials, because you don't need 15 different tutorials showing how to build a CRUD app, that sort of thing. I did find this 'FastAPI for Flask users' one super helpful, because what it does is say: you want to do this in Flask? Here's the FastAPI version. You want to do this in Flask? Here's the FastAPI version. And it does simple stuff, like: I want to create a method that handles POST; here's what it looks like there, and here's what it looks like in FastAPI, side by side, a sort of cookbook type of thing. I like that a lot. So yeah, that's a good one. All right, we have just a few moments left. I don't think we have time to go through all the tutorials, and honestly, I haven't researched them enough to talk about them. A couple of talks, which is cool; a couple of videos, and I'm sure there are more out there on the internet that could overwhelm your list at this point. Courses: you and I have got a lock on it. Yeah, I recognize a few of those names. Tell us about the courses. Yeah, there are three courses. Let's talk about yours first. Yeah. So my course focuses on building just a RESTful API, but it also focuses heavily, obviously, on test driven development, plus code formatting and linting type tools like Black and flake8, and stuff like that. I also go into CI/CD with GitHub Actions, everything is Dockerized, and then you also deploy it to Heroku. Yeah, super cool.
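Circling back to that CRUD router for a second, its usage is tiny, as I recall from its README; the Potato schema mirrors the endpoints described above. Hand it a Pydantic schema and the RESTful routes, including the scary delete-all, appear:

    from fastapi import FastAPI
    from fastapi_crudrouter import MemoryCRUDRouter
    from pydantic import BaseModel

    class Potato(BaseModel):
        id: int
        color: str

    app = FastAPI()
    # Generates GET/POST/DELETE /potato and GET/PUT/DELETE /potato/{item_id}.
    app.include_router(MemoryCRUDRouter(schema=Potato))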
And I saw that both you and I are also sponsors of the FastAPI project on GitHub, which is pretty cool. You're gold; I'm silver. I assume you donate a few more bucks a month, then? I guess so. But yeah, how does that work? Here's the thing: I don't call it out to say, oh, look how cool we are. I mean, I do see it on the screen here, which is what brought it to mind. But there are so many companies out there building major parts of their businesses on top of FastAPI and other projects. GitHub has such a simple way: go press the sponsor button, support those things, and make sure they have a nice, vibrant ecosystem, people backing them. If we can do it, surely... yeah, Bank of America, Chase, Microsoft, Google: get in there. Anyway, that was my pitch. Yeah, I think more companies need to count that as just the cost of doing business. We're standing on the shoulders of all these giants. It's like we have this other remote team over here that we know nothing about, and yet we're leveraging this team's work. And not all projects are curated or maintained by Google and Facebook, either. I'm not trying to push this in people's faces, like, hey, I'm better than you. I'm just trying to say: hey, if I can do this, then a lot of companies should be thinking more and more about doing stuff like this. I totally agree. They should say: we critically depend upon these technologies; let's make sure we do a little bit to support them. Yeah. Because a little bit, for a lot of these companies, would dramatically change things, right? If Flask got $2 million a year, that would fundamentally change that organization, and yet, given how many people use it, it would be nothing to these companies; it almost wouldn't even show up. I know there's a lot of complexity about how companies justify money and where it goes on the balance sheet and all that, but still: people, encourage your company to do stuff like that. Alright, so this is a really cool course; I love the Docker angle of it as well. So nice. So, test driven: how does that work? What's the story with the test driven development? I mean, how does that plug in with FastAPI? Are you doing the dependency injection type stuff, or...

00:55:00 I show both sorts of ways to do tests. I show how to mock everything, if you want to speed your tests up, but I don't actually go into all the dependency injection stuff with the ORM; it gets a little too complex, I would say, especially with how Tortoise is set up. But I show basically two different types of tests: you can test it integration style, where it's actually hitting the database, or you use monkeypatch to mock out the actual database layer. Yeah, that sounds typical and useful. Another thing that differs between our two courses is that mine are text based and yours are all video based. Yeah, all video based. Yes. I mean, there's source code on GitHub and such. Are yours text, or both, then? No, no, not really; it's all video based. So, I didn't plan this ahead, but I looked and, wow, you have my two courses on there, which is super cool. So I have two courses on here. Modern APIs with FastAPI: the idea is, let's build an API that uses Pydantic, that uses async and await, that has real live data, stuff like that, sort of the fundamentals of the FastAPI world.
And then the other one is full web apps with FastAPI. So if you already had a Flask app or something like that, and you'd like to add some additional features to it, you could just plug some API type thing into it, some RESTful API extension, and it goes. But if you're starting from scratch, you might create an API with FastAPI, and then you're like, well, I also want some HTML stuff. Could I actually add a server side HTML story around this as well? So it shows you how to create users, submit forms, validate data, use templates, all that kind of stuff. If you need a little bit more on the server side, a little more on the web side, not just a pure API, that will tell you how to do it. So you're using server side templating for that? Yeah, either Jinja2 or Chameleon, I think; you pick. Basically, we recreate 'pypi.org' on FastAPI. So you're not doing a single page app with Vue and React and Dockerizing it all? You know exactly why not. I know, I know, I'm not doing it right; I'm totally doing it wrong. I definitely would not be cool in the JavaScript world. You know, it's not that I'm super against that, but I think there's still a lot of value in having some of this stuff on the server side, and I don't think everything has to be a SPA, right, a single page app. Yeah, I was being sarcastic. I know you were, yeah.
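That server-side HTML story on FastAPI, per its documented Jinja2 support; the template name and context values here are placeholders:

    from fastapi import FastAPI, Request
    from fastapi.templating import Jinja2Templates

    app = FastAPI()
    templates = Jinja2Templates(directory="templates")

    @app.get("/")
    async def home(request: Request):
        # Renders templates/index.html; FastAPI wants the request in the context.
        return templates.TemplateResponse("index.html", {"request": request, "title": "Home"})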
00:57:24 So, we were talking right before this, and we realized, with our three courses here, that we wanted to do some kind of special for people, and also a way to give back to FastAPI. Tell people what we came up with; this was not pre-planned until we were like, oh wait, we're looking at these three things here, let's do something. So we decided that we're going to sell these three classes as a bundle, not one-off. If you want to take all three of these classes, we'll give you 50% off the price, and 50% of what you pay us will go to support FastAPI directly. So we're going to try to do a little fundraiser for FastAPI, and a little awareness for our courses. Yeah, and the courses kind of build on each other, too. Your course, Modern APIs with FastAPI, I would recommend taking first, and then my course takes that to the next level. And then if you want to learn how to, say, have one of your routes do server side templating, so you can serve up something to interact with the API, then add the full web apps course on top of that. Yeah, I think so. Because it's your courses and my courses on different platforms and all that, we don't have an official way to make this happen, so they should just send you an email and say, hey, I'd like to do this bundle thing, and we'll make it happen behind the scenes, right? Yeah, just shoot me a quick email, Michael at testdriven.io, and I'll do all the hard work and link all that together. We'll probably just use Stripe for that: send a quick Stripe invoice, and then we'll make it work behind the scenes. And then Sebastián, the FastAPI creator, gets that portion of it. So, help support FastAPI. Fantastic. Super cool. Yeah, thanks for putting that together, Michael. That'll be fun. All right, there are a couple of questions in the live chat before we wrap this up. Let's go to this question from Joe real quick: I'm really happy with Flask; is there any reason for me to switch to FastAPI? We've covered some of the reasons. I would point out, before anyone out there listening makes that decision: Flask 2.0 is coming out in a week. Wait until that happens, and then look at what the modern Flask looks like. A lot of major stuff is coming, like async and await support and so on. And then compare those. Yeah, I would say that's a hard no, don't switch straight away. Spin up an app, see if you like it, see if you enjoy it, before moving your application over to FastAPI. Yeah. And then Teddy has another interesting question: are we aware of any CMS-like projects built on top of FastAPI, similar to, say, Wagtail with Django? I don't know of any. Yeah, I don't know either. The closest I would say is that thing I described, full web apps with FastAPI, and you can get the code from the open, public GitHub repo; you don't have to take the course to check it out. That does put HTML views and stuff on top of it, but it's more like what Flask does, from scratch, not what

01:00:00 Wagtail does, like, here's your CMS. That's a very long way from what you're asking, but it's as close as I know. Yeah, Wagtail is like in between WordPress and Django; it adds a lot on top of Django. So yeah, if you're looking for that sort of functionality, FastAPI is not going to do it. I wouldn't even look for an extension out there for that; it's not the right tool. Yeah, probably not. Next question: what's the best platform to deploy FastAPI? Yeah, so you can deploy it really anywhere. It kind of gets into the hosting section there, so if you want to scroll down... we're on the cusp of it, but we don't really have time to go too deep into it. Yeah. I mean, if you've containerized it, you can obviously deploy it wherever. I deploy to Heroku; it's very simple to deploy containerized apps there. DigitalOcean has an app platform now, a platform as a service similar to Heroku. So really, wherever you like to do your deployments, FastAPI is totally fine; deploy it there, and it goes pretty easily. Yeah, there's even some serverless stuff that you point out in your list, right, further down. And certainly, on the infrastructure-as-a-service side: Nginx and Gunicorn, particularly running Uvicorn workers so you get the async support, which is a special flag you can pass to Gunicorn. And then Let's Encrypt for SSL. Yeah, and I run the Awesome Flask repo as well, and I literally copied and pasted this from the Flask one, so all of this is pretty agnostic. Yeah, very cool. And Joe actually gave a call-out to that earlier, saying: I just joined your livestream and realized you're the same guy that does the Awesome Flask list. Awesome. Yeah, there we go. Fantastic. All right, well, I think we're about out of time here, Michael, but this was super fun, super helpful stuff to talk about. I just love these awesome lists, because not only do you learn about all these cool things, some of the plugins like FastAPI-SQLAlchemy, but then also the things those are built on. It's just such a cool exploration of all these different libraries that are out there, like Beanie, for example. I didn't know about Beanie, but now I've discovered it, even though I wasn't looking for that in particular.
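For reference, the Gunicorn-with-Uvicorn-workers setup mentioned above comes down to one flag, per Uvicorn's deployment docs; 'main:app', the worker count, and the bind address are placeholders:

    gunicorn main:app -w 4 -k uvicorn.workers.UvicornWorker --bind 0.0.0.0:8000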
Yeah, I don't know how many times I see people on Reddit asking: well, hey, what should I use for auth? What should I use for this? And it's like, well, here's a list, a concise list of all the different things out there. It might not cover everything, because there are obviously other things out there, but it's definitely nice to come to something like this versus, you know, searching PyPI for FastAPI.

01:02:21 Exactly, exactly. One of the big challenges we have in the Python space, which is kind of the opposite of some places like, say, the Microsoft world, where they're like: here's your ORM, here's your web framework, go build with that and have a good time... we have the exact opposite: there are a thousand flowers blooming at each level of the stack, and it's a paradox of choice. You know, as a newcomer, I don't think it feels amazing, like, look at all these choices. It feels overwhelming: what do I do? Right. I think awesome lists, like the one you created, are helpful to really narrow it down to a couple, go make a pick, and just run with it. For the Ruby on Rails folks, when they come over to the Python space, they're like: well, there's more than one? There's more than one!

01:03:02 Well, which one do I use? Oh, hello, DHH. Ah, I guess. Yeah, exactly. Very cool. All right, well, I guess we're down to the final question before I let you out of here; I'll see if this has changed since last time. So if you're going to write some code, what editor do you use? Python code, I suspect. So, I approach this question a little differently, because you asked me my favorite editor, and I said IDLE. I don't use IDLE, you know, obviously, but it has a special place in my heart, because IDLE is where I learned Python. And Thonny is sort of in between VS Code or PyCharm and IDLE. Interesting. So if you would learn with something like IDLE today, check out Thonny instead; it adds some debugging type tools, and it's a lot prettier to look at. Super interesting. It's sort of like a notebook, sort of like a proper autocomplete editor. How interesting. Yeah. I haven't thought about that one for a while. That's cool. But you don't use that on a daily basis, though.

01:04:04 I definitely use VS Code on a daily basis, so I'm not coding in IDLE. What does IDLE stand for? Integrated Development and Learning Environment? Yeah, don't use that on a daily basis. Exactly. Do you sketch your architecture design diagrams in it? Ha, there you go. Yeah. And then, a notable PyPI package? For docstrings, I've been using flake8-docstrings, just to clean up my docstrings. I use the Google flavor of docstrings, and that has helped me adhere to it a little better. Yeah. I also want to give a shout-out to Hotwire for Django. Hotwire comes from the Rails world, and it's essentially HTML served over WebSockets. So instead of doing all this stuff with Vue or React, what you do is serve up templates, and the templates are pre-rendered. Hey.com, that's where Hotwire comes from. But instead of serving up your JSON and having two forms of state, one on the client and

01:05:00 one on the server, you just simplify it all.
It does add complexity with WebSockets; you have to deal with that. But yeah, I like that paradigm. It feels a little more 2005-ish than 2020-ish, but I mean, it's definitely been working for the Rails folks and Basecamp. Yeah, that place is interesting, all the frameworks they kick out. Yeah. Cool. All right, well, if you want to be like DHH, you can go do that. That's really cool; I've heard good things about it. Awesome. Well, Michael, it's been really fun to talk about your awesome list, and I think all the stuff we touched on is going to be super helpful. Final call to action: people want to get started with FastAPI and pick some libraries, what do they do? You know, obviously, I would definitely say start with the FastAPI documentation. It is great; out of all the documentation out there, in terms of Flask and Django and whatnot, that one is by far the best. Definitely start there. After you've built just a basic app, check out the Awesome FastAPI list to see how to extend it. And then, if you're looking to learn more in the testing and Docker realm, check out my course, and check out Michael's courses as well. Yeah, and you can do that bundle thing; I'll include the email so they can send it over, and we'll help them out. Very cool. Awesome. So thank you so much for being here, really appreciate it. It's been super fun to talk FastAPI and all the stuff around it. It's really growing, isn't it? Yeah, it's exciting. It's exciting to see just how much it's grown in just the past six months. Yeah, absolutely. Alright, see you later. Yeah, thanks for having me. Appreciate it. Yep.

This has been another episode of Talk Python to Me. Our guest in this episode was Michael Herman, and it's been brought to you by us over at Talk Python Training. To level up your Python, we have one of the largest catalogs of Python video courses over at Talk Python. Our content ranges from true beginners to deeply advanced topics like memory and async. And best of all, there's not a subscription in sight. Check it out for yourself at 'training.talkpython.fm'. Be sure to subscribe to the show: open your favorite podcast app and search for Python; we should be right at the top. You can also find the iTunes feed at /itunes, the Google Play feed at /play, and the direct RSS feed at /rss on talkpython.fm. We're live streaming most of our recordings these days. If you want to be part of the show and have your comments featured on the air, be sure to subscribe to our YouTube channel at 'talkpython.fm/youtube'. This is your host, Michael Kennedy. Thanks so much for listening. I really appreciate it. Now get out there and write some Python code.