From 1330c2684a0633a4b55f2f3f84539a8660d95f73 Mon Sep 17 00:00:00 2001 From: Manikantagit Date: Mon, 27 Sep 2021 16:59:57 +0530 Subject: [PATCH] 335-gene-editing.txt--updated Updated --- transcripts/335-gene-editing.txt | 170 +++++++++++++++---------------- 1 file changed, 85 insertions(+), 85 deletions(-) diff --git a/transcripts/335-gene-editing.txt b/transcripts/335-gene-editing.txt index 9f4f3815..ed0458db 100644 --- a/transcripts/335-gene-editing.txt +++ b/transcripts/335-gene-editing.txt @@ -1,16 +1,16 @@ 00:00:00 Gene therapy holds the promise to permanently cure diseases that have been considered lifelong challenges. -00:00:05 But the complexity of rewriting DNA is truly huge and lives in its own special kind of big data world. On this episode, you'll meet David Born, a computational biologist who uses Python to help automate genetics research and helps move that work to production. This is Talk Python, my episode 335, recorded September 15, 2021. +00:00:05 But the complexity of rewriting DNA is truly huge and lives in its own special kind of big data world. On this episode, you'll meet David Born, a computational biologist who uses Python to help automate genetics research and helps move that work to production. This is Talk Python to Me, episode 335, recorded September 15, 2021. -00:00:39 Welcome to Talk Python. +00:00:39 Welcome to Talk Python to Me. 00:00:41 A weekly podcast on Python. -00:00:43 This is your host, Michael Community. +00:00:43 This is your host, Michael Kennedy. -00:00:45 Follow me on Twitter, where I'm at M. Kennedy and keep up with a show and listen to past episodes at Talk Python, M and follow the show on Twitter via at Talk Python. +00:00:45 Follow me on Twitter, where I'm @mkennedy and keep up with a show and listen to past episodes at 'talkpython.fm' and follow the show on Twitter via '@talkpython'. -00:00:54 We've started streaming most of our episodes live on YouTube, subscribe to our YouTube channel over at Talk Python FM YouTube to get notified about upcoming shows and be part of that episode. This episode is brought to you by Shortcut, formally known as Club House IO and us over at Talk Python training and the transcripts are brought to you by assembly AI David. +00:00:54 We've started streaming most of our episodes live on YouTube, subscribe to our YouTube channel over at 'talkpython.fm/youtube' to get notified about upcoming shows and be part of that episode. This episode is brought to you by Shortcut, formally known as 'ClubHouse.IO' and Us over at Talk Python training and the transcripts are brought to you by 'Assembly AI', David. 00:01:16 Welcome to Talk Python to me. @@ -30,9 +30,9 @@ 00:02:05 Yeah. -00:02:05 So I always thought that programming would be cool, but I didn't really have much of an opportunity through my undergraduate settings do much formal programming. I took one computer science plants that my College had to offer was in C Plus. Think I wrote a boggle program, something with some recursion in there. It was pretty fun. I didn't really get these iphone until graduate school was a diegetic course, and we were basically tasked with doing some data analysis on published data and then reproducing some pots at a figure, then extending it further. My partner and I decided to learn Python teach it to ourselves so that we could do this. We heard that it was it was a good way to do data analysis in biology. And so we basically taught it to ourselves that we use dump, some basic string searching and things to redo this analysis. 
And it was really amazing what we could do with Python for that civil project.

+00:02:05 So I always thought that programming would be cool, but I didn't really have much of an opportunity through my undergraduate studies to really do much formal programming. I took the one computer science class that my college had to offer; it was in C++. I think I wrote a Boggle program, something with some recursion in there. It was pretty fun. I didn't really get to use Python until graduate school. I was in a genetics course, and we were basically tasked with doing some data analysis on published data and then reproducing some plots in a figure, then extending it further. My partner and I decided to learn Python, teach it to ourselves, so that we could do this. We heard that it was a good way to do data analysis in biology. And so we basically taught it to ourselves, and we used NumPy and some basic string searching and things to redo this analysis. And it was really amazing what we could do with Python for that project.

-00:03:02 That's awesome. Had the learning project go, like coming from not having a ton of programming. What was your experience like?
+00:03:02 That's awesome. How'd the learning project go? Like, coming from not having a ton of programming, what was your experience like?

00:03:08 It was relatively easy, I would say. I think my brain sort of fits pretty well with how programming languages work, but it was definitely a lot in a short amount of time to really dive into how to make sure your while loops don't stay open. And then someone tells you maybe you shouldn't use a while loop at all, as I was learning a lot of things not to do right away.

00:03:53 Maybe many people listen to the podcast already kind of know, but I think looking in from the outside, it feels like, oh, I've got to go get a degree in this to be productive or useful. And really what you need is like a couple of weeks and a small problem and you're already already there.

-00:04:10 Absolutely. Yeah. I've definitely found that just learning through through doing has been the way I've worked entirely. I have essentially no formal programming treating no course work and music. I thought every day that's fantastic.
+00:04:10 Absolutely. Yeah. I've definitely found that just learning through doing has been the way I've worked entirely. I have essentially no formal programming training, no coursework, and I'm using Python every day. That's fantastic.

-00:04:24 Yeah, I didn't take that much computer science oncology that are just enough to do the extra stuff from my math degree. Very cool. Alright. Now how about today? Are you working at Beam Therapeutics doing genetic stuff? Tell us about what you do day to day.
+00:04:24 Yeah, I didn't take that much computer science in college, just enough to do the extra stuff for my math degree. Very cool. Alright. Now how about today? Are you working at Beam Therapeutics doing genetic stuff? Tell us about what you do day to day.

-00:04:37 Yeah, I'm on the computational side to see a Beam Therapeutics. We're at Gene editing Company. So we develop these precision genetic medicines that are trying to develop them to cure genetic diseases that are caused by single genetic changes in the mid point Chef mutation or something like that. Yes.
+00:04:37 Yeah, I'm on the computational sciences team at Beam Therapeutics. We're a gene editing company. 
So we develop these precision genetic medicines. We're trying to develop them to cure genetic diseases that are caused by single genetic changes in the genome, so, for example, a point mutation or something like that. Yes.

00:05:00 So if you have one of these genetic changes, you might have a disease that is lifelong and there aren't any cures for most of these diseases.

-00:05:09 So we're trying to create these. We call them hopefully lifelong heros for patients by changing the genetic code back to what it should be.
+00:05:09 So we're trying to create these. We call them hopefully lifelong cures for patients by changing the genetic code back to what it should be.

00:05:18 That's incredible. It seems really out of the future. I mean, I think it's one thing to understand genetics at play, and it's even amazing to be able to read the gene sequences, but it's entirely another thing, I think to say and let's rewrite that.

00:05:54 I bet a lot of people go to work and they end up writing what you might classify as forms over data. It's like, well, I need a view into this bit of our database, or I need to be able to run a query to just see who's got the most sales this week or something like that.

-00:06:10 That's important work. And it's useful. And there's cool design patterns and whatnot you can focus on. But also, it's not like what people dream of necessarily building when they wake up. But this kind of science, like, maybe. So these are the really interesting problems that both have a positive outcome. Right. You're helping cure disease, not just shave another of a off of a transaction that you get a key or something like that, right. From, like, in finance, you get to use really cool tech to do it, too. Like, programming wise.
+00:06:10 That's important work. And it's useful. And there's cool design patterns and whatnot you can focus on. But also, it's not like what people dream of necessarily building when they wake up. But this kind of science, like, maybe. So these are the really interesting problems that both have a positive outcome. Right. You're helping cure disease, not just shave another 1/100% off of a transaction or something like that, right, from, like, in finance. And you get to use really cool tech to do it, too. Like, programming wise.

00:06:42 Yeah.

00:07:09 Yeah. Absolutely. So you mentioned CRISPR, maybe tell people a bit about that. Biotechnology.

-00:07:15 Crispr is molecular machine, which targets a very specific place in a specific genetic sequence. And so usually people are using CRISPR to target a specific place in the genome, a specific sequence. And what CRISPR does naturally, is to cut at that sequence. So it'll cut in a very specific place in the genome.
+00:07:15 CRISPR is a molecular machine, which targets a very specific place in a specific genetic sequence. And so usually people are using CRISPR to target a specific place in the genome, a specific sequence. And what CRISPR does naturally is to cut at that sequence. So it'll cut in a very specific place in the genome.

00:07:42 And as people were using CRISPR, we could actually decide where it's going to cut by giving it a different targeting sequence.

00:07:50 This sort of directed molecular machine is a basis of a whole new field of biotechnology. Using CRISPR and CRISPR derived technology.

-00:08:00 Our technology is like, kind of a CRISPR two0 where we don't use CRISPR to cut. 
We use the localization machinery and we add on to it another protein which just changes the base instead of hiding the DNA itself.

+00:08:00 Our technology is like, kind of a CRISPR 2.0 where we don't use CRISPR to cut. We use the localization machinery and we add on to it another protein which just changes the base instead of cutting the DNA itself.

-00:08:16 It's slight variation, but it's still using the same Crisp protect knowledge.
+00:08:16 It's a slight variation, but it's still using the same CRISPR technology.

-00:08:20 Okay. So let me see if my not very knowledge filled background understand here. It kind of has a decent analogy. So does it work basically, like, you give it almost like, find and replace. You give it a sequence of and it says, okay, if I find at cat, like, enough specificity that it's like, that's the unique one. And then it does, like, cut at that point. Is that kind of how it works?
+00:08:20 Okay. So let me see if my not very knowledge filled background understands here. It kind of has a decent analogy. So does it work basically, like, you give it almost like, find and replace. You give it a sequence, and it says, okay, if I find ATTCAT, like, enough specificity that it's like, that's the unique one. And then it does, like, cut at that point. Is that kind of how it works?

00:08:47 Yeah. It's pretty much just like that. We you give it the sequence you want to target and then if it finds that sequence in the genome, it will cut the genetic material, the DNA at that position for normal CRISPR.

00:09:14 Right. It's definitely a tricky question. I think we are.

-00:09:18 We definitely leverage human biology, how the human body works for a lot of these problems. For example, some of our leading drug candidates are for sickle cell disease. And because of the way sickle cell disease manifests in red blood cells and red blood cells are created through a tough there's a specific type of cell that creates red blood cells. And we can. If you access that type of cell and you cure sickle cell in the progenitor cells, the stem cells, then you can create all red blood cells from a cured population. So if you can target it to the gender cells, you can cure sickle cell throughout the body, essentially because the symptoms are from red blood cells. There's a lot of diseases. That which I carry a single word in.
+00:09:18 We definitely leverage human biology, how the human body works, for a lot of these problems. For example, some of our leading drug candidates are for sickle cell disease. And because of the way sickle cell disease manifests in red blood cells, and red blood cells are created through, well, there's a specific type of cell that creates red blood cells. And we can. If you access that type of cell and you cure sickle cell in the progenitor cells, the stem cells, then you can create all red blood cells from a cured population. So if you can target it to the progenitor cells, you can cure sickle cell throughout the body, essentially because the symptoms are from red blood cells. There's a lot of diseases like that, where by curing a single organ.

-00:10:08 You can cure the symptoms of the disease because that's where it actually metaphors like diabetes or sickle cell anemia or something like that. A single cell diseases in the liver, some blindness in the eye.

+00:10:08 You can cure the symptoms of the disease because that's where it actually manifests, like diabetes or sickle cell anemia or something like that. 
A sickle cell diseases in the liver, some blindness in the eye. 00:10:22 By just targeting specifically where the symptoms occur, you can cure the disease. @@ -100,15 +100,15 @@ 00:11:10 Those progenitor cells, they eventually have to recreate new ones. And then the way they do that is they clone their their current copy of the DNA, which is the fixed one. Right. So if you can get enough of them going, it'll just sort of propagate from there. -00:11:23 And we're also the benefit that sometimes you only need to cure a small fraction to remove the sip on lots of things going for us. +00:11:23 And we're also the benefit that sometimes you only need to cure a small fraction to remove the symptoms so yeah lots of things going for us. 00:11:31 That's awesome. 00:11:33 So I'm sure there's a lot of people involved in this type of work. What exactly are you and your team working on. -00:11:39 Yeah. So our team, we call our public computational science team. We really sit in the in the middle of the research and development arm of the organization processing all of our sequencing data and some other data as well. +00:11:39 Yeah. So our team, we call ourselves computational science team. We really sit in the in the middle of the research and development arm of the organization processing all of our sequencing data and some other data as well. -00:11:54 And as you can imagine, with our technology changing DNA changing genomes, there's a lot of sequencing data, because what we're trying to do is change a genetic sequence. So we have to read out that genetic sequence and then figure out has it changed how many copies are changed and things like that. The field of the techniques of next generation sequencing NGS are pretty broad. And we deal with a lot of different types of these next generation sequencing assays that are being done art, really processes, analyzes and collaborates with the experimental scientists on performing and developing these experiments. Cool. +00:11:54 And as you can imagine, with our technology changing DNA changing genomes, there's a lot of sequencing data, because what we're trying to do is change a genetic sequence. So we have to read out that genetic sequence and then figure out has it changed how many copies are changed and things like that. The field of the techniques of next generation sequencing NGS are pretty broad. And we deal with a lot of different types of these next generation sequencing assays that are being done our team really processes, analyzes and collaborates with the experimental scientists on performing and developing these experiments. Cool. 00:12:37 So the scientists will do some work and they'll attempt to use CRISPR like technology to make changes, and then they measure the changes they've made. And you all sort of take that data and compare it and work with it. @@ -120,15 +120,15 @@ 00:13:38 Right. You're like this one gene is the problem. Right. And let's look at that, right? -00:13:43 Yeah. We're trying to target here. So that's where we're looking. It definitely depends on the athlete. But but I guess in terms of data scale, in terms of file sizes, perhaps that would be accessible for standard things. It would be on the order of a few gigabytes per experimental run for some of our larger assays. +00:13:43 Yeah. We're trying to target here. So that's where we're looking. It definitely depends on the assay. But but I guess in terms of data scale, in terms of file sizes, perhaps that would be accessible for standard things. 
It would be on the order of a few gigabytes per experimental run for some of our larger assays. -00:14:01 It's ten to under times that per experiment. +00:14:01 It's ten to 100 times that per experiment. 00:14:05 That's a lot of data, but not impossible to transmit sort of amounts of data or store. -00:14:11 Right. Every little piece of it is pretty manageable when you start combining them together and looking at some doubt, like your downstream results of things. The data does pretty large. But I wouldn't say we're at the scale of, like, big data analytics at Google or anything like that. +00:14:11 Right. Every little piece of it is pretty manageable when you start combining them together and looking at some down, like your downstream results of things. The data does pretty large. But I wouldn't say we're at the scale of, like, big data analytics at Google or anything like that. -00:14:26 Yeah. The LHC, if you've ever looked at the data flow layers of Lt, it's like the stuff near the collectors. It's just unimaginable amounts of data, right? +00:14:26 Yeah. The LHC, if you've ever looked at the data flow layers of LHC, it's like the stuff near the collectors. It's just unimaginable amounts of data, right? 00:14:36 Yeah. @@ -142,7 +142,7 @@ 00:16:06 That's some of it for sure. -00:16:07 Tell us about your life. Okay. +00:16:07 That's about your life. Okay. 00:16:09 Yeah. There's a couple of aspects that I think we always touch on them. But for sequencing experiments, the the pipelines are more defined because we usually get the data from a source that's already in the cloud, which I'm always happy about. If we could start in the cloud, we'll stay in the cloud. And that's a nice place to be, for instance, for data that's coming directly from instruments on premises, there is another layer of art that has to do with software on the instrument software that gets the data to the cloud and moves it around between our other database sources. And that is some fun projects itself there. @@ -150,15 +150,15 @@ 00:17:03 Absolutely. Yeah. -00:17:05 This portion of Talk Python Amy, is brought to you by shortcut, formerly known as Clubhouse IO. Happy with your project. Management tool. Most tools are either too simple for a growing engineering team to manage everything, or way too complex for anyone to want to use them without constant prodding. Shortcut is different, though, because it's worse. No, wait, no, I mean it's better. Shortcut is project management built specifically for software teams. It's fast, intuitive, flexible, powerful, and many other nice positive adjectives. Key features include team based workflows. Individual teams can use default workflows or customize them to match the way they work. Org wide goals and Roadmaps The work in these workflows is automatically tied into larger company goals. It takes one click to move from a roadmap to a team's work to individual updates and back height version control integration. +00:17:05 This portion of Talk Python to Me, is brought to you by 'Shortcut', formerly known as 'Clubhouse.IO'. Happy with your project. Management tool. Most tools are either too simple for a growing engineering team to manage everything, or way too complex for anyone to want to use them without constant prodding. Shortcut is different, though, because it's worse. No, wait, no, I mean it's better. Shortcut is project management built specifically for software teams. It's fast, intuitive, flexible, powerful, and many other nice positive adjectives. 
Key features include team based workflows. Individual teams can use default workflows or customize them to match the way they work. Org wide goals and roadmaps. The work in these workflows is automatically tied into larger company goals. It takes one click to move from a roadmap to a team's work to individual updates and back. Tight version control integration.

00:17:56 Whether you use GitHub.

-00:17:57 GitLab or Bitbucket Club House ties directly into them so you can update progress from the command line keyboard friendly interface. The rest of Shortcut is just as friendly as their power bar, allowing you to do virtually anything without touching your mouse. Throw that thing in the trash iteration planning, set weekly priorities, and let Shortcut run the schedule for you with accompanying burndown charts and other reporting.
+00:17:57 GitLab, or Bitbucket, Clubhouse ties directly into them so you can update progress from the command line. Keyboard friendly interface. The rest of Shortcut is just as friendly as their power bar, allowing you to do virtually anything without touching your mouse. Throw that thing in the trash. Iteration planning: set weekly priorities, and let Shortcut run the schedule for you with accompanying burndown charts and other reporting.

-00:18:21 Give it a try over at Talk Python FM Shortcut again, that's Talk Python FM shortcut. Choose Shortcut because you shouldn't have to project manage your project management.
+00:18:21 Give it a try over at 'talkpython.fm/shortcut' again, that's 'talkpython.fm/shortcut'. Choose Shortcut because you shouldn't have to project manage your project management.

-00:18:35 With robots. Do you have to actually talk to the robots like any of the type of automated things?
+00:18:35 So with robots. Do you have to actually talk to the robots, like any of the type of automated things?

00:18:42 Yeah, for lab automation is what we call our team that has the robots, as we like to call them. As you can imagine, with a lot of these types of experiments, they can be made much more efficient if we can have robots doing the actual transfer of liquids and incubation and centrifugation these scientific techniques that sometimes you need someone in the lab to do, but oftentimes you can automate them. So the lab robotics aspect is an important part of how we can efficiently generate data. A lot of the issues around that come with how to pass instructions to the instrument and how to get back data from the instrument, what it's done. And then there's a whole other art of making the instruments actually orchestrated together, which is held in a different world of software. I don't actually work on that part myself.

00:20:26 Yeah.

-00:20:26 We end up with quite a few of those here where we have these relatively small taps of take data from an API, but it somewhere where a robot can access it. Usually we use AWS three, and these sort of very small data handling tasks end up being these nice little projects for Python to come into play.
+00:20:26 We end up with quite a few of those here where we have these relatively small tasks of take data from an API, put it somewhere where a robot can access it. Usually we use AWS S3, and these sort of very small data handling tasks end up being these nice little projects for Python to come into play.

00:20:47 Awesome. I can see that definitely happens.

00:21:23 We don't have to handle anything on our own computers.

-00:21:26 That's really nice in the serverless stuff. 
It probably helps you avoid running just tons of VMs in the cloud. Right. Like it can all be on demand. Like the Lambda trigger is a file appears in this S three buckets, so then it starts down the flow, right?

+00:21:26 That's really nice in the serverless stuff. It probably helps you avoid running just tons of VMs in the cloud. Right. Like it can all be on demand. Like the Lambda trigger is a file appears in this S3 bucket, so then it starts down the flow, right?

-00:21:40 Absolutely. Yeah. I really don't like maintaining a lot of infrastructure, although we do have a good amount of it that we do have to maintain. I find that these small Python functions are the perfect use case for those event driven Lambda functions, which are reading these very simple pieces of code.
+00:21:40 Absolutely. Yeah. I really don't like maintaining a lot of infrastructure, although we do have a good amount of it that we do have to maintain. I find that these small Python functions are the perfect use case for those event driven Lambda functions, which are running these very simple pieces of code.

-00:21:58 When an object appears in S three, they get a small event about when the optic was uploaded, and then they do their thing. A little bit of data conversion, send it to an API, and now the data is in our data store, and those things just happen, and they're super consistent. They don't require anything on my Maya to maintain it's. Pretty, pretty beautiful pattern.
+00:21:58 When an object appears in S3, they get a small event about when the object was uploaded, and then they do their thing. A little bit of data conversion, send it to an API, and now the data is in our data store, and those things just happen, and they're super consistent. They don't require anything on my end to maintain. It's a pretty, pretty beautiful pattern.

00:22:20 That's awesome. They don't need things like, oh, there's a new kernel for Linux that patches of vulnerability. So let's go and patch our functions, right. Just all magic. It all happens on its own, right?

00:22:47 I've heard of this before, but it's I've definitely not used this personally. Tell us, how does it help you?

-00:22:54 Right. So when I started dabbling with these Lambda functions, what I found pretty soon after, I had four or five of these functions, which I had uploaded on the AWS Council. And if anyone use that know that it can be pretty tedious once you have a few things running, getting all of your permissions and things set up, and I was looking for better ways to do this. And at that point it was early days for this AWS develop at AWS and what it is is a way to write your infrastructure as we in essentially any common language programming language. So JavaScript type script you can do Java, Go and Python, perhaps. And using it, you can define your cloud infrastructure an object oriented way with all the parameters they need, and you can deploy it to different AWS accounts. You can take it down, you can reconfigure it and look at how it's changed since your last commit, and you can store all that configuration and source control. And this has allowed us to scale up in terms of, like, number of Lambda functions. I think we have, like, maybe 60 or something now, which would be unmaintainable in the Council, but they're essentially just completely maintained inside of Source Control these days using a CDK and Python.

+00:22:54 Right. 
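For readers who want to picture the event-driven pattern being described, here is a minimal sketch of an S3-triggered Lambda handler. The bucket layout, payload fields, and the internal API URL are invented placeholders for illustration, not Beam's actual setup.

```python
# Minimal sketch of an S3-triggered AWS Lambda. The downstream API endpoint
# and payload fields are hypothetical placeholders.
import json
import urllib.parse

import boto3    # boto3 (and its urllib3 dependency) ship with the Lambda Python runtime
import urllib3

s3 = boto3.client("s3")
http = urllib3.PoolManager()


def handler(event, context):
    # S3 event notifications arrive as a list of records.
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])

        # Pull the newly uploaded file and do a small conversion.
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        payload = {"source_key": key, "n_bytes": len(body)}

        # Hand the converted data to an internal API (placeholder URL).
        http.request(
            "POST",
            "https://example.internal/api/results",
            body=json.dumps(payload).encode(),
            headers={"Content-Type": "application/json"},
        )
    return {"statusCode": 200}
```

The point of the pattern is that the function only runs while an object event is being handled; there is no server to keep patched or running in between.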
So when I started dabbling with these Lambda functions, what I found pretty soon after, I had four or five of these functions, which I had uploaded on the AWS Console. And if anyone uses that, they know that it can be pretty tedious once you have a few things running, getting all of your permissions and things set up, and I was looking for better ways to do this. And at that point it was early days for this AWS Cloud Development Kit at AWS, and what it is is a way to write your infrastructure as code in essentially any common programming language. So JavaScript, TypeScript, you can do Java, Go and Python, perhaps. And using it, you can define your cloud infrastructure in an object oriented way with all the parameters they need, and you can deploy it to different AWS accounts. You can take it down, you can reconfigure it and look at how it's changed since your last commit, and you can store all that configuration in source control. And this has allowed us to scale up in terms of, like, number of Lambda functions. I think we have, like, maybe 60 or something now, which would be unmaintainable in the Console, but they're essentially just completely maintained inside of Source Control these days using AWS CDK and Python.

00:24:20 Yeah, it seems super neat. I have not used it on the page, which I'll link to in the show notes. They have Verner Vocals, CTO of Amazon AWS and talks about some of the benefits and kind of how it all fits together. And but you said you can store your cloud structure definition in Source Control. You can run unit tests against your infrastructure to say if I apply all these commands to AWS. So I actually get what I was hoping to get out of it. And yeah, it seems like a really neat thing for this infrastructure as code bits.

-00:24:53 I think it definitely really shines when you're developing larger pieces of infrastructure, but I would encourage people to check it out, even if they have a small automation type projects. This is what I was thinking of when I was listing the episode through 27 the other day. We have these things. You want to rub it on your computer with a Cron job, you can actually run them for free on AWS and you get a bunch of free time on the free tier and you try it out. You don't need to make sure your system D processor for whatever is running. And it's a pretty cool way to get familiar with how to do some of these things on AWS. I'm not sure if this also exist for other cloud providers that we use as in particular. So that's what I know, but it may also exist for things like Azure.
+00:24:53 I think it definitely really shines when you're developing larger pieces of infrastructure, but I would encourage people to check it out, even if they have small automation type projects. This is what I was thinking of when I was listening to episode 327 the other day. We have these things you want to run on your computer with a cron job, and you can actually run them for free on AWS, you get a bunch of free time on the free tier, and you can try it out. You don't need to make sure your systemd process or whatever is running. And it's a pretty cool way to get familiar with how to do some of these things on AWS. I'm not sure if this also exists for other cloud providers; we use AWS in particular. So that's what I know, but it may also exist for things like Azure.

00:25:44 Sure.

00:27:28 It'S so enabling for you to just scale out all this workload in terms of how we create data pipelines. 
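As a rough illustration of the infrastructure-as-code idea described here, the sketch below defines a bucket, a Lambda, and the event notification wiring them together as an AWS CDK stack in Python. It assumes CDK v2 and uses placeholder names; the real stacks discussed in the episode are certainly more involved.

```python
# Sketch of an AWS CDK (v2) stack that wires an S3 bucket to a Lambda, so the
# infrastructure lives in source control. All names are placeholders.
from aws_cdk import Stack, Duration
from aws_cdk import aws_lambda as _lambda
from aws_cdk import aws_s3 as s3
from aws_cdk import aws_s3_notifications as s3n
from constructs import Construct


class SequencingIngestStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        bucket = s3.Bucket(self, "InstrumentDataBucket")

        fn = _lambda.Function(
            self,
            "ConvertAndForward",
            runtime=_lambda.Runtime.PYTHON_3_9,
            handler="handler.handler",                 # e.g. the handler sketched earlier
            code=_lambda.Code.from_asset("lambda_src"),
            timeout=Duration.minutes(1),
        )

        # Fire the function whenever a new object lands in the bucket.
        bucket.add_event_notification(
            s3.EventType.OBJECT_CREATED,
            s3n.LambdaDestination(fn),
        )
        bucket.grant_read(fn)
```

Because the stack is plain Python, it can be diffed, reviewed, and redeployed like any other code, which is the scaling benefit being described.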
I think a lot of the ways we do it, we do have to be aware of creating them in a way that you can run them outside of the cloud. Sometimes we need to allow a third party to run our data analysis in a regulated way, and that requires us to have essentially running it internally. We run it in the cloud, which is efficient for scaling. -00:27:57 But we also need to be able to take that same use of software and run it in a way that may not be on AWS, so may not be in the cloud at all, and that creates some interesting software challenges. +00:27:57 But we also need to be able to take that same piece of software and run it in a way that may not be on AWS, so may not be in the cloud at all, and that creates some interesting software challenges. 00:28:08 Yeah, I'm sure, because so many of the APIs are cloud native, right. @@ -240,15 +240,15 @@ 00:29:18 So it usually begins with a a collaborative meeting with some experimental scientists where we discuss what's the experimental design going to be like. What are we going to be looking at in the data? -00:29:30 And then the experimentalists will go and they'll generate some or most sequencing data. At that point, we generally take the data, open up some Jupiter notebooks or some small I thought sometimes even just bash scripts to try to use some of those standard third party tools. These are things like making sure all the sequences are aligned to each other so that, you know, when there's differences and making sure the quality is correct, things like that. These are pretty standard bioinformatics things. +00:29:30 And then the experimentalists will go and they'll generate some or most sequencing data. At that point, we generally take the data, open up some Jupyter notebooks or some small I thought sometimes even just bash scripts to try to use some of those standard third party tools. These are things like making sure all the sequences are aligned to each other so that, you know, when there's differences and making sure the quality is correct, things like that. These are pretty standard bioinformatics things. 00:30:02 Right. -00:30:03 Then for sequencing asses, there's usually a couple spots where there's some real experimental logic going into it, where often we'll have to write custom code in Python to say, if there's this sequence here, it means we should keep the sequence. And if there's this sequence here, we should by the sequence in half or something like that. And so that code gets written in Python. Maybe it's in the Jupiter notebook or another script. And we sort of do this really slow testing the pending on the size of the data. It might be locally on a laptop or in a small lab based HPC type. Plus, this is where we're doing. +00:30:03 Then for sequencing assays, there's usually a couple spots where there's some real experimental logic going into it, where often we'll have to write custom code in Python to say, if there's this sequence here, it means we should keep the sequence. And if there's this sequence here, we should divide by the sequence in half or something like that. And so that code gets written in Python. Maybe it's in the Jupyter notebook or another script. And we sort of do this really slow testing depending on the size of the data. It might be locally on a laptop or in a small cloud based HPC type cluster so this is where we're doing. 00:30:42 You're not trying to process all the results. You just want to spot, check and see if it's coming out right before you turn it loose. Right. -00:30:49 Right. Or we're very patient. 
It's a little bit of buff. Sometimes it's very difficult to take only a small fraction of the data, but we try when we can. Once we settle at something that we think is pretty locked down, we'll take it out of the Jupiter notebooks. We don't try to use paper bill or anything like that. We try to get it out of there as soon as possible into some more complex script. They might be a shell script that runs a number of other scripts in order.

+00:30:49 Right. Or we're very patient. It's a little bit of both. Sometimes it's very difficult to take only a small fraction of the data, but we try when we can. Once we settle at something that we think is pretty locked down, we'll take it out of the Jupyter notebooks. We don't try to use Papermill or anything like that. We try to get it out of there as soon as possible into some more complex script. They might be a shell script that runs a number of other scripts in order.

00:31:15 Or we might start using some sort of workflow manager. The workflow managers and bioinformatics are pretty often because everyone has the same problem of writing all these third party tools together and custom code. Right.

00:31:33 Absolutely. Yeah.

-00:31:34 We use LG or whatever it is. Right.
+00:31:34 There's whatever it is. Right.

-00:31:37 There's a whole bunch of standard bio traumatic tools that we rate on almost everything. And so some of the workflow managers are designed to specifically work very well with those tools, and others are pretty agnostic of what you're doing with them.
+00:31:37 There's a whole bunch of standard bioinformatic tools that we run on almost everything. And so some of the workflow managers are designed to specifically work very well with those tools, and others are pretty agnostic of what you're doing with them.

-00:31:52 But one of the things I find interesting and listening to you talk about this, it just remind me is so often we see these problems that people are solving right over here. We're using CRISPR to do all this work. And then you talk about the tools you use. It's like, yeah, we're using, like, NumPy, Pandas and Jupiter and these kinds of things. And the thing that I find really interesting is for software development. There's so much of the stuff that it's just the same for everyone. Right. They're doing the same thing. And then there's ten to 20% that this field. Does this part different. But there's, like, 80% of yeah. We should use source control. We're using Python.
+00:31:52 But one of the things I find interesting in listening to you talk about this, and it just reminds me, is so often we see these problems that people are solving right over here. We're using CRISPR to do all this work. And then you talk about the tools you use. It's like, yeah, we're using, like, NumPy, Pandas and Jupyter and these kinds of things. And the thing that I find really interesting is, for software development, there's so much of the stuff that it's just the same for everyone. Right. They're doing the same thing. And then there's ten to 20% where this field does this part different. But there's, like, 80% of, yeah, we should use source control. We're using Python.

00:32:33 We're using notebooks, were using Pandas and that kind of stuff. And it's the similarities are way more common than I think they appear from the outside.

00:33:51 What is moving to production look like for you? 
So you talked about sometimes you start the exploration and stuff in notebooks, which is exactly what they're built for and then moving to maybe a little more composition of scripts and whatnot. And eventually, somehow you end up with Lambda cloud databases, things like that. What's that flow. -00:34:10 Yeah. So the process of us, we say productionizing a pipeline is we've had pretty well set now and generally how it works. As we say, this pipeline is about done, and we'll hand it off to myself or one of my colleagues to start the process of getting it fully cloud capable and scalable. And what that means for us is to pick the software in whatever form we've gotten it from our colleagues and put it into a workflow manager. And I think every company has their own version of workflow manager that they choose. We're using Luigi, which is fully Python based. It was originally developed that Spotify to do this sort of. +00:34:10 Yeah. So the process of us, we say productionizing a pipeline is we've had pretty well set now and generally how it works. As we say, this pipeline is about done, and we'll hand it off to myself or one of my colleagues to start the process of getting it fully cloud capable and scalable. And what that means for us is to pick the software in whatever form we've gotten it from our colleagues and put it into a workflow manager. And I think every company has their own version of workflow manager that they choose. We're using Luigi, which is fully Python based. It was originally developed that Spotify to do this sort of task. -00:34:56 It uses like a Gu make type target file Dag creation. +00:34:56 It uses like a GNU make type target file Dag creation. 00:35:03 I don't know all the technical terms to describe how the the tasks are built, but essentially you have a task at the end and you say it requires the output of this other task, and then that task requires. @@ -292,9 +292,9 @@ 00:35:33 That coordination can be really tricky. -00:35:35 Exactly. And there's a number of common workflow managers and bio Chromatics. I think the two most common are steak Bake and ex flow. +00:35:35 Exactly. And there's a number of common workflow managers and bio informatics. I think the two most common are steak Bake and ex flow. -00:35:43 Luigi has also been really good for us. We like it primarily because it is fully Python based, and it uses standard Python syntax, which allows us to really if we need to get out of the wood and add some customization extend it where we need to or fix things that we don't like about it. And that was a really important part of our decision and choosing Louisi over some of these other workflow managers. +00:35:43 Luigi has also been really good for us. We like it primarily because it is fully Python based, and it uses standard Python syntax, which allows us to really if we need to get out of the hood and add some customization extend it where we need to or fix things that we don't like about it. And that was a really important part of our decision and choosing Luigi over some of these other workflow managers. 00:36:09 Yeah, for sure. I had a nice conversation with the Airflow Apache Airflow folks not too long ago, and one of the things that really struck me about this is the ability for people to work on little part of the processing a little bit like that little Python automation tools or little Python projects that you described earlier in episode 327. 
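A minimal sketch of the Luigi requires/output pattern being described: each task declares the target it produces and the tasks it depends on, and Luigi assembles the DAG from those declarations. The task names, parameters, and file contents here are invented placeholders, not Beam's real pipeline steps.

```python
# Minimal Luigi pipeline sketch: CountEdits depends on AlignReads, and Luigi
# builds the DAG from the requires()/output() declarations.
import luigi


class AlignReads(luigi.Task):
    sample = luigi.Parameter()

    def output(self):
        return luigi.LocalTarget(f"{self.sample}.aligned.txt")

    def run(self):
        # Stand-in for calling a real alignment tool.
        with self.output().open("w") as out:
            out.write(f"aligned reads for {self.sample}\n")


class CountEdits(luigi.Task):
    sample = luigi.Parameter()

    def requires(self):
        # This task only runs once AlignReads has produced its target.
        return AlignReads(sample=self.sample)

    def output(self):
        return luigi.LocalTarget(f"{self.sample}.edit_counts.txt")

    def run(self):
        with self.input().open() as aligned, self.output().open("w") as out:
            out.write(f"{len(aligned.readlines())} records for {self.sample}\n")


if __name__ == "__main__":
    luigi.build([CountEdits(sample="demo")], local_scheduler=True)
```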
In that instead of trying to figure out all this orchestration, you just have to figure out, well, this little task is going to do a thing, and that like I said, maybe it means see the file and then copy it over there. And if your job was see a file here, copy it over there. That's a really simple job. You can totally nail that. You know what I mean?

00:37:05 Yeah, it definitely helps. And it helps with that idea of having these small tasks.

-00:37:11 It really helps with how you can develop and reuse the components for each task. We might, as we said earlier, that there are these third party tools that end up being used in almost all of our pipelines and using something like Louis or any workflow manager, you can reuse the tasks in different contexts as the B, and you can have your perfectly optimized way of using that task everywhere. That reuses is really nice. It's something I think a lot of software developers appreciate.
+00:37:11 It really helps with how you can develop and reuse the components for each task. We might, as we said earlier, that there are these third party tools that end up being used in almost all of our pipelines, and using something like Luigi or any workflow manager, you can reuse the tasks in different contexts as need be, and you can have your perfectly optimized way of using that task everywhere. That reuse is really nice. It's something I think a lot of software developers appreciate.

-00:37:41 Yeah, for sure. So if you look at some of the folks that are using Luigi so Spotify, as you said, created it, but also for Square Stripe Asana seat geek.
+00:37:41 Yeah, for sure. So if you look at some of the folks that are using Luigi, so Spotify, as you said, created it, but also Square, Stripe, Asana, SeatGeek.

00:37:52 A lot of companies that people probably heard of like, these places are doing awesome stuff. Let's be like them.

-00:37:58 Yeah. A lot of places use it for, like a Duke and things like that.
+00:37:58 Yeah. A lot of places use it for, like, Hadoop and things like that.

00:38:03 And one of the nice things is you mentioned about like how Airflow has the same model where you can create these contributions, which are different connectors for Luigi or for Air Flow, where you can connect them to either different cloud providers or different data stores. Things like that. And that allows you to use Luigi any workflow manager in numerous different context, whether it's locally on your own computer, running things in Doctor containers, or whether it's deploying out to AWS and scaling massively horizontal.

00:38:46 Yeah.

-00:38:46 Again, they just seem so empowering for allowing people to focus on just each step independently, which is excellent. Did you consider other ones? Did you consider Airflow or Dagger or any of these other ones, or did you find this fit? And we're going with this.
+00:38:46 Again, they just seem so empowering for allowing people to focus on just each step independently, which is excellent. Did you consider other ones? Did you consider Airflow or Dagster or any of these other ones, or did you find this fit? And we're going with this.

-00:39:01 We did look at some other ones. We were using Next Flow for a little bit, which is a Bioinformatic flavored workflow manager. It's very focused on Bioinformatics as its primary use case, although you could use it for anything. 
So Tax is similar to Groovy, and it's based in Groovy, and that was one of the attractive for us is that it was a little hard to get under the boat and use that because of it. I did briefly look at Disruptive hearing a few episodes, I think made a different podcast.

+00:39:01 We did look at some other ones. We were using Nextflow for a little bit, which is a bioinformatics flavored workflow manager. It's very focused on bioinformatics as its primary use case, although you could use it for anything. Its syntax is similar to Groovy, and it's based in Groovy, and that was one of the less attractive things for us, in that it was a little hard to get under the hood and extend it because of that. I did briefly look at Dagster after hearing about it on a few episodes, I think maybe on a different podcast.

00:39:33 Yeah, I did have Tobias Mason to give us an overview of the whole data engineering landscape, so possibly I know he spoke about it then, but I'm not sure when you heard about it.

-00:39:41 Yeah. So I heard about it. Autopaid, probably this one as well, and I did look into it, but it didn't have it at that time. It was pretty early. I didn't have any connectors to AWS and the ways that we like to use Luigi vectors.
+00:39:41 Yeah. So I heard about it on podcasts, probably this one as well, and I did look into it, but it didn't have it at that time. It was pretty early. It didn't have any connectors to AWS in the ways that we like to use Luigi's connectors.

00:39:55 That's such an important thing, because otherwise you've got to learn the API of every single thing you're talking to.

-00:40:00 Yeah. These days, knowing how Luigi works, it actually wouldn't have been that big of a task to look under the hood. So we did choose Luigi, and particularly we like how it handles deployment to AWS, and we use it on the service called AWS batch, which is, I guess it might be similar to, like Kubernetes Pod, although I haven't done anything with it or anything like that, not speaking from experience, but it essentially scales up EC two instances. These elastic compute instances on the cloud as you need them, and it gives out jobs to the virtual computers as necessary. So it spins them up. Allocates jobs. That a doctor container. They run when there's no more jobs, all instance, it shuts off.
+00:40:00 Yeah. These days, knowing how Luigi works, it actually wouldn't have been that big of a task to look under the hood. So we did choose Luigi, and particularly we like how it handles deployment to AWS, and we use it on the service called AWS Batch, which is, I guess it might be similar to, like, a Kubernetes pod, although I haven't done anything with it or anything like that, not speaking from experience, but it essentially scales up EC2 instances, these elastic compute instances, on the cloud as you need them, and it gives out jobs to the virtual computers as necessary. So it spins them up, allocates jobs that are Docker containers, they run, and when there's no more jobs, the instances shut off.

00:40:54 Okay.

00:41:15 The way batch works is you have your top level AMI that's called a compute environment, I believe.

-00:41:23 And then inside of it, you run the actual job. The job runs inside of a doctor.
+00:41:23 And then inside of it, you run the actual job. The job runs inside of a Docker container.

00:41:29 I see. So the Docker container is pre configured with all the Python dependencies and the settings that it needs and whatnot right.

00:41:50 They run their task. 
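To make the AWS Batch flow described here concrete, below is a rough boto3 sketch of submitting a containerized job and waiting for it to finish. The queue and job definition names are placeholders, and in practice the workflow manager wraps this kind of call rather than application code doing it directly.

```python
# Rough sketch of handing one containerized task to AWS Batch with boto3.
# Queue and job definition names are placeholders.
import time

import boto3

batch = boto3.client("batch")


def run_batch_job(sample: str) -> str:
    response = batch.submit_job(
        jobName=f"align-{sample}",
        jobQueue="sequencing-queue",          # placeholder queue
        jobDefinition="aligner-job-def",      # placeholder job definition (a Docker image)
        containerOverrides={
            "command": ["align", f"s3://example-bucket/{sample}.fastq.gz"],
        },
    )
    job_id = response["jobId"]

    # Poll until the container finishes; Batch scales EC2 capacity up and
    # back down behind the scenes.
    while True:
        job = batch.describe_jobs(jobs=[job_id])["jobs"][0]
        if job["status"] in ("SUCCEEDED", "FAILED"):
            return job["status"]
        time.sleep(30)
```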
-00:41:51 Data comes in from S three, goes back out to S three. Nothing is left on the hard drive, and then they disappear. They're these little ephemeral compute instances. And that's all managed by a workflow manager such as Luigi or Airflow or Next Flow. +00:41:51 Data comes in from S3, goes back out to S3. Nothing is left on the hard drive, and then they disappear. They're these little ephemeral compute instances. And that's all managed by a workflow manager such as Luigi or Airflow or Next Flow. 00:42:08 Wow. 00:42:08 That's pretty awesome. One of the things that I remember reading and thinking that's a pretty crazy use of the cloud was this our technical article from look at that year 2011. So if you think back to 2011, the cloud was really new. -00:42:26 And the idea of spending a ton of money on it and getting a bunch of computer out of it was still somewhat for into people. So there's this article linked to called the hundred or 1279 per dollar per hour 30,000 core cluster built on Amazon Ecto cloud, which is this company that pharmaceutical company that needed to do a lot of computing. And they said instead of buying a supercomputer, basically, we're going to come up and just fire off an insane amount. +00:42:26 And the idea of spending a ton of money on it and getting a bunch of computer out of it was still somewhat for into people. So there's this article linked to called the hundred or 1279 per dollar per hour 30,000 core cluster built on Amazon EC2 cloud, which is this company that pharmaceutical company that needed to do a lot of computing. And they said instead of buying a supercomputer, basically, we're going to come up and just fire off an insane amount. 00:42:57 Of course. @@ -364,7 +364,7 @@ 00:43:21 Okay, you got any more stories? Can you tell us anything about this? -00:43:25 Yeah, we do occasionally have certain type of molecular modeling job that we can scale very wide. And I think the sort of 30,000 number looks pretty familiar. I think our largest jobs today have been about ten0 CPUs wide and running for a few days. So maybe like four or five days. +00:43:25 Yeah, we do occasionally have certain type of molecular modeling job that we can scale very wide. And I think the sort of 30,000 number looks pretty familiar. I think our largest jobs today have been about 10,000 CPUs wide and running for a few days. So maybe like four or five days. 00:43:48 I think the number was like four or five days on the 10,000 cores. @@ -384,7 +384,7 @@ 00:46:10 I worked on some projects when I was in grad school at We're on Silicon Graphics, big mainframe type thing and obviously much lower importance than solving diseases and stuff is just solving math problems. But I remember coming in to work on the project one day and none of our workstations could log in to the Silicon Graphics machine. And what is wrong with this thing? And it was so loud. It was in the other room. You could hear it roaring away in there. It clearly was loud. And what happened was it wasn't me. It was someone else in the group had written some code. These things have run all night. We come in the morning, we check them. And what had happened was they had a bug in their code, which they knew they were trying to diagnose it. So they were printing out a bunch of log stuff or something like that. Well, they do that in a tight loop on a high end computer for a night, and it filled up. The hard drive, still had zero bytes left. 
And apparently the Silicon graphics machine couldn't operate anymore was literally zero bytes. And so it just stopped working. They couldn't get an eternal. I was like it took days to get it back, I believe. -00:47:17 But it's like that kind of stuff, right? I mean, you're not going to break EC two, but you don't know until the next day that. Oh, look, you filled up the computer and it doesn't work anymore, right. When you're doing that much computing, you could run out of different resources. You could run into all kinds of problems. +00:47:17 But it's like that kind of stuff, right? I mean, you're not going to break EC2, but you don't know until the next day that. Oh, look, you filled up the computer and it doesn't work anymore, right. When you're doing that much computing, you could run out of different resources. You could run into all kinds of problems. 00:47:33 Absolutely. And we are without our war stories of doing this. But I think we definitely learned a lot of lessons along the way of how to monitor your job effectively and double check things. But sometimes you run a big job and it doesn't quite turn out right. But the cost of doing business. @@ -394,7 +394,7 @@ 00:48:12 So you talked about APIs, you talked about data store. What are you using for a database? Is this, like, hosted RDS AWS thing or what is the story with that? -00:48:24 Yeah. So we have a few different places to store data or larger scale internal data. We store in Dango based web app, and we use the Jag or for sequel based database. My sequel database on AWS, and that has worked surprisingly effectively. Actually, I've heard some people say that the Jaguar, it's really slow when you scale out and things, but if you decide it correctly, I think it'll sprout. +00:48:24 Yeah. So we have a few different places to store data or larger scale internal data. We store in Django based web app, and we use the Django ORM for SQL based database. MySQL database on AWS, and that has worked surprisingly effectively. Actually, I've heard some people say that the Django ORM, it's really slow when you scale out and things, but if you decide it correctly, I think it'll sprout. 00:48:55 I think that's so true. @@ -402,15 +402,15 @@ 00:49:00 Or this thing is slow in this way. And if you have the queries structured well, you do the joins ahead of time. If you have indexes and you put the work into finding all these things, it's mind blowing when I go to sites. -00:49:14 I won't call you. I don't know if they've been updated here, but you go to a site and you're like, this site is taking four or 5 seconds to load. What could it possibly be doing? I mean, I know it has some data, but it doesn't have unimaginable amounts of data. Right. Surely somebody could just put an index in here or worst case, a cash and it would just transform it. Right. So yeah. I'm glad to hear you're having good experiences. +00:49:14 I won't call anyone. I don't know if they've been updated here, but you go to a site and you're like, this site is taking four or 5 seconds to load. What could it possibly be doing? I mean, I know it has some data, but it doesn't have unimaginable amounts of data. Right. Surely somebody could just put an index in here or worst case, a cache and it would just transform it. Right. So yeah. I'm glad to hear you're having good experiences. 00:49:37 Yeah. 00:49:38 We definitely fairly regularly run into slow queries. They're usually not too bad to solve. 
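A generic sketch of the kind of query tuning being discussed: doing the join up front, indexing the column that gets filtered, and only pulling back the fields that are needed. The models and field names are invented rather than Beam's schema, written as they might appear in a Django app's models.py.

```python
# Invented models illustrating index, join, and projection tuning in the Django ORM.
from django.db import models


class Sample(models.Model):
    name = models.CharField(max_length=100)


class EditResult(models.Model):
    sample = models.ForeignKey(Sample, on_delete=models.CASCADE)
    guide_sequence = models.CharField(max_length=64)
    edit_fraction = models.FloatField()

    class Meta:
        indexes = [
            # Index the column that sequence-based lookups filter on.
            models.Index(fields=["guide_sequence"]),
        ]


def results_for_guide(prefix: str):
    return (
        EditResult.objects
        .select_related("sample")                  # join in a single query
        .filter(guide_sequence__startswith=prefix)
        .values("sample__name", "edit_fraction")   # projection: only two fields back
    )
```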
I'm sure at some point we'll get to something really wacky that will be challenging, but for the most part, we've able solve it through better query design and better indexing.

-00:49:56 Yeah. Do you ever do things where you opt out of the sort of class based query syntax and go straight to sequel queries here and there to make sure that that part works better.
+00:49:56 Yeah. Do you ever do things where you opt out of the sort of class based query syntax and go straight to SQL queries here and there to make sure that that part works better.

-00:50:07 We have tried it for some particular sequence based searches that we do, and I actually found that most of the time you can write it in the or. It's just a little more obligated, but I do expect that at some point we will be writing raw sequel queries because out of necessity.
+00:50:07 We have tried it for some particular sequence based searches that we do, and I actually found that most of the time you can write it in the ORM. It's just a little more complicated, but I do expect that at some point we will be writing raw SQL queries out of necessity.

00:50:25 But it's not the majority, mainly using the Orm and then it's okay.

00:50:53 You're talking about my Monday morning cycle.

-00:50:58 And in those cases, I think that's the place where it makes sense to maybe do some kind of like a projection or something. I don't know how to do it in Jango Or, but in a Mongo engine, you can say, I know what I'm going to get back is a an Iterable set of these objects that match to the data, but only actually just fill out these two fields.
+00:50:58 And in those cases, I think that's the place where it makes sense to maybe do some kind of like a projection or something. I don't know how to do it in the Django ORM, but in MongoEngine, you can say, I know what I'm going to get back is an iterable set of these objects that match the data, but only actually just fill out these two fields.

00:51:18 Most of the data just throw it away. Don't try to parse it and convert it, just like I just want these two fields, and that usually makes it dramatically faster.

-00:51:26 Yeah. We run into a number of bottlenecks at the serialization layer, and we have been experimenting with a variety of different ways to solve those issues. And sometimes it means putting fewer thought objects between you and the data, and that often speeds it up, even if it makes it a little bit harder to interpret in your development. Or.
+00:51:26 Yeah. We run into a number of bottlenecks at the serialization layer, and we have been experimenting with a variety of different ways to solve those issues. And sometimes it means putting fewer objects between you and the data, and that often speeds it up, even if it makes it a little bit harder to interpret in your development environment.

00:51:49 Yeah, absolutely. Or just say, you know what? I just need dictionaries this time. I know it's going to be less fun, but that's what it takes.

00:51:56 That was the fixed on Monday morning. We do try to extensively use data classes for a lot of our interoperability.

-00:52:04 When data comes in and out of a pipeline, we like to have it in a data class is essentially stored repository, and then our Jagger web app also has access to that. So it knows what the structure of the data coming in is, and it knows what to serialize it to when it's coming out. And the Python data classes has been a really useful tool for that. But I think you're talking about that. 

-00:52:40 The problem is maintainability and whatnot but you know, if it's five times or you know what? This time it matters that we're going to just it the Bull and have to deal with it a sad time for just a little bit more digging into it. You talked about the Jango RM Jingo Rest framework, which is all great. What's the server deployment story like? How do you run that thing? Is it with unicorn? Is it micro whiskey? What's your on that side of things?

+00:52:40 The problem is maintainability and whatnot, but you know, if it's five times slower, it's like, you know what? This time it matters, so we're going to just bite the bullet and have to deal with it. Anyway, time for just a little bit more digging into it. You talked about the Django ORM and Django REST framework, which is all great. What's the server deployment story like? How do you run that thing? Is it with Gunicorn? Is it uWSGI? What's your setup on that side of things?

-00:53:07 Ours is a little a little custom, I guess in some ways it's pretty standard. I think we're like on exactly how it's set up now, but there's a NGINX proxy I am liking on it might be unicorn.

+00:53:07 Ours is a little custom, I guess, though in some ways it's pretty standard. I'm blanking on exactly how it's set up now, but there's an NGINX proxy, and behind that I think it might be Gunicorn.

-00:53:22 I feel like Unicorn and Jango go together frequently. I'm not sure why they got paired up specifically, but yeah, it's a good one.

+00:53:22 I feel like Gunicorn and Django go together frequently. I'm not sure why they got paired up specifically, but yeah, it's a good one.

-00:53:30 And we ended up deploying it out to AWS Elastic Beam stock, which is a source of some conversation in our team, because there's some things we really like about it, and there's some things that are really annoying in terms of the deployment is much more complicated than we would like it to be, but we have everything wrapped up in a pretty gnarly CDK stack that does a lot of the square.

+00:53:30 And we ended up deploying it out to AWS Elastic Beanstalk, which is a source of some conversation in our team, because there are some things we really like about it, and there are some things that are really annoying, in that the deployment is much more complicated than we would like it to be, but we have everything wrapped up in a pretty gnarly CDK stack that does a lot of the work.

00:53:54 It was messy, but you've solved it with the CDK, so now it's just, you push the button and it's okay.

-00:53:59 It's exactly like that. We have a very automated deployment process. 
I wouldn't like to refactor it, but it's there that works that works for us, but I think it's a pretty standard Jaggo deploy on the cloud and edit it works well.

+00:53:59 It's exactly like that. We have a very automated deployment process. I wouldn't like to refactor it, but it's there and it works for us, and I think it's a pretty standard Django deploy on the cloud, and it works well.
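
For readers who have not wired up this kind of stack before, here is a minimal sketch of a Gunicorn config for a Django app sitting behind an NGINX proxy. The project name, address, and worker count are assumptions for illustration, not the deployment described in the episode.

    # gunicorn.conf.py -- start with: gunicorn -c gunicorn.conf.py
    import multiprocessing

    wsgi_app = "mysite.wsgi:application"      # hypothetical Django project
    bind = "127.0.0.1:8000"                   # NGINX proxies requests to this address
    workers = multiprocessing.cpu_count() * 2 + 1
    timeout = 60
    accesslog = "-"                           # log to stdout so the platform can collect it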

00:54:16 Yeah. Cool. Well, I think that's probably about it for time to talk about the specifics of what you all are doing, but there are the last two questions, as always.

00:54:26 So let's start with the editor. If you're going to write some Python code.

-00:54:29 What editor do you use the yes code a lot of the remote development environment on that.

+00:54:29 What editor do you use? VS Code, and I use the remote development environment a lot with that.

00:54:34 Yeah. You use the remote aspect of it.

-00:54:37 Yeah. We're doing a lot of work on EC two instances as our day to day work and the code the way it works with instances in the Cloud is really amazing. So I encourage anyone to check out that extension.

+00:54:37 Yeah. We're doing a lot of our day to day work on EC2 instances, and the way VS Code works with instances in the cloud is really amazing. So I encourage anyone to check out that extension.

-00:54:51 You get access to the file system on the remote machine. And basically it's just your view into that server, but it's more or less standard vs code. Right. But when you hit run, it just runs up there.

+00:54:51 You get access to the file system on the remote machine. And basically it's just your view into that server, but it's more or less standard VS Code. Right. But when you hit run, it just runs up there.

-00:55:02 It feels exactly like you're on your own computer. Sometimes I actually get confused whether I'm auto motor.

+00:55:02 It feels exactly like you're on your own computer. Sometimes I actually get confused whether I'm on a remote machine.

00:55:07 It doesn't work because I'm in Virginia. I see. Yeah.

00:55:10 All right.

-00:55:10 And then notable Pipi package.

+00:55:10 And then a notable PyPI package?

-00:55:12 But I'll have to shout out some of the ones we talked about it. I would encourage people to look at as CDK if they're on AWS, I think it has some really interesting things there. And then also Luigi as a workflow manager. If people are do any of these types of, you know, data pipelines that have as they can reuse, these sort of workload managers are really cool. Luigi is a pretty accessible one. Probably one that's familiar with Python.

+00:55:12 Well, I'll have to shout out some of the ones we talked about. I would encourage people to look at the AWS CDK if they're on AWS, I think it has some really interesting things there. And then also Luigi as a workflow manager. If people do any of these types of, you know, data pipelines that have steps they can reuse, these sorts of workflow managers are really cool. Luigi is a pretty accessible one, probably the one that's most familiar to Python folks.
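
To give a feel for the Luigi style of workflow being recommended here, below is a minimal two-task pipeline. The task names and file paths are made up for illustration.

    import luigi

    class ExtractReads(luigi.Task):
        run_id = luigi.Parameter()

        def output(self):
            return luigi.LocalTarget(f"data/{self.run_id}.reads.txt")

        def run(self):
            with self.output().open("w") as out:
                out.write("ACGT\n")  # stand-in for the real extraction step

    class CountReads(luigi.Task):
        run_id = luigi.Parameter()

        def requires(self):
            return ExtractReads(run_id=self.run_id)

        def output(self):
            return luigi.LocalTarget(f"data/{self.run_id}.count.txt")

        def run(self):
            with self.input().open() as reads, self.output().open("w") as out:
                out.write(f"{sum(1 for _ in reads)}\n")

    if __name__ == "__main__":
        # Tasks whose output already exists are skipped on re-run,
        # which is the reusable-steps idea mentioned above.
        luigi.build([CountReads(run_id="demo")], local_scheduler=True)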

00:55:35 Yeah. All right. Fantastic. Yeah. I'm just learning to embrace these workflow managers, but they do seem really amazing. All right. So for all the biologists and scientists out there listening, you've got this really cool setup and all this cool computational infrastructure, what do you tell them? How do they get started, maybe in biology or wherever?

00:55:54 I think biologists are a good place to start.
We're also happy to have people come from a software background that are really interested in learning the biology. And I guess as a final plug, we do have a few open positions. So if you're interested, go to our careers page and give us an application.

-00:56:10 Or you guys a remote place.

+00:56:10 Are you guys a remote place?

00:56:12 Remote friendly or what's the story these days?

@@ -484,21 +484,21 @@

00:56:27 David, thank you for being here and giving us this look inside all the gene editing Python stuff you're doing.

-00:56:33 Thank you, Michael. It's, please.

+00:56:33 Thank you, Michael. It's a pleasure.

00:56:34 Yeah, you bet. Bye bye.

00:56:35 This has been another episode of Talk Python to me.

-00:56:39 Our guest on this episode was David Born, and it's been brought to you by Shortcut us over at Talk Python training, and the transcripts were brought to you by assembly AI.

+00:56:39 Our guest on this episode was David Born, and it's been brought to you by Shortcut and us over at Talk Python Training, and the transcripts were brought to you by Assembly AI.

-00:56:48 Choose Shortcut, formerly Club House IO for tracking all of your projects work because you shouldn't have to project manage your project management. Visit a Python FM shortcut.

+00:56:48 Choose Shortcut, formerly Clubhouse.io, for tracking all of your project's work, because you shouldn't have to project manage your project management. Visit 'talkpython.fm/shortcut'.

-00:57:00 Do you need a great automatic speechtotext API? Get human level accuracy in just a few lines of code?

+00:57:00 Do you need a great automatic speech-to-text API? Get human-level accuracy in just a few lines of code.

-00:57:05 Visit Talk Python FM assembly AI when you level up your Python, we have one of the largest catalogs of Python video courses over at Talk Python. Our content ranges from true beginners to deeply advanced topics like memory and async. And best of all, there's not a subscription in sight. Check it out for yourself at training to Python FM. Be sure to subscribe to the show. Open your favorite podcast app and search for Python. We should be right at the top.

+00:57:05 Visit 'talkpython.fm/assemblyAI'. Want to level up your Python? We have one of the largest catalogs of Python video courses over at Talk Python. Our content ranges from true beginners to deeply advanced topics like memory and async. And best of all, there's not a subscription in sight. Check it out for yourself at 'training.talkpython.fm'. Be sure to subscribe to the show. Open your favorite podcast app and search for Python. We should be right at the top.

-00:57:31 You can also find the itunes feed at itunes, the Google Play feed at Play and the Direct RSS feed at RSS on Talk Python Film. We're live streaming most of our recordings these days. If you want to be part of the show and have your comments featured on the air, be sure to subscribe to our YouTube channel at Hawk Python Film YouTube.

+00:57:31 You can also find the iTunes feed at /itunes, the Google Play feed at /play, and the direct RSS feed at /rss on 'talkpython.fm'. We're live streaming most of our recordings these days. If you want to be part of the show and have your comments featured on the air, be sure to subscribe to our YouTube channel at 'talkpython.fm/youtube'.

00:57:52 This is your host, Michael Kennedy.

@@ -506,4 +506,4 @@

00:57:55 I really appreciate it.

-00:57:56 Now get out there and write some Python code. \ No newline at end of file
+00:57:56 Now get out there and write some Python code.