Operational efficiency must be at the core of any technology system. MLOps builds upon DevOps, which in turn builds on the concept of kaizen, the Japanese word for continuous improvement. Without continuous improvement, you wouldn’t have DevOps or, by extension, MLOps.
In this Guide, you will learn how to:
Apply best practices for sustainability and energy efficiency by using the Rust language.
Level up to using a more robust language, Rust, with GitHub Copilot.
Think differently about the false appearance of progress in data science and MLOps projects.
At the heart of continuously improving operations is a simple question: “Can we improve operational performance—from training and inference to packaging and delivery—by ten times or more?” If the answer is yes, as it will be with many organizations using Python for data science, the next question should be: "Why are we not doing it?"
For decades, organizations had few options besides pure C/C++ and Python for machine learning solutions. C++ may provide more efficiency in terms of performance, but Python is generally easier to learn, implement, and maintain, which is why Python has taken off in data science. The hard choice between the performant but complex C++ and the easy-to-learn but comparatively slow Python ultimately results in many companies choosing Python.
But there is another way. Rust consistently ranks among the most performant and energy-efficient languages. It also happens to be among the most loved languages in Stack Overflow's annual developer survey. Some Python libraries widely used in data science are written in C and can provide some of the performance benefits of a compiled language, but Rust provides a more direct route to bare metal while using a single language.
Rust is also far easier to learn and use than C or C++, which makes it a realistic solution for those who want the performance of a compiled language. That’s especially the case when using GitHub Copilot, an AI-powered pair programmer that uses the OpenAI Codex to suggest code and entire functions in real time to developers while they code. Let's discuss this strategy next.
We want to hear from you! Join us on GitHub Discussions.
The case for Rust for MLOps
GitHub Copilot is a revolutionary change in the way developers work. GitHub Copilot and tools like it are a game changer since they minimize the impact of syntax on productivity. With Rust, you spend more time up front getting code to compile, which is an investment in future returns, much like saving for the future in a retirement account. Rust has great performance and safety, but the syntax can be challenging. With GitHub Copilot, the syntax becomes less of an issue since the suggestions eliminate many of the difficulties in programming. Additionally, because the Rust toolchain for linting, formatting, and compiling is so robust, any errors or false starts from GitHub Copilot are caught by these tools, making the combination of Rust and GitHub Copilot an emerging front-runner in AI-assisted coding.
There are several reasons to consider Rust other than performance. Rust is a modern language that first appeared in 2010. It lacks the baggage that older languages carry, but it’s established enough that we can rest assured it isn’t going anywhere anytime soon. Further, other trends are supporting a hard look at Rust.
Rust was designed from the ground up to support modern computing capabilities, like multi-core threads, that are often “bolted on” to older languages like Python. By designing the language to support these features from the start, Rust can avoid the awkwardness found in many other languages. A great example of how simple multi-core threads are in Rust is the following snippet from the Rust rayon library:
use rayon::prelude::*;

fn sum_of_squares(input: &[i32]) -> i32 {
    input.par_iter()
        .map(|i| i * i)
        .sum()
}
There are no gimmicks or hacks to the code; the threads “just work” across all the machine cores, and the code is just as readable as Python.
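For comparison, here is a rough sketch of the same computation using only the standard library's scoped threads. The function name `sum_of_squares_threads` and the `workers` parameter are illustrative assumptions, not part of rayon or the snippet above. It spawns one thread per chunk, whereas rayon splits the work for you and schedules it on a work-stealing thread pool.

```rust
use std::thread;

/// Sums the squares of `input` by splitting it across up to `workers` threads.
/// Hypothetical helper for illustration; not part of rayon.
fn sum_of_squares_threads(input: &[i32], workers: usize) -> i32 {
    if input.is_empty() {
        return 0;
    }
    // Ceiling division so every element lands in exactly one chunk.
    let workers = workers.max(1);
    let chunk_len = (input.len() + workers - 1) / workers;

    thread::scope(|scope| {
        // Spawn one thread per chunk; scoped threads may borrow `input` directly.
        let handles: Vec<_> = input
            .chunks(chunk_len)
            .map(|chunk| scope.spawn(move || chunk.iter().map(|i| i * i).sum::<i32>()))
            .collect();

        // Join every worker and add up the partial sums.
        handles
            .into_iter()
            .map(|handle| handle.join().expect("worker thread panicked"))
            .sum::<i32>()
    })
}
```

Calling `sum_of_squares_threads(&[1, 2, 3, 4], 2)` squares and sums two chunks in parallel. The extra bookkeeping here is what the `par_iter()` call hides.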
Likewise, Rust was built to support typing, so the entire toolchain from the linter to the editor to the compiler can leverage this capability. Rust also makes packaging a breeze. Cargo provides a Python-esque “one obvious way” to install packages.
Of course, there are still areas where Python excels. It’s fantastic for API documentation and readability in general. If you need to try out an idea, it is hard to beat using Python in an interactive prompt, like IPython, to explore a concept. But MLOps is more sensitive to performance requirements than other data science fields, and is heavily dependent on software engineering best practices that are better implemented with Rust. A new superset of Python called Mojo might solve many performance and deployment issues in the near future, but it's still in development while Rust is available in the here and now.
One common objection to the use of Rust is that it doesn’t have as large and established an ecosystem as Python does for working with data. But keep in mind that this ecosystem isn’t necessarily tuned to the needs of MLOps. In particular, the stack I call #jcpennys (Jupyter, Conda, Pandas, NumPy, Sklearn) is straight from academia, heavyweight, and optimized for use with small data. In academia, there is much to be said for a "God environment" with everything in one spot. But in real-world production MLOps, you don't want extra packages or brittle tools that are difficult to test, like notebooks. Meanwhile, the Rust ecosystem is growing. For example, Polars is a performant data frame library.
Leveling up with Rust, GitHub Copilot, and Codespaces
Let's look at how you can use the GitHub ecosystem to level up to a more robust language in Rust.
All Rust projects can follow this pattern:
Create a new repo using the Rust New Project Template.
Create a new Codespace and use it.
Use `main.rs` to handle the CLI and `lib.rs` to handle logic, and import `clap` in `Cargo.toml` as shown in this project.
Use `cargo init --name 'hello'` or whatever you want to call your project.
Put your "ideas" in as comments in Rust to seed GitHub Copilot, e.g. `// build an add function`
Run `make format`, i.e. `cargo fmt`
Run `make lint`, i.e. `cargo clippy --quiet`
Run the project: `cargo run -- --help`
Push your changes to allow GitHub Actions to run the `format` check, the `lint` check, and other actions like binary deploy.
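As a small illustration of the comment-seeding step, the sketch below shows the kind of code a prompt comment might lead to in `lib.rs`. The function body is an assumption about a plausible suggestion. What GitHub Copilot actually proposes will vary, which is why the format and lint steps follow.

```rust
// build an add function
/// Adds two integers. Written here by hand as a plausible completion
/// of the prompt comment above; a real suggestion may differ.
pub fn add(a: i32, b: i32) -> i32 {
    a + b
}
```

Keeping the prompt comment next to the generated function makes it easy to review whether the suggestion matches the original intent.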
This is an emerging pattern ideal for systems programming in Rust, as certain combinations lead to new advances. For example, steel is a composite of iron and carbon, making a new substance stronger and harder than iron. Similarly, GitHub Copilot’s suggestions combined with a next-generation compiled language like Rust and its ecosystem of formatting, linting, and packaging tools lead to the computer science equivalent of an alloy: a new, stronger solution to computational problems.
Here’s an example repository.
A good starting point for a new Rust project is the following pattern:
To run: `cargo run -- marco --name "Marco"`
Be careful to use the name of the project in `Cargo.toml` to call `lib.rs`, as in:
[package]
name = "hello"
For example, `main.rs` invokes the crate name `hello` alongside `marco_polo`, which is defined in `lib.rs`, as `hello::marco_polo`.
`lib.rs` code:
/* A Marco Polo game. */
/* Accepts a string with a name.
   If the name is "Marco", returns "Polo".
   If the name is "any other value", it returns "Marco".
*/
pub fn marco_polo(name: &str) -> String {
    if name == "Marco" {
        "Polo".to_string()
    } else {
        "Marco".to_string()
    }
}
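Because the logic lives in `lib.rs` rather than in the CLI, it can be unit tested on its own. The following is a minimal sketch, assuming the tests sit in the same file as the function. The test module and test names are hypothetical, and the function is repeated only so the block stands alone.

```rust
/// Returns "Polo" for "Marco" and "Marco" for anything else.
pub fn marco_polo(name: &str) -> String {
    if name == "Marco" {
        "Polo".to_string()
    } else {
        "Marco".to_string()
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn marco_gets_polo() {
        assert_eq!(marco_polo("Marco"), "Polo");
    }

    #[test]
    fn other_names_get_marco() {
        // The comparison is case-sensitive, so "marco" is "any other value".
        assert_eq!(marco_polo("marco"), "Marco");
        assert_eq!(marco_polo("Ada"), "Marco");
    }
}
```

Running `cargo test` executes both tests, and the same command can run in GitHub Actions alongside the format and lint checks.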
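`main.rs` code:
use clap::Parser;

// `Cli` and `Commands` are clap-derived argument types, assumed to be
// defined elsewhere in main.rs (their definitions are not shown in this excerpt).
fn main() {
    let args = Cli::parse();
    match args.command {
        Some(Commands::Marco { name }) => {
            println!("{}", hello::marco_polo(&name));
        }
        None => println!("No command was used"),
    }
}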
Retrofitting a VW Bug from the 1970s with modern EV technology is a suboptimal financial strategy. Similarly, bolting more and more non-native components onto Python is a suboptimal strategy when you could instead choose a new language where appropriate. Additionally, the old paradigm of mixing C with Python needs to be reconsidered if a developer can replace both with a single language, Rust.
In distributed computing, performance does matter, as do cybersecurity, energy usage, and binary distribution of software. Rust has a lot of compelling use cases for MLOps, and additional examples are in the Rust MLOps repo as well as a tutorial including notes from the Duke cloud computing course teaching Rust with GitHub Copilot.
We shouldn't treat software languages like sports teams we “root for.” The pragmatic practitioner looks for tools that efficiently solve problems. Languages like Go and Rust have emerged as solutions for high-performance computing, and Rust, in particular, shines at safety and security, a weakness of many languages, such as C and Python. The slight increase in complexity will pay off for organizations in the form of fewer bugs, more secure code, less toil for developers when managing packages and dependencies, and lower compute costs.
As you look around your organization, you’re bound to find numerous areas that can benefit from Rust’s improved cost profiles. Embedding ML models inside of command line tools is a great place to start. This could open up a new world of possibilities for sophisticated, binary-distributed tools. Microsoft has also adopted Rust bindings for the ONNX Runtime, which should increase the likelihood of new emerging embedded solutions in binary command line tools. Likewise, edge and embedded ML are ideal targets for Rust, since it is an excellent solution for low-memory and lower-energy workloads. Even if you start small, you’re bound to find some big wins.