This repository provides a streamlined ML infrastructure environment using Minikube. It includes:
- Training the ResNet18 model on the MNIST dataset.
- Deploying the trained model for inference using Triton Inference Server.
- Kubernetes Helm charts to manage basic resources.
- Automation scripts to facilitate setup.
Prerequisites:

- Minikube installed on your local machine.
- Podman and CRI-O installed and configured (instead of Docker).
- Helm and Helmfile installed.
- `just` installed for task automation.
First, ensure Minikube is configured to use CRI-O as the container runtime; follow the setup instructions in the Justfile tasks.
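The `k8s-up` Justfile recipe presumably wraps a Minikube invocation along these lines (a sketch only; the driver and any extra flags are assumptions, and the actual recipe lives in the Justfile):

```just
k8s-up:
    minikube start \
        --driver=podman \
        --container-runtime=cri-o
```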
Repository structure:

- `helmfile.yaml`: Root Helmfile to manage multiple Helm charts.
- `bases/`: Base configuration files.
- `releases/`: Release configurations for components like NFS, Ingress, Seldon Core, and training jobs.
- `charts/`: Helm charts and Python scripts for the ResNet18 training job and Triton client.
- `hack/`: Helper scripts for setting up the environment, including NFS, DNS, and the Docker registry.
- `.env`, `.env.minikube`: Environment variable files.
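A root Helmfile following this layout might look roughly like the fragment below (release names, paths, and label values are illustrative assumptions, not the repository's actual contents):

```yaml
bases:
  - bases/environments.yaml   # hypothetical base file

releases:
  - name: nfs-server          # hypothetical release
    chart: charts/nfs-server
    labels:
      tier: ops
  - name: train-resnet18      # hypothetical release
    chart: charts/train
    labels:
      tier: train
```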
Setup steps:

- Install Podman and CRI-O:

  ```sh
  just podman
  just crio
  ```
- Start Minikube:

  ```sh
  just k8s-up
  just k8s-down  # tear down and retry if there's a problem
  ```
- Install the NFS server and client:

  ```sh
  just nfs
  ```
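Under the hood, sharing artifacts between the host and Minikube over NFS typically boils down to an `/etc/exports` entry such as the following (the export path and the Minikube subnet are placeholders):

```
/srv/nfs/share  192.168.49.0/24(rw,sync,no_subtree_check,no_root_squash)
```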
- Update `/etc/hosts` on the host and in Minikube, and patch the CoreDNS Corefile:

  ```sh
  just dns
  ```
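The usual result of this kind of step is a hosts entry plus a matching CoreDNS `hosts` block; the hostname and IP below are placeholders for illustration:

```
# /etc/hosts (on the host and the Minikube node)
192.168.49.1  registry.local

# CoreDNS Corefile fragment
hosts {
    192.168.49.1 registry.local
    fallthrough
}
```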
- Prepare the private registry on the host and configure CRI-O in Minikube:

  ```sh
  just registry
  ```
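For CRI-O inside Minikube to pull from a plain-HTTP private registry, it generally needs an insecure-registry entry in `/etc/containers/registries.conf`; a minimal sketch (registry name and port are placeholders):

```toml
[[registry]]
location = "registry.local:5000"
insecure = true
```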
- Build, test, and push the training job image:

  ```sh
  just docker-build
  just docker-run
  just docker-push
  ```
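The training image build presumably wraps a Dockerfile roughly like this sketch (the base image, script name, and paths are assumptions; the repository's actual Dockerfile may differ):

```dockerfile
# Assumed base image with PyTorch preinstalled
FROM pytorch/pytorch:2.2.0-cuda12.1-cudnn8-runtime
WORKDIR /app
COPY train.py .            # hypothetical training script name
CMD ["python", "train.py"]
```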
- Build, test, and push the Triton client image:

  ```sh
  just docker-build triton-client
  just docker-run triton-client
  just docker-push triton-client
  ```
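As a rough sketch of what such a client does before calling Triton, the snippet below (pure NumPy; the function name is hypothetical, and the normalization constants are the standard MNIST mean/std, which the actual client may or may not use) builds the NCHW float32 input a ResNet18 MNIST model would typically expect:

```python
import numpy as np

# Standard MNIST normalization constants (assumed to match the training job)
MNIST_MEAN, MNIST_STD = 0.1307, 0.3081

def preprocess(image: np.ndarray) -> np.ndarray:
    """Convert a (28, 28) uint8 grayscale image into a (1, 1, 28, 28)
    float32 batch in NCHW layout, normalized for inference."""
    x = image.astype(np.float32) / 255.0      # scale pixels to [0, 1]
    x = (x - MNIST_MEAN) / MNIST_STD          # normalize
    return x[np.newaxis, np.newaxis, :, :]    # add batch and channel dims

batch = preprocess(np.zeros((28, 28), dtype=np.uint8))
print(batch.shape, batch.dtype)
```

A real client would then wrap this array in a Triton inference request against the deployed model.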
- Select the environment:

  ```sh
  just env minikube
  ```
- Deploy all Helm charts at once:

  ```sh
  just apply
  ```
- (Optional) Deploy Helm charts by label:

  ```sh
  just apply tier=common  # Storage Class, Priority Class
  just apply tier=ops     # NFS, Ingress
  just apply tier=ml      # Seldon Core Operator, Triton Server
  just apply tier=train   # Training Job
  just apply tier=client  # Triton Client
  ```
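If you prefer invoking Helmfile directly, the label-scoped recipes above presumably map onto Helmfile's label selectors, e.g. (assuming the `just` recipes are thin wrappers; the environment name comes from the earlier `just env minikube` step):

```sh
helmfile -e minikube -l tier=ml apply   # illustrative equivalent of `just apply tier=ml`
```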
Planned future work:

- Automated CI/CD/CT Pipeline
- Drift Monitoring and Logging
- Scalability Improvements using Seldon
- Model Versioning and Management using DVC
- MetalLB Integration to handle external traffic
- Distributed Training with Ray or DistributedBackend