- Building a large language model deployment pipeline in Kubernetes
- Software Engineering - Data science (development)
- Strong programming and machine learning background
🏁 To undertake this project, you should have completed one or more of the following modules.
Cloud computing, Neural Networks and Deep Learning, Software design and programming, Programming in Java, Internet and web technologies, Applied Machine Learning
- Knowledge of, or willingness to learn, building and deploying LLMs.
- Fundamental knowledge of cloud computing and command line interfaces (Linux).
- Understanding of Kubernetes for deployment.
- Knowledge of, or willingness to learn, libraries such as TensorFlow, PyTorch and others for data modelling.
- Knowledge of GitHub and the GitHub CLI.
Large language models (LLMs) like GPT-4 and its predecessors have become popular for their versatility and research breakthroughs.
The size and complexity of LLMs introduce several challenges for real-world usage, including significant memory and computational requirements. They typically need specialized hardware (often high-end GPUs or TPUs) with substantial memory to run efficiently, and while they can be fine-tuned on specific tasks or datasets, that process is itself resource-intensive given the scale of the models.
🎯 The project aims to develop an open-source CI/CD deployment pipeline for LLMs such as Mistral, Llama and others on Kubernetes clusters in a shared environment (e.g. using multi-instance GPU capabilities). The pipeline will be developed as a Helm chart to templatize the Kubernetes manifests.
The project use case includes the development of a seed application to showcase the deployment pipeline. The application will allow users to upload a text file and query it directly using the appropriate model.
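As a sketch of how the seed application could talk to a deployed model, the snippet below combines an uploaded document and a user question into a request for an in-cluster inference service. The payload shape assumes an OpenAI-compatible chat endpoint (as exposed by servers such as vLLM), and the service URL and model name are illustrative assumptions, not part of the project specification:

```python
import json
from urllib import request


def build_inference_request(document_text: str, question: str,
                            model: str = "mistral-7b") -> bytes:
    """Combine an uploaded document and a user question into one JSON payload.

    Assumes an OpenAI-compatible chat-completions request body; adapt the
    shape to whatever model server the pipeline actually deploys.
    """
    payload = {
        "model": model,
        "messages": [
            {"role": "system",
             "content": "Answer using only the provided document."},
            {"role": "user",
             "content": f"Document:\n{document_text}\n\nQuestion: {question}"},
        ],
    }
    return json.dumps(payload).encode("utf-8")


def query_model(body: bytes,
                url: str = "http://llm-service:8000/v1/chat/completions") -> dict:
    """POST the payload to the model service (hypothetical in-cluster URL)."""
    req = request.Request(url, data=body,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.loads(resp.read())
```

The seed application would wrap this in a small REST API so users never talk to the model service directly.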
The core solution is organised in features that include the following tasks:
- Deploy the model and create a seed application on Kubernetes in a multi-instance GPU environment.
- Develop the seed application.
- Develop deployment pipelines using Helm charts.
- Develop a use case and offer it as a service through a simple UI.
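As a sketch of what the Helm templating in the features above could look like, the `values.yaml` fragment below parameterises the model image and a multi-instance GPU (MIG) resource request; all names and values are illustrative, not prescribed by the project:

```yaml
# values.yaml (illustrative) - parameters consumed by the chart's templates
model:
  name: mistral-7b
  image: ghcr.io/example/llm-server:latest   # hypothetical image
  replicas: 1
resources:
  limits:
    # Request one MIG slice of an NVIDIA A100; the exact resource name
    # depends on how the cluster's GPUs are partitioned.
    nvidia.com/mig-1g.5gb: 1
service:
  type: ClusterIP
  port: 8000
```

A deployment template would then reference these values (e.g. `{{ .Values.model.image }}`), so the same chart can deploy different models by switching the values file.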
🔍 Other features could also be identified and proposed by the student.
- Access to GPU resources on Google Cloud Platform or Amazon Web Services.
- Seed repositories and other programming resources to start your development.
- Development support and advice from expert developers.
- You will learn how to deploy LLMs on Kubernetes and build automated pipelines.
- You will learn how to build use cases for LLMs.
- You will learn how to use GPU processors for generative AI.
- You will develop skills in REST API development using Python and associated libraries.
- 4/5
This project includes data-science-related development, including the deployment of models on cloud providers. The implementation will be organised into features; not all features need to be developed for this project.