MLModelScope
The current landscape of Machine Learning (ML) and Deep Learning (DL) is rife with non-uniform models, frameworks, and system stacks, but lacks standard tools to evaluate and profile models or systems. Due to the absence of such tools, the current practice for evaluating and comparing the benefits of proposed AI innovations (be it hardware or software) on end-to-end AI pipelines is both arduous and error-prone, stifling the adoption of these innovations.
MLModelScope is a hardware/software-agnostic, extensible, and customizable platform for evaluating and profiling ML models across datasets, frameworks, and hardware, and within AI application pipelines. MLModelScope lowers the cost and effort of performing model evaluation and profiling, making it easier for others to reproduce, evaluate, and analyze the accuracy or performance claims of models and systems.
It is designed to aid in:
- reproducing and comparing with published models, and designing models with performance and deployment in mind,
- understanding model performance (within real-world AI workflows) and its interaction with all levels of the hardware/software stack,
- discovering models, frameworks and hardware that are applicable to users' datasets.
To achieve this, MLModelScope:
- Provides a consistent evaluation, aggregation, and reporting system by defining:
  - techniques to specify and provision workflows with HW/SW stacks
  - abstractions for evaluation and profiling using different frameworks
  - data consumption for evaluation outputs
- Enables profiling of experiments throughout the entire pipeline and at different abstraction levels (application, model, framework, layer, library, and hardware)
- Is framework agnostic - with current support for PyTorch, TensorFlow, ONNXRuntime, MXNet, and JAX
- Is extensible and customizable - allowing users to extend MLModelScope by adding models, frameworks, or library and system profilers.
- Can run experiments on separate machines and behind firewalls (without exposing model weights or machine specifications)
- Allows parallel evaluation (multiple instantiations of the same experiment set-up across systems)
- Specifies model and framework resources as asset files which can be added easily, even at runtime
MLModelScope can be used as an application with a command line, API or web interface, or can be compiled into a standalone library. We also provide an online hub of continuously updated assets, evaluation results, and access to hardware resources — allowing users to discover and evaluate models without installing or configuring systems.
- Docker

PyTorch:
docker run --rm -it xlabub/pytorch-agent:standalone-cpu-pytorch2.0.1-latest
docker run --rm -it --gpus all xlabub/pytorch-agent:standalone-gpu-pytorch2.0.1-cuda11.7-latest
If you do not want the agent to re-download Hugging Face models on every run, mount your local Hugging Face cache into the container:
docker run --rm -it -e HF_HOME=/root/.cache/huggingface \
--gpus all -v ~/.cache/huggingface:/root/.cache/huggingface xlabub/pytorch-agent:standalone-gpu-pytorch2.0.1-cuda11.7-latest
TensorFlow:
docker run --rm -it xlabub/standalone-cpu-tensorflow2.10.1-latest
docker run --rm -it --gpus all xlabub/standalone-gpu-tensorflow2.10.1-cuda11.2-latest

ONNXRuntime:
docker run --rm -it xlabub/standalone-cpu-onnxruntime1.19.2-latest
docker run --rm -it --gpus all xlabub/standalone-gpu-onnxruntime1.19.2-cuda11.8-latest

MXNet:
docker run --rm -it xlabub/standalone-cpu-mxnet1.9.1-latest

JAX:
docker run --rm -it xlabub/standalone-cpu-jax0.4.30-latest
If you are planning to use the GPU profiling capabilities of the agent, you will need to install the CUPTI library. Otherwise, you can skip this step.
Please refer to the Nvidia CUDA library installation instructions for this. Find the location of your local CUDA installation, which is typically at /usr/local/cuda/, and set up the path to the libcublas.so library.
Please refer to the Nvidia CUPTI library installation instructions for this. Find the location of your local CUPTI installation, which is typically at /usr/local/cuda/extras/CUPTI, and set up the path to the libcupti.so library.
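For example, on Linux both library paths can usually be made visible to the agent by extending LD_LIBRARY_PATH; the directories below are the typical defaults and may differ on your system:

export LD_LIBRARY_PATH="/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:$LD_LIBRARY_PATH"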
Also, please build and install the prerequisite dynamic library:
On Linux
cd pycupti/csrc
export PATH="/usr/local/cuda/bin:$PATH"
nvcc -O3 --shared -Xcompiler -fPIC utils.cpp -o libutils.so -lcuda -lcudart -lcupti -lnvperf_host -lnvperf_target -I /usr/local/cuda/extras/CUPTI/include -L /usr/local/cuda/extras/CUPTI/lib64
On Windows
cd pycupti/csrc
nvcc -O3 --shared utils.cpp -o utils.dll -I"%CUDA_PATH%/include" -I"%CUDA_PATH%/extras/CUPTI/include" -L"%CUDA_PATH%"/extras/CUPTI/lib64 -L"%CUDA_PATH%"/lib/x64 -lcuda -lcudart -lcupti -lnvperf_host -lnvperf_target -Xcompiler "/EHsc /GL /Gy /O2 /Zc:inline /fp:precise /D "_WINDLL" /Zc:forScope /Oi /MD" && del utils.lib utils.exp
After running the above commands, please check that libutils.so (on Linux) or utils.dll (on Windows) is in the pycupti/csrc directory.
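For example, on Linux a quick check is:

ls pycupti/csrc/libutils.so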
The Jaeger tracing service is required to collect traces from the agent:
docker run -d --name jaeger -e COLLECTOR_ZIPKIN_HOST_PORT=:9411 -e COLLECTOR_OTLP_ENABLED=true -p 6831:6831/udp -p 6832:6832/udp -p 5778:5778 -p 16686:16686 -p 4317:4317 -p 4318:4318 -p 14250:14250 -p 14268:14268 -p 14269:14269 -p 9411:9411 jaegertracing/all-in-one:1.44
The Jaeger trace UI is then available at http://localhost:16686.
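If the agent and Jaeger run on different machines, the OTLP exporter endpoint can typically be redirected with the standard OpenTelemetry environment variable before starting the agent (this is an assumption about how the agent configures its exporter; by default it is expected to target localhost):

export OTEL_EXPORTER_OTLP_ENDPOINT=http://<jaeger-host>:4318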
It is recommended to set up a virtual environment using Conda for the PyTorch agent. Below are the steps to prepare the environment, referring to the Dockerfile.standalone_cpu_pytorch2.0.1 and Dockerfile.standalone_gpu_pytorch2.0.1_cuda11.7:
Ensure you have the following installed on your system:
- Python >= 3.8
- Conda (for virtual environment management)
- CUDA (if using GPU; a quick check is shown after this list)
- cuDNN (if using GPU)
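To verify the CUDA toolkit and NVIDIA driver, checks such as the following are commonly used (they assume both are already installed and on your PATH):

nvcc --version
nvidia-smi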
For CPU-only:
conda create -n pytorch_cpu_env python=3.8
conda activate pytorch_cpu_env
For GPU-enabled:
conda create -n pytorch_gpu_env python=3.8
conda activate pytorch_gpu_env
For CPU-only:
pip install --upgrade pip
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cpu
pip install transformers diffusers sentencepiece opentelemetry-api opentelemetry-sdk \
opentelemetry-exporter-otlp-proto-grpc grpcio opentelemetry-exporter-otlp-proto-http \
httpio aenum requests tqdm scipy chardet psycopg "psycopg[binary]" Pika opencv-contrib-python
For GPU-enabled:
pip install --upgrade pip
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2
pip install transformers diffusers sentencepiece opentelemetry-api opentelemetry-sdk \
opentelemetry-exporter-otlp-proto-grpc grpcio opentelemetry-exporter-otlp-proto-http \
httpio aenum requests tqdm scipy chardet psycopg "psycopg[binary]" Pika opencv-contrib-python
git clone https://github.com/xlab-ub/py-mlmodelscope.git
cd py-mlmodelscope
For CPU-only:
python run_mlmodelscope.py --standalone true --agent pytorch --architecture cpu
For GPU-enabled:
python run_mlmodelscope.py --standalone true --agent pytorch --architecture gpu
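As with the other agents below, a specific model can also be selected with the --model_name flag; the model identifier here is only illustrative and must match a model registered for the PyTorch agent:

python run_mlmodelscope.py --standalone true --agent pytorch --architecture gpu --model_name resnet_50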
It is recommended to set up a virtual environment using Conda for the TensorFlow agent. Below are the steps to prepare the environment, referring to the Dockerfile.standalone_cpu_tensorflow2.10.1 and Dockerfile.standalone_gpu_tensorflow2.10.1_cuda11.2:
Ensure you have the following installed on your system:
- Python >= 3.7
- Conda (for virtual environment management)
- CUDA (if using GPU)
- cuDNN (if using GPU)
For CPU-only:
conda create -n tensorflow_cpu_env python=3.7
conda activate tensorflow_cpu_env
For GPU-enabled:
conda create -n tensorflow_gpu_env python=3.7
conda activate tensorflow_gpu_env
For CPU-only:
pip install --upgrade pip
pip install tensorflow-cpu==2.10.1 tensorflow-hub
pip install transformers sentencepiece opentelemetry-api opentelemetry-sdk \
opentelemetry-exporter-otlp-proto-grpc grpcio opentelemetry-exporter-otlp-proto-http \
httpio aenum requests tqdm scipy chardet psycopg "psycopg[binary]" Pika opencv-contrib-python \
protobuf==3.20.* Pillow
For GPU-enabled:
pip install --upgrade pip
pip install tensorflow==2.10.1 tensorflow-hub
pip install transformers sentencepiece opentelemetry-api opentelemetry-sdk \
opentelemetry-exporter-otlp-proto-grpc grpcio opentelemetry-exporter-otlp-proto-http \
httpio aenum requests tqdm scipy chardet psycopg "psycopg[binary]" Pika opencv-contrib-python \
protobuf==3.20.* Pillow
git clone https://github.com/xlab-ub/py-mlmodelscope.git
cd py-mlmodelscope
For CPU-only:
python run_mlmodelscope.py --standalone true --agent tensorflow --architecture cpu --model_name resnet50_keras
For GPU-enabled:
python run_mlmodelscope.py --standalone true --agent tensorflow --architecture gpu --model_name resnet50_keras
It is recommended to set up a virtual environment using Conda for the ONNXRuntime agent. Below are the steps to prepare the environment, referring to the Dockerfile.standalone_cpu_onnxruntime1.19.2 and Dockerfile.standalone_gpu_onnxruntime1.19.2_cuda11.8:
Ensure you have the following installed on your system:
- Python >= 3.8
- Conda (for virtual environment management)
- CUDA (if using GPU)
- cuDNN (if using GPU)
For CPU-only:
conda create -n onnxruntime_cpu_env python=3.8
conda activate onnxruntime_cpu_env
For GPU-enabled:
conda create -n onnxruntime_gpu_env python=3.8
conda activate onnxruntime_gpu_env
For CPU-only:
pip install --upgrade pip
pip install onnxruntime==1.19.2 optimum[onnxruntime] onnxruntime-training
pip install opentelemetry-api opentelemetry-sdk opentelemetry-exporter-otlp-proto-grpc \
grpcio opentelemetry-exporter-otlp-proto-http httpio aenum requests tqdm torchvision \
scipy chardet psycopg "psycopg[binary]" Pika opencv-contrib-python
For GPU-enabled:
pip install --upgrade pip
pip install onnxruntime-gpu==1.19.2 optimum[onnxruntime-gpu] onnxruntime-training
pip install opentelemetry-api opentelemetry-sdk opentelemetry-exporter-otlp-proto-grpc \
grpcio opentelemetry-exporter-otlp-proto-http httpio aenum requests tqdm torchvision \
scipy chardet psycopg "psycopg[binary]" Pika opencv-contrib-python
git clone https://github.com/xlab-ub/py-mlmodelscope.git
cd py-mlmodelscope
For CPU-only:
python run_mlmodelscope.py --standalone true --agent onnxruntime --architecture cpu
For GPU-enabled:
python run_mlmodelscope.py --standalone true --agent onnxruntime --architecture gpu
It is recommended to set up a virtual environment using Conda for the MXNet agent. Below are the steps to prepare the environment, referring to the Dockerfile.standalone_cpu_mxnet1.9.1:
Ensure you have the following installed on your system:
- Python >= 3.8
- Conda (for virtual environment management)
conda create -n mxnet_cpu_env python=3.8
conda activate mxnet_cpu_env
pip install --upgrade pip
pip install mxnet==1.9.1 numpy==1.23.1 torchvision==0.9.0
pip install opentelemetry-api opentelemetry-sdk opentelemetry-exporter-otlp-proto-grpc \
grpcio opentelemetry-exporter-otlp-proto-http httpio aenum requests tqdm scipy \
chardet psycopg "psycopg[binary]" Pika opencv-contrib-python
git clone https://github.com/xlab-ub/py-mlmodelscope.git
cd py-mlmodelscope
python run_mlmodelscope.py --standalone true --agent mxnet --architecture cpu --model_name alexnet
It is recommended to set up a virtual environment using Conda for the JAX agent. Below are the steps to prepare the environment, referring to the Dockerfile.standalone_cpu_jax0.4.30:
Ensure you have the following installed on your system:
- Python >= 3.9
- Conda (for virtual environment management)
conda create -n jax_cpu_env python=3.9
conda activate jax_cpu_env
pip install --upgrade pip
pip install jax==0.4.30 flax
pip install transformers sentencepiece opentelemetry-api opentelemetry-sdk \
opentelemetry-exporter-otlp-proto-grpc grpcio opentelemetry-exporter-otlp-proto-http \
httpio aenum requests tqdm scipy chardet psycopg "psycopg[binary]" Pika \
opencv-contrib-python Pillow
git clone https://github.com/xlab-ub/py-mlmodelscope.git
cd py-mlmodelscope
python run_mlmodelscope.py --standalone true --agent jax --architecture cpu --model_name resnet_50