py-mlmodelscope

MLModelScope

The current landscape of Machine Learning (ML) and Deep Learning (DL) is rife with non-uniform models, frameworks, and system stacks, but lacks standard tools to evaluate and profile models or systems. In the absence of such tools, the current practice for evaluating and comparing the benefits of proposed AI innovations (be it hardware or software) on end-to-end AI pipelines is both arduous and error-prone, stifling the adoption of these innovations.

MLModelScope is a hardware/software-agnostic, extensible, and customizable platform for evaluating and profiling ML models across datasets, frameworks, and hardware, and within AI application pipelines. MLModelScope lowers the cost and effort of performing model evaluation and profiling, making it easier for others to reproduce, evaluate, and analyze accuracy or performance claims of models and systems.

It is designed to aid in:

  1. reproducing and comparing with published models, and designing models with performance and deployment in mind,
  2. understanding model performance (within real-world AI workflows) and its interaction with all levels of the hardware/software stack, and
  3. discovering models, frameworks, and hardware that are applicable to users' datasets.

To achieve this, MLModelScope:

  • Provides a consistent evaluation, aggregation, and reporting system by defining
    • techniques to specify and provision workflows with HW/SW stacks
    • abstractions for evaluation and profiling using different frameworks
    • data consumption for evaluation outputs
  • Enables profiling of experiments throughout the entire pipeline and at different abstraction levels (application, model, framework, layer, library, and hardware)
  • Is framework agnostic, with current support for PyTorch, TensorFlow, ONNXRuntime, MXNet, and JAX
  • Is extensible and customizable, allowing users to extend MLModelScope by adding models, frameworks, or library and system profilers
  • Can run experiments on separate machines and behind firewalls (without exposing model weights or machine specifications)
  • Allows parallel evaluation (multiple instantiations of the same experiment set-up across systems)
  • Specifies model and framework resources as asset files, which can be added easily, even at runtime

MLModelScope can be used as an application with a command-line, API, or web interface, or can be compiled into a standalone library. We also provide an online hub of continuously updated assets, evaluation results, and access to hardware resources, allowing users to discover and evaluate models without installing or configuring systems.

Quick Start Guide

Prerequisites

  • Docker

Running PyTorch Agent

CPU

docker run --rm -it xlabub/pytorch-agent:standalone-cpu-pytorch2.0.1-latest

GPU

docker run --rm -it --gpus all xlabub/pytorch-agent:standalone-gpu-pytorch2.0.1-cuda11.7-latest

If you do not want the agent to re-download Hugging Face models on every run, mount a local cache directory into the container:

docker run --rm -it -e HF_HOME=/root/.cache/huggingface \
    --gpus all -v ~/.cache/huggingface:/root/.cache/huggingface xlabub/pytorch-agent:standalone-gpu-pytorch2.0.1-cuda11.7-latest

Running TensorFlow Agent

CPU

docker run --rm -it xlabub/standalone-cpu-tensorflow2.10.1-latest

GPU

docker run --rm -it --gpus all xlabub/standalone-gpu-tensorflow2.10.1-cuda11.2-latest

Running ONNXRuntime Agent

CPU

docker run --rm -it xlabub/standalone-cpu-onnxruntime1.19.2-latest

GPU

docker run --rm -it --gpus all xlabub/standalone-gpu-onnxruntime1.19.2-cuda11.8-latest

Running MXNet Agent

CPU

docker run --rm -it xlabub/standalone-cpu-mxnet1.9.1-latest

Running JAX Agent

CPU

docker run --rm -it xlabub/standalone-cpu-jax0.4.30-latest

Deployment for Development

CUPTI Library Preparation (required only for GPU profiling)

If you are planning to use the GPU profiling capabilities of the agent, you will need to install the CUPTI library. Otherwise, you can skip this step.

The CUDA Library

Please refer to the NVIDIA CUDA library installation guide. Find the location of your local CUDA installation, which is typically /usr/local/cuda/, and set up the path to the libcublas.so library.

The CUPTI Library

Please refer to the NVIDIA CUPTI library installation guide. Find the location of your local CUPTI installation, which is typically /usr/local/cuda/extras/CUPTI, and set up the path to the libcupti.so library.

Then build the prerequisite dynamic library as follows.

On Linux

cd pycupti/csrc 
export PATH="/usr/local/cuda/bin:$PATH" 
nvcc -O3 --shared -Xcompiler -fPIC utils.cpp -o libutils.so -lcuda -lcudart -lcupti -lnvperf_host -lnvperf_target -I /usr/local/cuda/extras/CUPTI/include -L /usr/local/cuda/extras/CUPTI/lib64 

On Windows

cd pycupti/csrc 
nvcc -O3 --shared utils.cpp -o utils.dll -I"%CUDA_PATH%/include" -I"%CUDA_PATH%/extras/CUPTI/include" -L"%CUDA_PATH%"/extras/CUPTI/lib64 -L"%CUDA_PATH%"/lib/x64 -lcuda -lcudart -lcupti -lnvperf_host -lnvperf_target -Xcompiler "/EHsc /GL /Gy /O2 /Zc:inline /fp:precise /D "_WINDLL" /Zc:forScope /Oi /MD" && del utils.lib utils.exp 

After running the above commands, check that libutils.so (on Linux) or utils.dll (on Windows) is present in the pycupti/csrc directory.
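
As an optional sanity check (a sketch, not part of the original steps, assuming you run it from the repository root), you can confirm that the built library loads:

import ctypes
import platform

# Path assumes the repository root as the working directory.
lib = "pycupti/csrc/libutils.so" if platform.system() == "Linux" else "pycupti/csrc/utils.dll"
try:
    ctypes.CDLL(lib)
    print(f"{lib} loaded successfully")
except OSError as err:
    # A load failure here usually means the CUDA/CUPTI lib64 directories
    # are missing from LD_LIBRARY_PATH (Linux) or PATH (Windows).
    print(f"failed to load {lib}: {err}")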

Starting Trace Server

This service is required to collect traces from the agent.

docker run -d --name jaeger -e COLLECTOR_ZIPKIN_HOST_PORT=:9411 -e COLLECTOR_OTLP_ENABLED=true -p 6831:6831/udp -p 6832:6832/udp -p 5778:5778 -p 16686:16686 -p 4317:4317 -p 4318:4318 -p 14250:14250 -p 14268:14268 -p 14269:14269 -p 9411:9411 jaegertracing/all-in-one:1.44 

The trace server UI is then available at http://localhost:16686.
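
Before starting an agent, you can optionally verify that the collector is reachable with a minimal OpenTelemetry sketch (an illustration, assuming the default OTLP gRPC port 4317 mapped above and the opentelemetry packages installed in the agent environments):

from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

# Export spans to the Jaeger all-in-one OTLP gRPC endpoint started above.
provider = TracerProvider(resource=Resource.create({"service.name": "smoke-test"}))
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4317", insecure=True))
)
trace.set_tracer_provider(provider)

with trace.get_tracer(__name__).start_as_current_span("connectivity-check"):
    pass

provider.shutdown()  # flush; the span should appear in the UI under the "smoke-test" service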

Preparing the Environment for PyTorch Agent

It is recommended to set up a virtual environment using Conda for the PyTorch agent. Below are the steps to prepare the environment, based on Dockerfile.standalone_cpu_pytorch2.0.1 and Dockerfile.standalone_gpu_pytorch2.0.1_cuda11.7:

Prerequisites

Ensure you have the following installed on your system:

  • Python >= 3.8
  • Conda (for virtual environment management)
  • CUDA (if using GPU)
  • cuDNN (if using GPU)

Steps for Setting Up the Environment

1. Create a Conda Environment

For CPU-only:

conda create -n pytorch_cpu_env python=3.8
conda activate pytorch_cpu_env

For GPU-enabled:

conda create -n pytorch_gpu_env python=3.8
conda activate pytorch_gpu_env

2. Install Required Python Packages

For CPU-only:

pip install --upgrade pip
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cpu
pip install transformers diffusers sentencepiece opentelemetry-api opentelemetry-sdk \
    opentelemetry-exporter-otlp-proto-grpc grpcio opentelemetry-exporter-otlp-proto-http \
    httpio aenum requests tqdm scipy chardet psycopg "psycopg[binary]" Pika opencv-contrib-python

For GPU-enabled:

pip install --upgrade pip
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2
pip install transformers diffusers sentencepiece opentelemetry-api opentelemetry-sdk \
    opentelemetry-exporter-otlp-proto-grpc grpcio opentelemetry-exporter-otlp-proto-http \
    httpio aenum requests tqdm scipy chardet psycopg "psycopg[binary]" Pika opencv-contrib-python
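
Optionally, a quick sanity check of the install (not part of the original steps), run inside the activated environment:

import torch

print(torch.__version__)          # expect 2.0.1
print(torch.cuda.is_available())  # True in the GPU environment, False in the CPU environment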

3. Clone the Repository

git clone https://github.com/xlab-ub/py-mlmodelscope.git
cd py-mlmodelscope

4. Run the PyTorch Agent

For CPU-only:

python run_mlmodelscope.py --standalone true --agent pytorch --architecture cpu

For GPU-enabled:

python run_mlmodelscope.py --standalone true --agent pytorch --architecture gpu

Preparing the Environment for TensorFlow Agent

It is recommended to set up a virtual environment using Conda for the TensorFlow agent. Below are the steps to prepare the environment, based on Dockerfile.standalone_cpu_tensorflow2.10.1 and Dockerfile.standalone_gpu_tensorflow2.10.1_cuda11.2:

Prerequisites

Ensure you have the following installed on your system:

  • Python >= 3.7
  • Conda (for virtual environment management)
  • CUDA (if using GPU)
  • cuDNN (if using GPU)

Steps for Setting Up the Environment

1. Create a Conda Environment

For CPU-only:

conda create -n tensorflow_cpu_env python=3.7
conda activate tensorflow_cpu_env

For GPU-enabled:

conda create -n tensorflow_gpu_env python=3.7
conda activate tensorflow_gpu_env

2. Install Required Python Packages

For CPU-only:

pip install --upgrade pip
pip install tensorflow-cpu==2.10.1 tensorflow-hub
pip install transformers sentencepiece opentelemetry-api opentelemetry-sdk \
    opentelemetry-exporter-otlp-proto-grpc grpcio opentelemetry-exporter-otlp-proto-http \
    httpio aenum requests tqdm scipy chardet psycopg "psycopg[binary]" Pika opencv-contrib-python \
    protobuf==3.20.* Pillow

For GPU-enabled:

pip install --upgrade pip
pip install tensorflow==2.10.1 tensorflow-hub
pip install transformers sentencepiece opentelemetry-api opentelemetry-sdk \
    opentelemetry-exporter-otlp-proto-grpc grpcio opentelemetry-exporter-otlp-proto-http \
    httpio aenum requests tqdm scipy chardet psycopg "psycopg[binary]" Pika opencv-contrib-python \
    protobuf==3.20.* Pillow
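
Optionally, a quick sanity check of the install (not part of the original steps), run inside the activated environment:

import tensorflow as tf

print(tf.__version__)                          # expect 2.10.1
print(tf.config.list_physical_devices("GPU"))  # non-empty only in the GPU environment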

3. Clone the Repository

git clone https://github.com/xlab-ub/py-mlmodelscope.git
cd py-mlmodelscope

4. Run the TensorFlow Agent

For CPU-only:

python run_mlmodelscope.py --standalone true --agent tensorflow --architecture cpu --model_name resnet50_keras

For GPU-enabled:

python run_mlmodelscope.py --standalone true --agent tensorflow --architecture gpu --model_name resnet50_keras

Preparing the Environment for ONNXRuntime Agent

It is recommended to set up a virtual environment using Conda for the ONNXRuntime agent. Below are the steps to prepare the environment, based on Dockerfile.standalone_cpu_onnxruntime1.19.2 and Dockerfile.standalone_gpu_onnxruntime1.19.2_cuda11.8:

Prerequisites

Ensure you have the following installed on your system:

  • Python >= 3.8
  • Conda (for virtual environment management)
  • CUDA (if using GPU)
  • cuDNN (if using GPU)

Steps for Setting Up the Environment

1. Create a Conda Environment

For CPU-only:

conda create -n onnxruntime_cpu_env python=3.8
conda activate onnxruntime_cpu_env

For GPU-enabled:

conda create -n onnxruntime_gpu_env python=3.8
conda activate onnxruntime_gpu_env

2. Install Required Python Packages

For CPU-only:

pip install --upgrade pip
pip install onnxruntime==1.19.2 optimum[onnxruntime] onnxruntime-training
pip install opentelemetry-api opentelemetry-sdk opentelemetry-exporter-otlp-proto-grpc \
    grpcio opentelemetry-exporter-otlp-proto-http httpio aenum requests tqdm torchvision \
    scipy chardet psycopg "psycopg[binary]" Pika opencv-contrib-python

For GPU-enabled:

pip install --upgrade pip
pip install onnxruntime-gpu==1.19.2 optimum[onnxruntime-gpu] onnxruntime-training
pip install opentelemetry-api opentelemetry-sdk opentelemetry-exporter-otlp-proto-grpc \
    grpcio opentelemetry-exporter-otlp-proto-http httpio aenum requests tqdm torchvision \
    scipy chardet psycopg "psycopg[binary]" Pika opencv-contrib-python
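
Optionally, a quick check of the installed runtime and its execution providers (not part of the original steps):

import onnxruntime as ort

print(ort.__version__)                # expect 1.19.2
print(ort.get_available_providers())  # the GPU install should include 'CUDAExecutionProvider'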

3. Clone the Repository

git clone https://github.com/xlab-ub/py-mlmodelscope.git
cd py-mlmodelscope

4. Run the ONNXRuntime Agent

For CPU-only:

python run_mlmodelscope.py --standalone true --agent onnxruntime --architecture cpu

For GPU-enabled:

python run_mlmodelscope.py --standalone true --agent onnxruntime --architecture gpu

Preparing the Environment for MXNet Agent

It is recommended to set up a virtual environment using Conda for the MXNet agent. Below are the steps to prepare the environment, based on Dockerfile.standalone_cpu_mxnet1.9.1.

Prerequisites

Ensure you have the following installed on your system:

  • Python >= 3.8
  • Conda (for virtual environment management)

Steps for Setting Up the Environment

1. Create a Conda Environment

conda create -n mxnet_cpu_env python=3.8
conda activate mxnet_cpu_env

2. Install Required Python Packages

pip install --upgrade pip
pip install mxnet==1.9.1 numpy==1.23.1 torchvision==0.9.0
pip install opentelemetry-api opentelemetry-sdk opentelemetry-exporter-otlp-proto-grpc \
    grpcio opentelemetry-exporter-otlp-proto-http httpio aenum requests tqdm scipy \
    chardet psycopg "psycopg[binary]" Pika opencv-contrib-python
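
Optionally, verify the install with a small NDArray computation (not part of the original steps):

import mxnet as mx

print(mx.__version__)                       # expect 1.9.1
print(mx.nd.ones((2, 3)).sum().asscalar())  # simple CPU op; expect 6.0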

3. Clone the Repository

git clone https://github.com/xlab-ub/py-mlmodelscope.git
cd py-mlmodelscope

4. Run the MXNet Agent

python run_mlmodelscope.py --standalone true --agent mxnet --architecture cpu --model_name alexnet

Preparing the Environment for JAX Agent

It is recommended to set up a virtual environment using Conda for the JAX agent. Below are the steps to prepare the environment, based on Dockerfile.standalone_cpu_jax0.4.30.

Prerequisites

Ensure you have the following installed on your system:

  • Python >= 3.9
  • Conda (for virtual environment management)

Steps for Setting Up the Environment

1. Create a Conda Environment

conda create -n jax_cpu_env python=3.9
conda activate jax_cpu_env

2. Install Required Python Packages

pip install --upgrade pip
pip install jax==0.4.30 flax
pip install transformers sentencepiece opentelemetry-api opentelemetry-sdk \
    opentelemetry-exporter-otlp-proto-grpc grpcio opentelemetry-exporter-otlp-proto-http \
    httpio aenum requests tqdm scipy chardet psycopg "psycopg[binary]" Pika \
    opencv-contrib-python Pillow
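
Optionally, verify the install with a small computation (not part of the original steps); in this CPU-only setup, jax.devices() should list CPU devices:

import jax
import jax.numpy as jnp

print(jax.__version__)                              # expect 0.4.30
print(jax.devices())                                # CPU devices for this environment
print(jnp.dot(jnp.ones((2, 2)), jnp.ones((2, 2))))  # small op to confirm the backend works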

3. Clone the Repository

git clone https://github.com/xlab-ub/py-mlmodelscope.git
cd py-mlmodelscope

4. Run the JAX Agent

python run_mlmodelscope.py --standalone true --agent jax --architecture cpu --model_name resnet_50
