refact: refine yixiao's doc [training.install]
ganler committed Jun 30, 2021
1 parent 91d23f2 commit 1a98115
Showing 2 changed files with 113 additions and 71 deletions.
2 changes: 2 additions & 0 deletions docs/conf.py
@@ -37,6 +37,8 @@

```python
'numpydoc',
]

myst_enable_extensions = ["colon_fence"]

autodoc_mock_imports = [
'gridfs',
'horovod',
```
182 changes: 111 additions & 71 deletions docs/markdown/install/training.md
@@ -1,116 +1,156 @@
# Python Training Library Installation

## Prerequisites

* [Anaconda3](https://www.anaconda.com/products/individual): Anaconda creates virtual environments, which make it easier to set up the runtime environment and to manage library dependencies. Here we mainly use it to create a virtual Python environment and to install the CUDA runtime libraries.
* [CUDA](https://developer.nvidia.com/cuda-downloads): A CUDA environment is required to run deep neural networks on GPUs. The CUDA installation package you download should match your system and your NVIDIA driver version.

## Configure CUDA environment

You can configure CUDA either through Anaconda or through your system-wide installation.

### Using CUDA toolkits from Anaconda (RECOMMENDED)

:::{admonition} Prerequisites
- [Anaconda3](https://www.anaconda.com/products/individual)
- [NVIDIA Driver >= 410.79 (required by CUDA 10)](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#driver-installation)
:::
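
The driver requirement above is a numeric version comparison. As a minimal sketch (the helper names are hypothetical, and we assume you read the driver version yourself, for example from the header printed by `nvidia-smi`), dotted versions can be compared as integer tuples:

```python
def parse_version(version):
    """Turn a dotted version string such as "440.33.01" into (440, 33, 1)."""
    return tuple(int(part) for part in version.strip().split("."))


def driver_meets_minimum(driver_version, minimum="410.79"):
    """Return True if the NVIDIA driver version is at least `minimum`.

    Integer tuples avoid string-ordering pitfalls: as plain strings,
    "1000.0" would sort before "410.79".
    """
    return parse_version(driver_version) >= parse_version(minimum)


if __name__ == "__main__":
    # Driver versions from the tested environments listed in this guide.
    for version in ["410.79", "440.33.01", "430.64", "430.26", "430.50"]:
        print(version, driver_meets_minimum(version))
```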

We suggest creating a new conda environment for the CUDA dependencies. A dedicated environment avoids conflicts between the libraries already on your machine and the ones HyperPose needs, and conda resolves the `cudatoolkit` and `cudnn` dependencies for you.

To create and set up the environment, run the following commands in bash:

```bash
# >>> create the virtual environment
conda create -n hyperpose python=3.7 -y
# >>> activate the virtual environment
conda activate hyperpose
# >>> install the cudatoolkit and cudnn libraries using conda
conda install cudatoolkit=10.0.130
conda install cudnn=7.6.0
```
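
After `conda activate hyperpose`, conda exports the name of the active environment in the `CONDA_DEFAULT_ENV` environment variable. The sketch below (the helper name is hypothetical) uses it, together with the interpreter version, as a quick sanity check that you are inside the environment created above before installing anything else:

```python
import os
import sys


def check_conda_env(expected_env="hyperpose", expected_python=(3, 7),
                    environ=None, version_info=None):
    """Return a list of problems; an empty list means the environment looks right."""
    environ = os.environ if environ is None else environ
    version_info = sys.version_info if version_info is None else version_info
    problems = []
    active = environ.get("CONDA_DEFAULT_ENV")
    if active != expected_env:
        problems.append(f"active conda env is {active!r}, expected {expected_env!r}")
    if tuple(version_info[:2]) != tuple(expected_python):
        problems.append(
            f"Python {version_info[0]}.{version_info[1]} found, "
            f"expected {expected_python[0]}.{expected_python[1]}"
        )
    return problems


if __name__ == "__main__":
    issues = check_conda_env()
    print("OK" if not issues else "\n".join(issues))
```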

::::{warning}
It is also possible to install the CUDA dependencies without creating a new environment, but this might introduce environment conflicts:

:::{code-block} bash
conda install cudatoolkit=10.0.130
conda install cudnn=7.6.0
:::
::::

### Using system-wide CUDA toolkits

You can also rely directly on the system-wide CUDA and cuDNN libraries.

HyperPose has been tested on the following environments:

| OS | NVIDIA Driver | CUDA Toolkit | GPU |
| ------------ | ------------- | ------------ | -------------- |
| Ubuntu 18.04 | 410.79 | 10.0 | Tesla V100-DGX |
| Ubuntu 18.04 | 440.33.01 | 10.2 | Tesla V100-DGX |
| Ubuntu 18.04 | 430.64 | 10.1 | TITAN RTX |
| Ubuntu 18.04 | 430.26 | 10.2 | TITAN XP |
| Ubuntu 16.04 | 430.50 | 10.1 | RTX 2080Ti |

::::{admonition} Check CUDA/cuDNN versions

To check your CUDA version, run `nvcc --version`. In the example output below, the highlighted line shows that CUDA 11.2 is installed.
:::{code-block} bash
:emphasize-lines: 6
nvcc --version
# ========== Valid output looks like ==========
# nvcc: NVIDIA (R) Cuda compiler driver
# Copyright (c) 2005-2020 NVIDIA Corporation
# Built on Mon_Nov_30_19:08:53_PST_2020
# Cuda compilation tools, release 11.2, V11.2.67
# Build cuda_11.2.r11.2/compiler.29373293_0
:::

To check your system-wide cuDNN version **on Linux**, list the cuDNN libraries. The example output (in the comments) shows cuDNN 8.0.5.
:::{code-block} bash
ls /usr/local/cuda/lib64 | grep libcudnn.so
# === Valid output looks like ===
# libcudnn.so
# libcudnn.so.8
# libcudnn.so.8.0.5
:::
::::
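
If you want to check these versions from a script rather than by eye, the outputs above can be parsed. This is a minimal sketch that assumes the outputs have exactly the shape shown in the examples; the function names are hypothetical and not part of HyperPose:

```python
import re

NVCC_SAMPLE = """nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Mon_Nov_30_19:08:53_PST_2020
Cuda compilation tools, release 11.2, V11.2.67
Build cuda_11.2.r11.2/compiler.29373293_0"""

LS_SAMPLE = """libcudnn.so
libcudnn.so.8
libcudnn.so.8.0.5"""


def cuda_version_from_nvcc(output):
    """Extract "11.2" from the "release 11.2" part of `nvcc --version`."""
    match = re.search(r"release (\d+\.\d+)", output)
    return match.group(1) if match else None


def cudnn_version_from_ls(output):
    """Pick the most specific version suffix among libcudnn.so.* file names."""
    versions = []
    for line in output.splitlines():
        match = re.fullmatch(r"libcudnn\.so\.(\d+(?:\.\d+)*)", line.strip())
        if match:
            versions.append(match.group(1))
    if not versions:
        return None
    # Compare numerically, so "8.0.5" wins over "8".
    return max(versions, key=lambda v: tuple(int(p) for p in v.split(".")))


if __name__ == "__main__":
    print(cuda_version_from_nvcc(NVCC_SAMPLE))  # 11.2
    print(cudnn_version_from_ls(LS_SAMPLE))     # 8.0.5
```

To feed it live data, the output of `subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout` can be passed in place of the sample string.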

## Install HyperPose Python training library

### Install with `pip`

To install the latest stable release from the [Python Package Index](https://pypi.org/project/hyperpose/):

```bash
pip install hyperpose
```

This will download and install all dependencies automatically.

Or you can install a specific release of HyperPose from GitHub, for example:

```bash
export HYPERPOSE_VERSION="2.2.0-alpha"
pip install https://github.com/tensorlayer/hyperpose/archive/${HYPERPOSE_VERSION}.zip
```

More GitHub releases and their versions can be found [here](https://github.com/tensorlayer/hyperpose/releases).

### Local installation

You can also install HyperPose from a clone of the GitHub repository; this is usually meant for developers. After cloning, the `hyperpose` package can be imported directly from the root directory of the cloned repository.

```bash
# Clone the source code from GitHub and install its requirements
git clone https://github.com/tensorlayer/hyperpose.git
pip install -r hyperpose/requirements.txt

# Add `hyperpose/hyperpose` to `PYTHONPATH` to help python find it.
export HYPERPOSE_PYTHON_HOME=$(pwd)/hyperpose
export PYTHONPATH=$HYPERPOSE_PYTHON_HOME/python:${PYTHONPATH}
```

Instead of using `requirements.txt`, you can also install the dependencies one by one:

```bash
# >>> install tensorflow of version 2.3.1
pip install tensorflow-gpu==2.3.1
# >>> install tensorlayer of version 2.2.3
pip install tensorlayer==2.2.3
# >>> install other requirements (numpy<=1.17.0 because newer versions conflict with pycocotools)
pip install opencv-python
pip install numpy==1.16.4
pip install pycocotools
pip install matplotlib
```

Installing from source uses the latest code and is therefore less likely to run into compatibility problems.

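With both a PyPI copy and a local clone around, Python may import a different `hyperpose` than the one you intended. As a small sketch (the helper name is hypothetical), `importlib.util.find_spec` reports which file Python would load for a module without importing it:

```python
import importlib.util


def module_origin(name):
    """Return the file Python would load for `name`, or None if it is not found."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        # Raised e.g. when a parent package of a dotted name is missing.
        return None
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    # For a local installation this should point into your cloned repository,
    # not into site-packages.
    print(module_origin("hyperpose"))
```
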
## Check the installation

Let's check whether HyperPose has been installed successfully by running the following command:

```bash
# >>> check that the libraries can be imported and that a GPU is available
python -c '
import tensorflow as tf                       # Test TensorFlow installation
import tensorlayer as tl                      # Test TensorLayer installation
assert tf.test.is_gpu_available()             # Test GPU existence
from hyperpose import Config, Model, Dataset  # Test HyperPose import
'
# >>> if the command exits without an error, you can import and run HyperPose now
```
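
If the one-liner fails, it helps to see every missing package at once rather than stopping at the first `ImportError`. A minimal sketch using only the standard library (the helper name is hypothetical; `importlib.util.find_spec` locates a module without importing it):

```python
import importlib.util


def find_missing(modules):
    """Return the subset of `modules` that cannot be located on sys.path."""
    missing = []
    for name in modules:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # find_spec raises when a parent package of a dotted name is missing.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print(find_missing(["tensorflow", "tensorlayer", "hyperpose"]))
```

This only checks that the packages are present, not that a GPU is visible. Note that `tf.test.is_gpu_available()` is deprecated in TensorFlow 2.x; `tf.config.list_physical_devices("GPU")` returns the list of visible GPUs instead.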

## Optional Setup

### Extra configuration for model exportation

The HyperPose Python training library handles the whole pipeline for developing a pose estimation system, including training, evaluation, and testing. Its goal is to produce a **.npz** file that contains the trained model weights.

For the training platform, the environment configured above is enough. However, most inference engines only accept models in `.pb` or `.onnx` format. For example, the HyperPose C++ inference engine uses [TensorRT](https://docs.nvidia.com/deeplearning/tensorrt/install-guide/index.html) as its DNN engine, which takes `.onnx` models as input.

Thus, to deploy a trained model, you need to convert it (loaded from its **.npz** weights) into `.pb` or `.onnx` format, which requires the extra configuration below.

#### Converting to a `.pb` model

To convert a model into `.pb` format, we decorate the `infer` function of each model class with `@tf.function`. We can then use TensorFlow's `get_concrete_function` to construct the frozen computation graph of the model and save it in `.pb` format.

We provide [a command-line tool](https://github.com/tensorlayer/hyperpose/blob/master/export_pb.py) to facilitate the conversion. Its only prerequisite is the TensorFlow library installed along with HyperPose's dependencies.

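The mechanism described above can be sketched in a few lines of TensorFlow 2. This is a sketch of the idea, not the contents of `export_pb.py`: it assumes a generic callable `model` that maps one `float32` batch to its outputs (HyperPose models expose their own `infer` function instead), and it freezes variables with `convert_variables_to_constants_v2` from TensorFlow's internal `tensorflow.python.framework.convert_to_constants` module. TensorFlow is imported inside the function so the snippet can be loaded where TensorFlow is absent:

```python
def export_frozen_pb(model, input_shape, out_dir, filename="model.pb"):
    """Freeze `model` into a GraphDef and write it to out_dir/filename.

    Returns the input and output tensor names, which tf2onnx needs later.
    """
    # Imported lazily: TensorFlow is a heavy, optional dependency here.
    import tensorflow as tf
    from tensorflow.python.framework.convert_to_constants import (
        convert_variables_to_constants_v2,
    )

    @tf.function
    def infer(x):
        return model(x)

    # Trace the function once for a fixed input signature.
    concrete = infer.get_concrete_function(tf.TensorSpec(input_shape, tf.float32))
    # Replace variable reads with constants so the graph is self-contained.
    frozen = convert_variables_to_constants_v2(concrete)
    tf.io.write_graph(frozen.graph.as_graph_def(), out_dir, filename, as_text=False)
    return [t.name for t in frozen.inputs], [t.name for t in frozen.outputs]
```

For example, `export_frozen_pb(lambda x: x * tf.Variable(2.0), [1, 4], "export")` would write `export/model.pb` and return one input and one output tensor name.
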
#### Converting to an `.onnx` model

To convert a trained model into `.onnx` format, we first convert it into `.pb` format and then convert the `.pb` model into `.onnx` format, which requires two additional libraries:

* [**tf2onnx**](https://github.com/onnx/tensorflow-onnx) for converting TensorFlow's `.pb` model into `.onnx` format.
* [**graph_transforms**](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/tools/graph_transforms#using-the-graph-transform-tool) for inspecting the input and output nodes of a `.pb` model.

To install `tf2onnx`, we simply run:

```bash
pip install -U tf2onnx
```
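
Once tf2onnx is installed, the conversion runs through its `tf2onnx.convert` module, which takes the frozen graph via `--graphdef` and the input and output tensor names via `--inputs` and `--outputs` (as `name:port`, e.g. `x:0`). As a sketch, the command line can be assembled in Python; the helper name, file names, and tensor names are placeholders, and `--opset` is left optional because the right value may depend on which ONNX opset your inference engine supports:

```python
import sys


def tf2onnx_command(pb_path, onnx_path, inputs, outputs, opset=None):
    """Build the argument list for `python -m tf2onnx.convert` on a frozen .pb graph."""
    cmd = [
        sys.executable, "-m", "tf2onnx.convert",
        "--graphdef", pb_path,
        "--output", onnx_path,
        "--inputs", ",".join(inputs),
        "--outputs", ",".join(outputs),
    ]
    if opset is not None:
        cmd += ["--opset", str(opset)]
    return cmd


if __name__ == "__main__":
    # Placeholder names; use the tensor names of your own model.
    print(" ".join(tf2onnx_command("model.pb", "model.onnx", ["x:0"], ["Identity:0"])))
```

To actually run the conversion, pass the list to `subprocess.run(cmd, check=True)`.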

When converting a `.pb` file into an `.onnx` file with tf2onnx, you need to provide the names of the input and output nodes of the computation graph stored in the `.pb` file, which is often tedious to look up by hand. Instead, we use `graph_transforms` to find the input and output nodes of the `.pb` model automatically.

Build graph_transforms according to the [TensorFlow tools documentation](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/tools/graph_transforms#using-the-graph-transform-tool).
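
The `summarize_graph` tool from graph_transforms guesses the likely inputs and outputs of a graph. The idea can be approximated in plain Python: placeholders are candidate inputs, and nodes whose results no other node consumes are candidate outputs. This is an illustrative approximation, not the tool's exact logic; it works on any objects with `name`, `op`, and `input` attributes, such as the `NodeDef` entries in `graph_def.node` of a `tf.compat.v1.GraphDef` loaded with `ParseFromString`:

```python
from collections import namedtuple

# Stand-in for tf.compat.v1.NodeDef in this self-contained example.
Node = namedtuple("Node", ["name", "op", "input"])

# Ops that are never reported as model outputs, even if nothing consumes them.
_IGNORED_OPS = {"Const", "NoOp", "Assign", "Placeholder"}


def _producer(input_name):
    """Map "^ctrl" or "conv1:1" style input references to the producing node name."""
    return input_name.lstrip("^").split(":")[0]


def guess_io_nodes(nodes):
    """Return (candidate input names, candidate output names)."""
    nodes = list(nodes)
    consumed = {_producer(i) for node in nodes for i in node.input}
    inputs = [n.name for n in nodes if n.op == "Placeholder"]
    outputs = [n.name for n in nodes
               if n.name not in consumed and n.op not in _IGNORED_OPS]
    return inputs, outputs


if __name__ == "__main__":
    graph = [
        Node("image", "Placeholder", []),
        Node("w", "Const", []),
        Node("conv", "Conv2D", ["image", "w"]),
        Node("heatmap", "Identity", ["conv"]),
        Node("paf", "Identity", ["conv:0"]),
    ]
    print(guess_io_nodes(graph))  # (['image'], ['heatmap', 'paf'])
```

tf2onnx expects tensor names, so append the output port (usually `:0`) to each node name before passing it to `--inputs` and `--outputs`.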

### Extra configuration for distributed training with KungFu

The HyperPose Python training library can also perform distributed training with [KungFu](https://github.com/lsds/KungFu), a high-performance distributed machine learning framework. To enable parallel training, please install KungFu according to its official instructions.
