Commit

(I)Training documentation fixed. (#367)
(1)sample training program on README.md.
(2)quick start type fixed.
(3)Kungfu installation instructions added.
(4)Setup information fixed.
Gyx-One committed Jun 28, 2021
1 parent 61b6e50 commit 49d14e0
Showing 5 changed files with 123 additions and 28 deletions.
73 changes: 72 additions & 1 deletion README.md
@@ -96,7 +96,78 @@ More information of the HyperPose Docker image can be found [here](https://hyper

### Python training library

To install the Python training library, you can follow the steps [here](https://hyperpose.readthedocs.io/en/latest/markdown/install/training.html).
We recommend using [Anaconda](https://www.anaconda.com/products/individual) to create a virtual Python environment for the HyperPose Python training library. This avoids possible package conflicts and lets conda handle the *cudatoolkit* and *cudnn* library dependencies.

All the following instructions have been tested on the environments below:<br>
* Ubuntu 18.04, Tesla V100-DGXStation, Nvidia Driver Version 440.33.01, CUDA Version=10.2
* Ubuntu 18.04, Tesla V100-DGXStation, Nvidia Driver Version 410.79, CUDA Version=10.0
* Ubuntu 18.04, TITAN RTX, Nvidia Driver Version 430.64, CUDA Version=10.1
* Ubuntu 18.04, TITAN Xp, Nvidia Driver Version 430.26, CUDA Version=10.2
* Ubuntu 16.04, RTX 2080Ti, Nvidia Driver Version 430.50, CUDA Version=10.1

With Anaconda installed, run the following commands to configure the virtual environment:

```bash
# >>> create virtual environment (choose yes)
conda create -n hyperpose python=3.7
# >>> activate the virtual environment, start installation
conda activate hyperpose
# >>> install cudatoolkit and cudnn library using conda
conda install cudatoolkit=10.0.130
conda install cudnn=7.6.0
```

Then use pip to install the Python requirements listed in [requirements.txt](https://github.com/tensorlayer/hyperpose/blob/master/requirements.txt):

```bash
pip install -r requirements.txt
```

Now the configuration is done. Run the following commands from the root directory of the repository to test whether HyperPose can be imported successfully:

```bash
# >>> Check whether the GPU is available.
python
>>> import tensorflow as tf
>>> import tensorlayer as tl
>>> tf.test.is_gpu_available()
# >>> if the output is True, we can then import and run hyperpose now
>>> from hyperpose import Config,Model,Dataset
```
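The interactive check above can also be scripted. The following is a minimal sketch and not part of the repository: `find_missing` is a hypothetical helper name, and the script relies only on the standard-library `importlib.util.find_spec` plus the same `tf.test.is_gpu_available()` call used above.

```python
import importlib.util


def find_missing(packages):
    """Return the top-level package names that cannot be located for import."""
    return [name for name in packages if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Run from the repository root so that the hyperpose package can be found.
    missing = find_missing(["tensorflow", "tensorlayer", "hyperpose"])
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        import tensorflow as tf

        # Same GPU check as in the interactive session above.
        print("GPU available:", tf.test.is_gpu_available())
```

If any package is reported as missing, revisit the conda and pip steps before trying to import `hyperpose`.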

Congratulations! You can now use HyperPose to develop your pose estimation models!

The HyperPose Python training library provides its APIs through the *Config*, *Model* and *Dataset* modules.

Use the *Config* module to set up the configuration, and the *Model* and *Dataset* modules to assemble the training or evaluation pipeline. The sample code below shows how to use HyperPose to train a *LightweightOpenpose* model with a *Vggtiny* network backbone:

```python
# >>> import modules of hyperpose
from hyperpose import Config, Model, Dataset
# >>> set model name to distinguish models (necessary)
Config.set_model_name("My_lopps")
# >>> set model architecture (and set model backbone when needed)
Config.set_model_type(Config.MODEL.LightweightOpenpose)
Config.set_model_backbone(Config.BACKBONE.Vggtiny)
# >>> set dataset to use
Config.set_dataset_type(Config.DATA.MSCOCO)
# >>> set training type
Config.set_train_type(Config.TRAIN.Single_train)
# >>> configuration is done, get config object and assemble the system
config = Config.get_config()
model = Model.get_model(config)
dataset = Dataset.get_dataset(config)
train = Model.get_train(config)
# >>> train!
train(model, dataset)
```

We provide a sample training script with a CLI at [train.py](https://github.com/tensorlayer/hyperpose/blob/master/train.py), which demonstrates the usage of the HyperPose Python training library. You can either use the script directly to train your model or use it as a template for further modification.

Evaluating a model with HyperPose is similar to the training procedure. We also provide a sample evaluation script with a CLI at [eval.py](https://github.com/tensorlayer/hyperpose/blob/master/eval.py) as an example and a template for modification.

More information about the HyperPose training library APIs can be found [here](https://hyperpose.readthedocs.io/en/latest/markdown/quick_start/training.html).
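To illustrate how a CLI can drive the configuration calls, here is a minimal sketch. It is not the repository's `train.py`: the flag names and the `parse_args` and `run_training` helpers are assumptions made for illustration, and only the `Config`, `Model` and `Dataset` calls come from the sample code in this README.

```python
import argparse


def parse_args(argv=None):
    """Parse hypothetical flags; the real train.py may use different ones."""
    parser = argparse.ArgumentParser(description="Sketch of a HyperPose training CLI")
    parser.add_argument("--model_name", default="my_lopps")
    parser.add_argument("--model_type", default="LightweightOpenpose")
    parser.add_argument("--model_backbone", default="Vggtiny")
    parser.add_argument("--dataset_type", default="MSCOCO")
    parser.add_argument("--train_type", default="Single_train")
    return parser.parse_args(argv)


def run_training(args):
    """Configure and start training with the calls shown in the README sample."""
    # Imported lazily so that argument parsing works without hyperpose installed.
    from hyperpose import Config, Model, Dataset

    Config.set_model_name(args.model_name)
    # getattr turns a string such as "Vggtiny" into Config.BACKBONE.Vggtiny.
    Config.set_model_type(getattr(Config.MODEL, args.model_type))
    Config.set_model_backbone(getattr(Config.BACKBONE, args.model_backbone))
    Config.set_dataset_type(getattr(Config.DATA, args.dataset_type))
    Config.set_train_type(getattr(Config.TRAIN, args.train_type))

    config = Config.get_config()
    model = Model.get_model(config)
    dataset = Dataset.get_dataset(config)
    train = Model.get_train(config)
    train(model, dataset)
```

A script would call `run_training(parse_args())`. An unknown name such as `--model_type Foo` fails with an `AttributeError` at the `getattr` line, before any training starts.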


## Documentation

8 changes: 7 additions & 1 deletion docs/markdown/install/training.md
@@ -69,7 +69,9 @@ python
```

## Extra configuration for exporting model
The HyperPose Python training library handles the whole pipeline for developing a pose estimation system, including training, evaluating and testing. Its goal is to produce a .npz file that contains the well-trained model weights. For the training platform, the environment configuration above is enough. However, most inference engines, such as [TensorRT](https://docs.nvidia.com/deeplearning/tensorrt/install-guide/index.html), only accept models in .pb or .onnx format. Thus, one needs to convert the trained model, loaded with the .npz weights, to .pb or .onnx format for further deployment, which needs the extra configuration below:<br>
The HyperPose Python training library handles the whole pipeline for developing a pose estimation system, including training, evaluating and testing. Its goal is to produce a .npz file that contains the well-trained model weights.

For the training platform, the environment configuration above is enough. However, most inference engines, such as [TensorRT](https://docs.nvidia.com/deeplearning/tensorrt/install-guide/index.html), only accept models in .pb or .onnx format. Thus, one needs to convert the trained model, loaded with the .npz weights, to .pb or .onnx format for further deployment, which needs the extra configuration below:<br>

* (I)Convert to .pb format:<br>
To convert the model into .pb format, we use *@tf.function* to decorate the *infer* function of each model class, so we can use the *get_concrete_function* function from TensorFlow to construct the frozen model computation graph and then save it in .pb format.
@@ -88,6 +90,10 @@ The hypeprose python training library handles the whole pipelines for developing
*graph_transform* is used to check the input and output nodes of the .pb file if they are not known. When converting a .pb file into an .onnx file using tf2onnx, one must provide the input node name and output node name of the computation graph stored in the .pb file, so one may need to use *graph_transform* to inspect the .pb file and get the node names.<br>
Build graph_transforms according to [tensorflow tools](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/tools/graph_transforms#using-the-graph-transform-tool)
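Once the node names are known, the tf2onnx conversion can be scripted. The sketch below only builds the command line for the `tf2onnx.convert` module. The helper name `tf2onnx_command`, the file names, and the node names `input:0` and `output:0` are placeholders, not values taken from HyperPose.

```python
import shlex


def tf2onnx_command(pb_path, onnx_path, input_nodes, output_nodes):
    """Build the argument list that converts a frozen .pb graph with tf2onnx."""
    return [
        "python", "-m", "tf2onnx.convert",
        "--graphdef", pb_path,
        "--inputs", ",".join(input_nodes),
        "--outputs", ",".join(output_nodes),
        "--output", onnx_path,
    ]


# Placeholder node names: replace them with the names found in your .pb file.
cmd = tf2onnx_command("model.pb", "model.onnx", ["input:0"], ["output:0"])
print(" ".join(shlex.quote(part) for part in cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` in an environment where tf2onnx is installed.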

## Extra configuration for parallel training
The HyperPose Python training library uses the high-performance distributed machine learning framework **Kungfu** for parallel training.

Thus, to use the parallel training functionality of HyperPose, please install [Kungfu](https://github.com/lsds/KungFu) according to its official instructions.



2 changes: 1 addition & 1 deletion docs/markdown/quick_start/training.md
@@ -13,7 +13,7 @@ The code for training as simple as following would work.
# >>> import modules of hyperpose
from hyperpose import Config,Model,Dataset
# >>> setting the model name is necessary to distinguish models
Config.set_model_name(args.model_name)
Config.set_model_name("my_lopps")
# >>> set model architecture (and set model backbone when in need)
Config.set_model_type(Config.MODEL.LightweightOpenpose)
Config.set_model_backbone(Config.BACKBONE.Vggtiny)
64 changes: 41 additions & 23 deletions docs/markdown/tutorial/training.md
@@ -1,40 +1,55 @@
# Tutorial for Training Library
Up to now, Hyperpose provides:
* 4 types of preset model architectures:
* Openpose
* LightweightOpenpose
* Poseproposal
* MobilenetThinOpenpose
* 7 types of common model backbone for backbone replacement:
* MobilenetV1, MobilenetV2
* Vggtiny, Vgg16, Vgg19
* Resnet18, Resnet50
> Openpose
> LightweightOpenpose
> Poseproposal
> MobilenetThinOpenpose
* 10 types of common model backbone for backbone replacement:
> MobilenetV1, MobilenetV2
> Vggtiny, Vgg16, Vgg19
> Resnet18, Resnet50
> Mobilenet variants (Dilated Mobilenet, MobilenetThin, MobilenetSmall, located in the preset model architectures)
* 2 types of popular dataset
* COCO
* MPII
> COCO
> MPII
* extensions
> user-defined dataset
> user-defined model architecture
> pre-processors and post-processors
## Integrated pipeline
HyperPose groups similar models into model classes. For now, there are two classes: Openpose classes and Poseproposal classes.
Every model architecture falls into one of them.

For each model class, HyperPose provides an integrated pipeline.

### Integrated train pipeline
The integrated training procedure of HyperPose can be divided into two parts:
setting the configuration using the APIs of the *Config* module, and getting the configured system from the *Model* and *Dataset* modules.

* The setting part mainly concerns: model_name, model_type, model_backbone, dataset_type and train_type
* *set_model_name* determines the path where the model-related files are stored
* *set_model_type* adopts the chosen preset model architecture
> *set_model_name* determines the path where the model-related files are stored
> *set_model_type* adopts the chosen preset model architecture<br>
(use an enum value of the enum class **Config.MODEL**)
* *set_model_backbone* replaces the backbone of the chosen preset model architecture
> *set_model_backbone* replaces the backbone of the chosen preset model architecture<br>
(use an enum value of the enum class **Config.BACKBONE**)
* *set_dataset_type* changes the dataset used in the training pipeline
> *set_dataset_type* changes the dataset used in the training pipeline<br>
(use an enum value of the enum class **Config.DATA**)
* *set_train_type* chooses whether to use a single GPU for single training or multiple GPUs for parallel training
(use an enum value of the enum class **Config.TRAIN**)<br>
The combination of different model architectures and model backbones leads to huge differences in the constructed model's computational
complexity (for example, the Openpose architecture with the default Vgg19 backbone is 200MB, while MobilenetThinOpenpose with a mobilenet-variant backbone is only 18MB), so it should be considered carefully.
For more detailed information, please refer to the API documents.
> *set_train_type* chooses whether to use a single GPU for single training or multiple GPUs for parallel training<br>
(use an enum value of the enum class **Config.TRAIN**)

The combination of different model architectures and model backbones leads to huge differences in the constructed model's computational
complexity.

For example, the Openpose architecture with the default Vgg19 backbone is 200MB, while MobilenetThinOpenpose with a mobilenet-variant backbone is only 18MB.

Thus the available configurations can cover a wide range of hardware computation resources.

For more detailed information, please refer to the API documents.

The basic training pipeline configuration is below:

```python
# >>> import modules of hyperpose
from hyperpose import Config,Model,Dataset
@@ -51,20 +66,23 @@ Config.set_train_type(Config.TRAIN.Single_train)
# >>> congratulations!, the simplest configuration is done, it's time to assemble the model and training pipeline
```
To use parallel training, first set the train type and then choose the Kungfu optimizer wrap function by replacing the set_train_type call as below. Kungfu has three options: Sync_sgd, Sync_avg and Pair_avg.

```python
Config.set_train_type(Config.TRAIN.Parallel_train)
Config.set_kungfu_option(Config.KUNGFU.Sync_sgd)
```

Then run your program with the following command (assuming we have 4 GPUs):

```bash
CUDA_VISIBLE_DEVICES=0,1,2,3 kungfu-run -np 4 python train.py
```
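The launch command scales with the number of GPUs: the device list and the `-np` value must agree. A minimal sketch that derives both from a list of GPU ids is shown here. The helper names `kungfu_command` and `launch` are hypothetical, and only the `kungfu-run -np N python train.py` form shown above is assumed.

```python
import os
import subprocess


def kungfu_command(gpu_ids, script="train.py"):
    """Return (env, argv) for running `script` with one KungFu worker per GPU."""
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    argv = ["kungfu-run", "-np", str(len(gpu_ids)), "python", script]
    return env, argv


def launch(gpu_ids, script="train.py"):
    """Start parallel training; requires KungFu to be installed."""
    env, argv = kungfu_command(gpu_ids, script)
    return subprocess.run(argv, env=env, check=True)


env, argv = kungfu_command([0, 1, 2, 3])
print("CUDA_VISIBLE_DEVICES=" + env["CUDA_VISIBLE_DEVICES"], " ".join(argv))
# prints: CUDA_VISIBLE_DEVICES=0,1,2,3 kungfu-run -np 4 python train.py
```

Calling `launch([0, 1])` would run the same script on two GPUs without editing the command by hand.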

* The getting part mainly concerns passing the configuration to the *Model* module and the *Dataset* module to assemble the system
* *Config.get_config* returns a config object which contains all the configuration and is the core of the getting functions
* *Model.get_model* returns a configured model object which can run the forward pass and calculate the loss
* *Dataset.get_dataset* returns a configured dataset object which can generate the TensorFlow dataset objects used for training and evaluation; it can also visualize the dataset annotations.
* *Model.get_train* returns a training pipeline, which can start running once it receives the model object and the dataset object
> *Config.get_config* returns a config object which contains all the configuration and is the core of the getting functions
> *Model.get_model* returns a configured model object which can run the forward pass and calculate the loss
> *Dataset.get_dataset* returns a configured dataset object which can generate the TensorFlow dataset objects used for training and evaluation; it can also visualize the dataset annotations.
> *Model.get_train* returns a training pipeline, which can start running once it receives the model object and the dataset object
The basic assembly of the training pipeline is below:
```bash
Expand Down
4 changes: 2 additions & 2 deletions setup.py
@@ -20,8 +20,8 @@
"pycocotools"
],
#meta data
author="Hyperpose community",
author_email="1137743903@qq.com",
author="TensorLayer Community",
author_email="tensorlayer@gmail.com",
description="HyperPose is a library for building human pose estimation systems that can efficiently operate in the wild.",
long_description=long_description,
long_description_content_type="text/markdown",
