A powerful baseline for image classification and face recognition with Pytorch

wuji3/visiondk

VisionDK: ToolBox Of Image Classification & Face Recognition

Tutorials

Install ☘️
# It is recommended to create a separate virtual environment
conda create -n vision python=3.10 
conda activate vision

# torch==2.0.1 (lower versions also work) -> https://pytorch.org/get-started/locally/
conda install pytorch torchvision torchaudio cpuonly -c pytorch # cpu-version
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia  # cuda-version

pip install -r requirements.txt

# Without Arial.ttf, inference may be slow due to network IO.
mkdir -p ~/.config/DuKe
cp misc/Arial.ttf ~/.config/DuKe
Training 🌟️
# one machine one gpu
python main.py --cfgs configs/task/pet.yaml

# one machine, multiple gpus
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node 4 main.py --cfgs configs/classification/pet.yaml
# optional flags, appended to the command above:
#   --sync_bn     synchronize BatchNorm across gpus (slows training down)
#   --resume      resume training from a checkpoint
#   --load_from   fine-tune from existing weights
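
The two launch modes above differ in what the training script sees at startup: torchrun exports `RANK`, `LOCAL_RANK`, and `WORLD_SIZE` to every worker process, while a plain `python main.py` launch leaves them unset. A minimal sketch of how a script could tell the two apart follows; `read_dist_env` and `is_distributed` are hypothetical helpers for illustration, not functions from this repository.

```python
import os


def read_dist_env(environ=None):
    """Return (rank, local_rank, world_size).

    Falls back to a single-process setup (0, 0, 1) when the variables
    that torchrun normally exports are missing.
    """
    env = os.environ if environ is None else environ
    rank = int(env.get("RANK", 0))
    local_rank = int(env.get("LOCAL_RANK", 0))
    world_size = int(env.get("WORLD_SIZE", 1))
    return rank, local_rank, world_size


def is_distributed(environ=None):
    """True when more than one worker process takes part in training."""
    return read_dist_env(environ)[2] > 1


if __name__ == "__main__":
    rank, local_rank, world_size = read_dist_env()
    print(f"rank={rank} local_rank={local_rank} world_size={world_size}")
```

With `--nproc_per_node 4`, each of the four workers would see `WORLD_SIZE=4` and its own `LOCAL_RANK` between 0 and 3, which is typically used to pick the GPU for that process.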

What's New

  • [Apr. 2024] Face Recognition Task (FRT) is now supported 🚀️️! ResNet, EfficientNet, and Swin Transformer are available as backbones; ArcFace, CircleLoss, MagFace, and MV Softmax can be used as heads for training. Note: part of the implementation is based on JD-FaceX.
  • [Jun. 2023] Image Classification Task (ICT) has launched 🚀️️! It supports many powerful strategies, such as progressive learning, online augmentation, a clean training interface, and exponential moving average. Models from torchvision are fully integrated.
  • [May. 2023] Initial release of Vision.
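
Of the strategies mentioned in the changelog, exponential moving average (EMA) is the simplest to illustrate. The sketch below keeps a smoothed "shadow" copy of the weights and blends in the current weights after every step. It uses plain Python floats instead of tensors, and `ema_update` and the `decay` value are assumptions for illustration; the repository's own EMA code is not shown here.

```python
def ema_update(shadow, params, decay=0.999):
    """Update shadow in place: shadow = decay * shadow + (1 - decay) * params."""
    for name, value in params.items():
        shadow[name] = decay * shadow[name] + (1.0 - decay) * value
    return shadow


if __name__ == "__main__":
    weights = {"w": 1.0, "b": 0.0}
    shadow = dict(weights)  # the EMA copy starts from the initial weights
    for step in range(3):
        # stand-in for an optimizer step that changes the live weights
        weights = {k: v + 0.1 for k, v in weights.items()}
        ema_update(shadow, weights, decay=0.9)
    print(shadow)
```

The shadow weights lag behind the live weights and fluctuate less, which is why they are commonly used for evaluation.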

Supported Tasks

  1. Face Recognition Task (FRT)
  2. Image Classification Task (ICT)

Implemented Method & Paper

Method | Paper
--- | ---
SAM | Sharpness-Aware Minimization for Efficiently Improving Generalization
Progressive Learning | EfficientNetV2: Smaller Models and Faster Training
OHEM | Training Region-based Object Detectors with Online Hard Example Mining
Focal Loss | Focal Loss for Dense Object Detection
Cosine Annealing | SGDR: Stochastic Gradient Descent with Warm Restarts
Label Smoothing | Rethinking the Inception Architecture for Computer Vision
Mixup | mixup: Beyond Empirical Risk Minimization
CutOut | Improved Regularization of Convolutional Neural Networks with Cutout
Attention Pool | Augmenting Convolutional networks with attention-based aggregation
GradCAM | Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization
ArcFace | ArcFace: Additive Angular Margin Loss for Deep Face Recognition
CircleLoss | Circle Loss: A Unified Perspective of Pair Similarity Optimization
MagFace | MagFace: A Universal Representation for Face Recognition and Quality Assessment
MV Softmax | Mis-classified Vector Guided Softmax Loss for Face Recognition
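
As one concrete example from the table, the cosine annealing schedule from the SGDR paper sets the learning rate to `lr_min + 0.5 * (lr_max - lr_min) * (1 + cos(pi * t_cur / t_i))`, where `t_cur` counts epochs since the last restart and `t_i` is the length of the current cycle. The sketch below implements that formula in plain Python; the function name and default values are assumptions for illustration and do not describe how this repository configures its scheduler.

```python
import math


def sgdr_lr(epoch, lr_max, lr_min=0.0, t_0=10, t_mult=1):
    """Learning rate at `epoch` under cosine annealing with warm restarts.

    t_0 is the length of the first cycle; each later cycle is t_mult
    times longer than the previous one (t_mult is an integer >= 1).
    """
    t_i = t_0
    t_cur = epoch
    # Skip over completed cycles to find the position in the current one.
    while t_cur >= t_i:
        t_cur -= t_i
        t_i *= t_mult
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * t_cur / t_i))


if __name__ == "__main__":
    for epoch in range(0, 21, 5):
        print(epoch, sgdr_lr(epoch, lr_max=0.1, t_0=10))
```

The rate starts each cycle at `lr_max`, decays toward `lr_min`, and jumps back to `lr_max` at the restart.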

Model & Paper

Model | Paper | Name in configs, e.g. torchvision-mobilenet_v2
--- | --- | ---
MobileNetV2 | MobileNetV2: Inverted Residuals and Linear Bottlenecks | mobilenet_v2
MobileNetV3 | Searching for MobileNetV3 | mobilenet_v3_small, mobilenet_v3_large
ShuffleNetV2 | ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design | shufflenet_v2_x0_5, shufflenet_v2_x1_0, shufflenet_v2_x1_5, shufflenet_v2_x2_0
ResNet | Deep Residual Learning for Image Recognition | resnet18, resnet34, resnet50, resnet101, resnet152
ResNeXt | Aggregated Residual Transformations for Deep Neural Networks | resnext50_32x4d, resnext101_32x8d, resnext101_64x4d
ConvNeXt | A ConvNet for the 2020s | convnext_tiny, convnext_small, convnext_base, convnext_large
EfficientNet | EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks | efficientnet_b{0..7}
EfficientNetV2 | EfficientNetV2: Smaller Models and Faster Training | efficientnet_v2_s, efficientnet_v2_m, efficientnet_v2_l
Swin Transformer | Swin Transformer: Hierarchical Vision Transformer using Shifted Windows | swin_t, swin_s, swin_b
Swin Transformer V2 | Swin Transformer V2: Scaling Up Capacity and Resolution | swin_v2_t, swin_v2_s, swin_v2_b
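
The config names above follow a `<source>-<model>` pattern such as `torchvision-mobilenet_v2`, and `efficientnet_b{0..7}` is shorthand for the eight variants `efficientnet_b0` through `efficientnet_b7`. The sketch below shows one way such names could be split and expanded; `parse_model_name` and `expand_range` are hypothetical helpers, and how the repository actually resolves these names is not shown here.

```python
import re


def parse_model_name(spec):
    """Split 'torchvision-mobilenet_v2' into ('torchvision', 'mobilenet_v2')."""
    source, sep, name = spec.partition("-")
    if not sep or not source or not name:
        raise ValueError(f"expected '<source>-<model>', got {spec!r}")
    return source, name


def expand_range(pattern):
    """Expand 'efficientnet_b{0..7}' into a list of concrete names.

    Patterns without a '{a..b}' range are returned unchanged in a list.
    """
    match = re.fullmatch(r"(.*)\{(\d+)\.\.(\d+)\}(.*)", pattern)
    if match is None:
        return [pattern]
    prefix, start, stop, suffix = match.groups()
    return [f"{prefix}{i}{suffix}" for i in range(int(start), int(stop) + 1)]


if __name__ == "__main__":
    print(parse_model_name("torchvision-mobilenet_v2"))
    print(expand_range("efficientnet_b{0..7}"))
```

The model part of each name matches a constructor function in `torchvision.models`, so a lookup by attribute name on that module would be one plausible next step.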

Tools

  1. Split the dataset into a training set and a validation set
python tools/data_prepare.py --postfix <jpg or png> --root <input your data realpath> --frac <train segment ratio, eg: 0.9 0.6 0.3 0.9 0.9>
  2. Visualize data augmentation
cd visiondk
python -m tools.test_augment
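
To make the `--frac` argument of the split tool more concrete, here is a minimal sketch of a ratio-based train/validation split over a list of file names. It is not the logic of `tools/data_prepare.py`, which is not shown here; `split_train_val` and its `seed` parameter are assumptions for illustration. The several values in the `--frac` example may be one ratio per class, but that is a guess.

```python
import random


def split_train_val(items, frac, seed=0):
    """Shuffle a copy of `items` and return (train, val).

    The training part holds round(frac * len(items)) elements.
    """
    if not 0.0 <= frac <= 1.0:
        raise ValueError("frac must be in [0, 1]")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for a reproducible split
    cut = round(frac * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    files = [f"img_{i}.jpg" for i in range(10)]
    train, val = split_train_val(files, frac=0.9)
    print(len(train), len(val))
```

Applying the function once per class folder with that class's ratio would give a per-class split.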

Contact Me

  1. If you enjoy reproducing papers and algorithms, pull requests are welcome.
  2. If anything about the repo is unclear, please open an issue.
