This paper presents two convolutional neural networks (CNNs) and their training strategies for skin detection. The first CNN is a VGG-style network consisting of 20 convolution layers with 3 × 3 filters, and the second is composed of 20 network-in-network (NiN) layers, which can be considered a modification of the Inception structure. To train these networks for human skin detection, we consider patch-based and whole-image-based training. The former focuses on local features such as skin color and texture, while the latter also captures human-related shape features in addition to color and texture. Experiments show that the proposed CNNs outperform conventional methods as well as the existing deep-learning-based method. The NiN structure generally achieves higher accuracy than the VGG-based structure, and whole-image-based training, which learns shape features, yields better accuracy than patch-based training, which focuses on local color and texture only.
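To make the contrast between the two training strategies concrete, the following is a minimal sketch, not the authors' implementation. It assumes stride-1 convolutions with no pooling or dilation, and the function names (`receptive_field`, `sample_patches`, `whole_image_sample`), the patch size, and the random cropping scheme are all hypothetical. The abstract specifies none of these details.

```python
import random


def receptive_field(num_layers, kernel=3, stride=1):
    """Side length (in pixels) of the input window seen by one output pixel
    after stacking `num_layers` identical convolution layers.
    Assumption: no pooling or dilation between layers."""
    rf, jump = 1, 1
    for _ in range(num_layers):
        rf += (kernel - 1) * jump  # each layer widens the window
        jump *= stride             # spacing between adjacent outputs
    return rf


def sample_patches(image, mask, patch, count, seed=0):
    """Patch-based training samples (hypothetical scheme): random square
    crops of the image paired with the aligned crop of the skin mask.
    `image` and `mask` are 2-D lists of equal height and width."""
    rng = random.Random(seed)
    height, width = len(image), len(image[0])
    samples = []
    for _ in range(count):
        top = rng.randint(0, height - patch)
        left = rng.randint(0, width - patch)
        img_crop = [row[left:left + patch] for row in image[top:top + patch]]
        mask_crop = [row[left:left + patch] for row in mask[top:top + patch]]
        samples.append((img_crop, mask_crop))
    return samples


def whole_image_sample(image, mask):
    """Whole-image-based training sample: the full image and full mask,
    so the network can also see shape context."""
    return [(image, mask)]
```

Under these assumptions, 20 stacked 3 × 3 layers give each output pixel a 41 × 41 input window. A patch smaller than that window cannot supply the wider context that whole-image training exposes, which is one plausible reading of why shape features are only learned in the whole-image setting.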
- Patch-based VGG method
- Patch-based NiN method
- Image-based VGG method
- Image-based NiN method
- Comparison of PR and ROC curves
- Visual comparison with other methods on the Pratheepan dataset
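For readers unfamiliar with how PR and ROC curves such as those listed above are typically obtained, the sketch below shows one common procedure. It assumes that each method outputs a per-pixel skin probability and that curves are traced by sweeping a decision threshold. The source does not describe its evaluation protocol, so the function name `pr_roc_points` and the convention of reporting precision as 1.0 when nothing is predicted positive are illustrative assumptions.

```python
def pr_roc_points(probs, labels, thresholds):
    """Compute precision, recall (true positive rate), and false positive
    rate at each threshold.
    probs:  flat list of predicted skin probabilities, one per pixel
    labels: flat list of ground-truth labels (1 = skin, 0 = non-skin)"""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for t in thresholds:
        # A pixel is predicted as skin when its probability reaches t.
        tp = sum(1 for p, y in zip(probs, labels) if p >= t and y == 1)
        fp = sum(1 for p, y in zip(probs, labels) if p >= t and y == 0)
        # Assumed convention: precision is 1.0 when no pixel is predicted skin.
        precision = tp / (tp + fp) if (tp + fp) else 1.0
        recall = tp / positives if positives else 0.0
        fpr = fp / negatives if negatives else 0.0
        points.append(
            {"threshold": t, "precision": precision, "recall": recall, "fpr": fpr}
        )
    return points
```

Plotting precision against recall gives the PR curve, and plotting recall against the false positive rate gives the ROC curve.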