[PR] Support face-parsing semantic segmentation model #14

Merged · 3 commits · Mar 17, 2021
2 changes: 2 additions & 0 deletions .gitignore
@@ -68,3 +68,5 @@ fastlane/screenshots/**/*.png
fastlane/test_output

.DS_Store

*.mlmodel
52 changes: 41 additions & 11 deletions README.md
@@ -6,9 +6,9 @@

This project is object segmentation on iOS with Core ML.<br>If you are interested in iOS + machine learning, visit [here](https://github.com/motlabs/iOS-Proejcts-with-ML-Models) to see various demos.<br>

| DEMO | Screenshot 1 | Screenshot 2 | Screenshot 3 |
| ------------------------------------------------------------ | --------------------------------------------- | --------------------------------------------- | --------------------------------------------- |
| <img src="https://user-images.githubusercontent.com/37643248/99242802-167ad280-2843-11eb-959a-5fe3b169d8f0.gif" width=240px> | <img src="resource/IMG_3633.PNG" width=240px> | <img src="resource/IMG_3632.PNG" width=240px> | <img src="resource/IMG_3635.PNG" width=240px> |
| DeepLabV3-DEMO1 | FaceParsing-DEMO | DeepLabV3-DEMO-2 | DeepLabV3-DEMO-3 |
| ------------------------------------------------------------ | ------------------------------------------------------------ | --------------------------------------------- | --------------------------------------------- |
| <img src="https://user-images.githubusercontent.com/37643248/99242802-167ad280-2843-11eb-959a-5fe3b169d8f0.gif" width=240px> | <img src="https://user-images.githubusercontent.com/37643248/110972921-e8943d80-839f-11eb-9559-2a32d3b56de0.gif" width=240px> | <img src="resource/IMG_3633.PNG" width=240px> | <img src="resource/IMG_3635.PNG" width=240px> |

## How it works

@@ -22,21 +22,22 @@
- iOS 12.0+
- Swift 5

## Model
## Models

### Download

Download the models from [Apple's model page](https://developer.apple.com/machine-learning/models/).

### Metadata

| | input node | output node | size |
| :--------------: | :---------------------------------: | :-----------------------------------------: | :----: |
| DeepLabV3 | `[1, 513, 513, 3]`<br>name: `image` | `[513, 513]`<br>name: `semanticPredictions` | 8.6 MB |
| DeepLabV3FP16 | `[1, 513, 513, 3]`<br>name: `image` | `[513, 513]`<br>name: `semanticPredictions` | 4.3 MB |
| DeepLabV3Int8LUT | `[1, 513, 513, 3]`<br>name: `image` | `[513, 513]`<br>name: `semanticPredictions` | 2.3 MB |
| Name | Input | Output | Size | iOS version+ | Download |
| :--------------- | :-----------------------: | :----------------------------: | :-----: | :----------: | :----------------------------------------------------------: |
| DeepLabV3 | `Image (Color 513 × 513)` | `MultiArray (Int32 513 × 513)` | 8.6 MB | iOS 12.0+ | [link](https://developer.apple.com/machine-learning/models/) |
| DeepLabV3FP16 | `Image (Color 513 × 513)` | `MultiArray (Int32 513 × 513)` | 4.3 MB | iOS 12.0+ | [link](https://developer.apple.com/machine-learning/models/) |
| DeepLabV3Int8LUT | `Image (Color 513 × 513)` | `MultiArray (Int32 513 × 513)` | 2.3 MB | iOS 12.0+ | [link](https://developer.apple.com/machine-learning/models/) |
| FaceParsing | `Image (Color 512 × 512)` | `MultiArray (Int32 512 × 512)` | 52.7 MB | iOS 14.0+ | [link](https://github.com/tucan9389/SemanticSegmentation-CoreML/releases/download/support-face-parsing/FaceParsing.mlmodel) |
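
For reference, a minimal sketch of loading the FaceParsing model and running it through Vision might look like the following. This is not part of the PR; it assumes the downloaded `FaceParsing.mlmodel` has been added to the Xcode target, so Xcode generates the `FaceParsing` class:

```swift
import CoreML
import Vision

// A sketch only; assumes FaceParsing.mlmodel is compiled into the app target
// (input: 512 × 512 color image, per the table above).
@available(iOS 14.0, *)
func makeFaceParsingRequest() throws -> VNCoreMLRequest {
    let coreMLModel = try FaceParsing(configuration: MLModelConfiguration()).model
    let visionModel = try VNCoreMLModel(for: coreMLModel)
    let request = VNCoreMLRequest(model: visionModel) { request, _ in
        guard let observations = request.results as? [VNCoreMLFeatureValueObservation],
              let segmentationMap = observations.first?.featureValue.multiArrayValue else { return }
        // segmentationMap is a 512 × 512 Int32 MLMultiArray of class indices (0...18).
        print(segmentationMap.shape)
    }
    request.imageCropAndScaleOption = .scaleFill // Vision resizes the input to the model's 512 × 512
    return request
}
```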

### Inference Time
### Inference Time − DeepLabV3

| Device | Inference Time | Total Time (GPU) | Total Time (CPU) |
| ----------------- | :------------: | :--------------: | :--------------: |
@@ -60,9 +61,38 @@

⏲: need to measure

### Inference Time − FaceParsing

| Device | Inference Time | Total Time (GPU) | Total Time (CPU) |
| ------------- | :------------: | :--------------: | :--------------: |
| iPhone 12 Pro | ⏲ | ⏲ | ⏲ |
| iPhone 11 Pro | 37 ms | 37 ms | ⏲ |

### Labels − DeepLabV3

```
# total 21
["background", "aeroplane", "bicycle", "bird", "boat",
"bottle", "bus", "car", "cat", "chair",
"cow", "diningtable", "dog", "horse", "motorbike",
"person", "pottedplant", "sheep", "sofa", "train",
"tv"]
```

### Labels − FaceParsing

```
# total 19
["background", "skin", "l_brow", "r_brow", "l_eye",
"r_eye", "eye_g", "l_ear", "r_ear", "ear_r",
"nose", "mouth", "u_lip", "l_lip", "neck",
"neck_l", "cloth", "hair", "hat"]
```
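
The values in the segmentation map are indices into these label arrays. A small helper like the following (a sketch, not part of the project) maps a class index back to its label:

```swift
// Sketch: map a class index from the FaceParsing segmentation map to its label string.
let faceParsingLabels = ["background", "skin", "l_brow", "r_brow", "l_eye",
                         "r_eye", "eye_g", "l_ear", "r_ear", "ear_r",
                         "nose", "mouth", "u_lip", "l_lip", "neck",
                         "neck_l", "cloth", "hair", "hat"]

func label(forClassIndex index: Int) -> String {
    guard faceParsingLabels.indices.contains(index) else { return "unknown" }
    return faceParsingLabels[index]
}

// label(forClassIndex: 17) == "hair"
```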

## See also

- [motlabs/iOS-Proejcts-with-ML-Models](https://github.com/motlabs/iOS-Proejcts-with-ML-Models)<br>
: The challenge of using machine learning models created from TensorFlow on iOS
- [deeplab on TensorFlow](https://github.com/tensorflow/models/tree/master/research/deeplab)<br>
- [DeepLab on TensorFlow](https://github.com/tensorflow/models/tree/master/research/deeplab)<br>
: The repository providing the DeepLabV3 model
- [FaceParsing](https://github.com/zllrunning/face-parsing.PyTorch)<br>: The repository providing the FaceParsing PyTorch model
18 changes: 16 additions & 2 deletions SemanticSegmentation-CoreML.xcodeproj/project.pbxproj
@@ -20,6 +20,7 @@
71BBE06222E3400E00E74F11 /* VideoCapture.swift in Sources */ = {isa = PBXBuildFile; fileRef = 71BBE06022E3400E00E74F11 /* VideoCapture.swift */; };
71BBE06322E3400E00E74F11 /* Measure.swift in Sources */ = {isa = PBXBuildFile; fileRef = 71BBE06122E3400E00E74F11 /* Measure.swift */; };
71BBE06722E3446300E74F11 /* SegmentationResultMLMultiArray.swift in Sources */ = {isa = PBXBuildFile; fileRef = 71BBE06622E3446300E74F11 /* SegmentationResultMLMultiArray.swift */; };
C4052DC025EFE8960040F98D /* MaskTextureGenerater.swift in Sources */ = {isa = PBXBuildFile; fileRef = C4052DBF25EFE8960040F98D /* MaskTextureGenerater.swift */; };
C4BB0D92256195AE00354C08 /* MetalRenderingDevice.swift in Sources */ = {isa = PBXBuildFile; fileRef = C4BB0D91256195AE00354C08 /* MetalRenderingDevice.swift */; };
C4BB0D96256195F800354C08 /* Maths.swift in Sources */ = {isa = PBXBuildFile; fileRef = C4BB0D95256195F800354C08 /* Maths.swift */; };
C4BB0D99256196A300354C08 /* CameraTextureGenerater.swift in Sources */ = {isa = PBXBuildFile; fileRef = C4BB0D98256196A300354C08 /* CameraTextureGenerater.swift */; };
@@ -29,6 +30,8 @@
C4BB0DA625619AA400354C08 /* MetalVideoView.swift in Sources */ = {isa = PBXBuildFile; fileRef = C4BB0DA525619AA400354C08 /* MetalVideoView.swift */; };
C4BB0DA925619C0400354C08 /* LiveMetalCameraViewController.swift in Sources */ = {isa = PBXBuildFile; fileRef = C4BB0DA825619C0400354C08 /* LiveMetalCameraViewController.swift */; };
C4BB0DB52561A47900354C08 /* Shaders.metal in Sources */ = {isa = PBXBuildFile; fileRef = C4BB0DB42561A47900354C08 /* Shaders.metal */; };
C4DDEF4925FB779D000CF6A5 /* MultitargetSegmentationTextureGenerater.swift in Sources */ = {isa = PBXBuildFile; fileRef = C4DDEF4825FB779D000CF6A5 /* MultitargetSegmentationTextureGenerater.swift */; };
C4DDEF7525FBCBC8000CF6A5 /* FaceParsing.mlmodel in Sources */ = {isa = PBXBuildFile; fileRef = C4DDEF7425FBCBC8000CF6A5 /* FaceParsing.mlmodel */; };
/* End PBXBuildFile section */

/* Begin PBXFileReference section */
@@ -47,6 +50,7 @@
71BBE06022E3400E00E74F11 /* VideoCapture.swift */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.swift; path = VideoCapture.swift; sourceTree = "<group>"; };
71BBE06122E3400E00E74F11 /* Measure.swift */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.swift; path = Measure.swift; sourceTree = "<group>"; };
71BBE06622E3446300E74F11 /* SegmentationResultMLMultiArray.swift */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.swift; path = SegmentationResultMLMultiArray.swift; sourceTree = "<group>"; };
C4052DBF25EFE8960040F98D /* MaskTextureGenerater.swift */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.swift; path = MaskTextureGenerater.swift; sourceTree = "<group>"; };
C4BB0D91256195AE00354C08 /* MetalRenderingDevice.swift */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.swift; path = MetalRenderingDevice.swift; sourceTree = "<group>"; };
C4BB0D95256195F800354C08 /* Maths.swift */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.swift; path = Maths.swift; sourceTree = "<group>"; };
C4BB0D98256196A300354C08 /* CameraTextureGenerater.swift */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.swift; path = CameraTextureGenerater.swift; sourceTree = "<group>"; };
@@ -56,6 +60,8 @@
C4BB0DA525619AA400354C08 /* MetalVideoView.swift */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.swift; path = MetalVideoView.swift; sourceTree = "<group>"; };
C4BB0DA825619C0400354C08 /* LiveMetalCameraViewController.swift */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.swift; path = LiveMetalCameraViewController.swift; sourceTree = "<group>"; };
C4BB0DB42561A47900354C08 /* Shaders.metal */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.metal; path = Shaders.metal; sourceTree = "<group>"; };
C4DDEF4825FB779D000CF6A5 /* MultitargetSegmentationTextureGenerater.swift */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.swift; path = MultitargetSegmentationTextureGenerater.swift; sourceTree = "<group>"; };
C4DDEF7425FBCBC8000CF6A5 /* FaceParsing.mlmodel */ = {isa = PBXFileReference; lastKnownFileType = file.mlmodel; path = FaceParsing.mlmodel; sourceTree = "<group>"; };
/* End PBXFileReference section */

/* Begin PBXFrameworksBuildPhase section */
@@ -112,6 +118,7 @@
71BBE05522E33BEB00E74F11 /* DeepLabV3.mlmodel */,
71BBE05722E33BEF00E74F11 /* DeepLabV3FP16.mlmodel */,
71BBE05922E33BF300E74F11 /* DeepLabV3Int8LUT.mlmodel */,
C4DDEF7425FBCBC8000CF6A5 /* FaceParsing.mlmodel */,
);
path = mlmodel;
sourceTree = "<group>";
@@ -123,7 +130,9 @@
C4BB0DA525619AA400354C08 /* MetalVideoView.swift */,
C4BB0D98256196A300354C08 /* CameraTextureGenerater.swift */,
C4BB0D9F2561983C00354C08 /* SegmentationTextureGenerater.swift */,
C4DDEF4825FB779D000CF6A5 /* MultitargetSegmentationTextureGenerater.swift */,
C4BB0DA2256199B200354C08 /* OverlayingTexturesGenerater.swift */,
C4052DBF25EFE8960040F98D /* MaskTextureGenerater.swift */,
C4BB0D9B256196ED00354C08 /* Texture.swift */,
C4BB0DB32561A46B00354C08 /* Shaders */,
C4BB0D94256195E800354C08 /* Utils */,
@@ -225,11 +234,14 @@
71BBE06322E3400E00E74F11 /* Measure.swift in Sources */,
C4BB0DA625619AA400354C08 /* MetalVideoView.swift in Sources */,
71BBE05822E33BEF00E74F11 /* DeepLabV3FP16.mlmodel in Sources */,
C4DDEF4925FB779D000CF6A5 /* MultitargetSegmentationTextureGenerater.swift in Sources */,
71BBE06722E3446300E74F11 /* SegmentationResultMLMultiArray.swift in Sources */,
C4BB0DA925619C0400354C08 /* LiveMetalCameraViewController.swift in Sources */,
C4BB0D9C256196ED00354C08 /* Texture.swift in Sources */,
71BBE05622E33BEB00E74F11 /* DeepLabV3.mlmodel in Sources */,
C4052DC025EFE8960040F98D /* MaskTextureGenerater.swift in Sources */,
71BBE05C22E33C6C00E74F11 /* StillImageViewController.swift in Sources */,
C4DDEF7525FBCBC8000CF6A5 /* FaceParsing.mlmodel in Sources */,
C4BB0D96256195F800354C08 /* Maths.swift in Sources */,
C4BB0D92256195AE00354C08 /* MetalRenderingDevice.swift in Sources */,
71BBE04622E33B2500E74F11 /* LiveImageViewController.swift in Sources */,
@@ -384,14 +396,16 @@
ASSETCATALOG_COMPILER_APPICON_NAME = AppIcon;
CODE_SIGN_STYLE = Automatic;
DEVELOPMENT_TEAM = 5WXJ4Z4H69;
GCC_OPTIMIZATION_LEVEL = s;
INFOPLIST_FILE = "SemanticSegmentation-CoreML/Info.plist";
IPHONEOS_DEPLOYMENT_TARGET = 12.0;
IPHONEOS_DEPLOYMENT_TARGET = 12.1;
LD_RUNPATH_SEARCH_PATHS = (
"$(inherited)",
"@executable_path/Frameworks",
);
PRODUCT_BUNDLE_IDENTIFIER = "com.tucan9389.SemanticSegmentation-CoreML";
PRODUCT_NAME = "$(TARGET_NAME)";
SWIFT_OPTIMIZATION_LEVEL = "-O";
SWIFT_VERSION = 5.0;
TARGETED_DEVICE_FAMILY = "1,2";
};
@@ -404,7 +418,7 @@
CODE_SIGN_STYLE = Automatic;
DEVELOPMENT_TEAM = 5WXJ4Z4H69;
INFOPLIST_FILE = "SemanticSegmentation-CoreML/Info.plist";
IPHONEOS_DEPLOYMENT_TARGET = 12.0;
IPHONEOS_DEPLOYMENT_TARGET = 12.1;
LD_RUNPATH_SEARCH_PATHS = (
"$(inherited)",
"@executable_path/Frameworks",
5 changes: 4 additions & 1 deletion SemanticSegmentation-CoreML/LiveImageViewController.swift
@@ -24,7 +24,10 @@ class LiveImageViewController: UIViewController {

// MARK: - Core ML model
// DeepLabV3(iOS12+), DeepLabV3FP16(iOS12+), DeepLabV3Int8LUT(iOS12+)
let segmentationModel = DeepLabV3Int8LUT()
// FaceParsing(iOS14+)
lazy var segmentationModel = {
return try! DeepLabV3()
}()
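
// A sketch, not part of this PR: to try the face-parsing model instead (iOS 14+),
// the initializer above could be swapped, assuming FaceParsing.mlmodel has been
// added to the target so Xcode generates the FaceParsing class:
// lazy var segmentationModel = { return try! FaceParsing() }()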

// 11 Pro
// DeepLabV3 : 37 465 1
17 changes: 10 additions & 7 deletions SemanticSegmentation-CoreML/LiveMetalCameraViewController.swift
@@ -20,7 +20,7 @@ class LiveMetalCameraViewController: UIViewController {
@IBOutlet weak var fpsLabel: UILabel!

var cameraTextureGenerater = CameraTextureGenerater()
var segmentationTextureGenerater = SegmentationTextureGenerater()
var multitargetSegmentationTextureGenerater = MultitargetSegmentationTextureGenerater()
var overlayingTexturesGenerater = OverlayingTexturesGenerater()

var cameraTexture: Texture?
@@ -30,8 +30,14 @@
var videoCapture: VideoCapture!

// MARK: - Core ML model
// DeepLabV3(iOS12+), DeepLabV3FP16(iOS12+), DeepLabV3Int8LUT(iOS12+)
let segmentationModel = DeepLabV3()
/// DeepLabV3(iOS12+), DeepLabV3FP16(iOS12+), DeepLabV3Int8LUT(iOS12+)
/// - labels: ["background", "aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat", "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person", "pottedplant", "sheep", "sofa", "train", "tv"]
/// - number of labels: 21
/// FaceParsing(iOS14+)
/// - labels: ["background", "skin", "l_brow", "r_brow", "l_eye", "r_eye", "eye_g", "l_ear", "r_ear", "ear_r", "nose", "mouth", "u_lip", "l_lip", "neck", "neck_l", "cloth", "hair", "hat"]
/// - number of labels: 19
lazy var segmentationModel = { return try! DeepLabV3() }()
let numberOfLabels = 21 // <#if you changed the segmentationModel, you have to change the numberOfLabels#>

// MARK: - Vision Properties
var request: VNCoreMLRequest?
@@ -138,16 +144,13 @@

if let observations = request.results as? [VNCoreMLFeatureValueObservation],
let segmentationmap = observations.first?.featureValue.multiArrayValue {

guard let row = segmentationmap.shape[0] as? Int,
let col = segmentationmap.shape[1] as? Int else {
return
}

let targetClass = 15 // index of human category

guard let cameraTexture = cameraTexture,
let segmentationTexture = segmentationTextureGenerater.texture(segmentationmap, row, col, targetClass) else {
let segmentationTexture = multitargetSegmentationTextureGenerater.texture(segmentationmap, row, col, numberOfLabels) else {
return
}

@@ -20,8 +20,7 @@ class CameraTextureGenerater: NSObject {
CVMetalTextureCacheCreate(kCFAllocatorDefault, nil, sharedMetalRenderingDevice.device, nil, &videoTextureCache)
}

func texture(from sampleBuffer: CMSampleBuffer) -> Texture? {
guard let cameraFrame = CMSampleBufferGetImageBuffer(sampleBuffer) else { return nil }
func texture(from cameraFrame: CVPixelBuffer) -> Texture? {
guard let videoTextureCache = videoTextureCache else { return nil }

let bufferWidth = CVPixelBufferGetWidth(cameraFrame)
@@ -44,4 +43,9 @@
return nil
}
}

func texture(from sampleBuffer: CMSampleBuffer) -> Texture? {
guard let cameraFrame = CMSampleBufferGetImageBuffer(sampleBuffer) else { return nil }
return texture(from: cameraFrame)
}
}
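
The refactor above splits `texture(from:)` so a bare `CVPixelBuffer` can be converted directly, with the existing `CMSampleBuffer` overload delegating to it. A usage sketch, assuming the project's `CameraTextureGenerater` and `Texture` types are in scope:

```swift
import CoreMedia
import CoreVideo

let cameraTextureGenerater = CameraTextureGenerater()

// Live-capture path (unchanged): convert the CMSampleBuffer from the camera delegate.
func texture(for sampleBuffer: CMSampleBuffer) -> Texture? {
    return cameraTextureGenerater.texture(from: sampleBuffer)
}

// New path: convert a CVPixelBuffer directly, e.g. from a still-image pipeline.
func texture(for pixelBuffer: CVPixelBuffer) -> Texture? {
    return cameraTextureGenerater.texture(from: pixelBuffer)
}
```

Having the sample-buffer overload delegate to the pixel-buffer one keeps the Metal texture-cache logic in a single place.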