added support for LeReS
thygate committed Dec 3, 2022
1 parent b786e21 commit ee06f97
Showing 4 changed files with 175 additions and 68 deletions.
51 changes: 46 additions & 5 deletions README.md
@@ -1,12 +1,16 @@
# Depth Maps for Stable Diffusion WebUI
This script is an addon for [AUTOMATIC1111's Stable Diffusion Web UI](https://github.com/AUTOMATIC1111/stable-diffusion-webui) that creates depth maps from the generated images. The result can be viewed on 3D or holographic devices like VR headsets or a [Looking Glass](https://lookingglassfactory.com/) display, used in render or game engines on a plane with a displacement modifier, and maybe even 3D printed.

To generate realistic depth maps from a single image, this script uses code and models from the [MiDaS](https://github.com/isl-org/MiDaS) repository by Intel ISL. See [https://pytorch.org/hub/intelisl_midas_v2/](https://pytorch.org/hub/intelisl_midas_v2/) for more info.
To generate realistic depth maps from a single image, this script uses code and models from the [MiDaS](https://github.com/isl-org/MiDaS) repository by Intel ISL (see [https://pytorch.org/hub/intelisl_midas_v2/](https://pytorch.org/hub/intelisl_midas_v2/) for more info), or LeReS from the [AdelaiDepth](https://github.com/aim-uofa/AdelaiDepth) repository by Advanced Intelligent Machines.

## Examples
[![screenshot](examples.png)](https://raw.githubusercontent.com/thygate/stable-diffusion-webui-depthmap-script/main/examples.png)

## Updates
## Changelog
* v0.2.2 new features
  * added (experimental) support for AdelaiDepth/LeReS (GPU only!)
  * new option to view the depthmap as a heatmap
  * optimised UI layout
* v0.2.1 bugfix
  * Correct seed is now used in filename and pnginfo when running batches. (see [issue](https://github.com/thygate/stable-diffusion-webui-depthmap-script/issues/35))
* v0.2.0 upgrade
@@ -37,22 +41,26 @@ To generate realistic depth maps from a single image, this script uses code and
  * when not combining, depthmap is now saved as single channel 16 bit

## Install instructions
The script is now also available to install from the `Available` subtab under the `Extensions` tab in the WebUI.
### Automatic installation
* In the WebUI, in the `Extensions` tab, in the `Install from URL` subtab, enter this repository
`https://github.com/thygate/stable-diffusion-webui-depthmap-script`
and click install.

>The midas repository will be cloned to /repositories/midas
>Model `weights` will be downloaded automatically on first use and saved to /models/midas.
>The [BoostingMonocularDepth](https://github.com/compphoto/BoostingMonocularDepth) repository will be cloned to /repositories/BoostingMonocularDepth and added to sys.path
>Model `weights` will be downloaded automatically on first use and saved to /models/midas or /models/leres
## Usage
Select the "DepthMap vX.X.X" script from the script selection box in either txt2img or img2img.
![screenshot](options.png)

The model can `Compute on` GPU or CPU; use CPU if low on VRAM.
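
Internally the device pick is a plain torch fallback (as visible in the script's diff below), so choosing GPU on a machine without CUDA quietly computes on CPU. A minimal sketch:

```
import torch

# falls back to CPU when CUDA is unavailable; the res101/LeReS model
# always requests CUDA first, since it cannot run on CPU in this version
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
```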

There are four models available from the `Model` dropdown : dpt_large, dpt_hybrid, midas_v21_small, and midas_v21. See the [MiDaS](https://github.com/isl-org/MiDaS) repository for more info. The dpt_hybrid model yields good results in our experience, and is much smaller than the dpt_large model, which means shorter loading times when the model is reloaded on every run.
There are five models available from the `Model` dropdown. The first four, dpt_large, dpt_hybrid, midas_v21_small, and midas_v21, come from [MiDaS](https://github.com/isl-org/MiDaS); see that repository for more info. The dpt_hybrid model yields good results in my experience, and is much smaller than the dpt_large model, which means shorter loading times when the model is reloaded on every run.
For the fifth model, res101, see [AdelaiDepth/LeReS](https://github.com/aim-uofa/AdelaiDepth/tree/main/LeReS) for more info. It can only compute on GPU at this time.
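
A note for anyone reading the script source: the `Model` dropdown is built with `type="index"`, so the script receives the position of the selection rather than its name; res101 sits at index 4, which is why the LeReS code path is guarded by `model_type == 4` checks in the diff below. A minimal illustration:

```
# the dropdown choices in the order the script declares them; type="index"
# makes Gradio pass the position of the selected entry instead of its label
choices = ['dpt_large', 'dpt_hybrid', 'midas_v21', 'midas_v21_small', 'res101']
model_type = choices.index('res101')  # -> 4, selects the LeReS code path
```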

Net size can be set with `net width` and `net height`, or will be the same as the input image when `Match input size` is enabled. There is a trade-off between structural consistency and high-frequency details with respect to net size (see [observations](https://github.com/compphoto/BoostingMonocularDepth#observations)). Large maps will also need lots of VRAM.
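
For a rough feel of how net size interacts with the multiple-of-32 constraint (visible as `ensure_multiple_of=32` and `keep_aspect_ratio=True` in the script's `Resize` call), here is a simplified sketch; the authoritative logic lives in the MiDaS `Resize` transform and also depends on the per-model `resize_method`:

```
# simplified sketch: fit the image inside the requested net size while
# keeping aspect ratio, then snap both sides to a multiple of 32
def effective_net_size(net_width, net_height, img_width, img_height, multiple=32):
    scale = min(net_width / img_width, net_height / img_height)
    snap = lambda v: max(multiple, multiple * round(v * scale / multiple))
    return snap(img_width), snap(img_height)

print(effective_net_size(384, 384, 512, 768))  # -> (256, 384)
```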

@@ -62,6 +70,8 @@ Regardless of global settings, `Save DepthMap` will always save the depthmap in

To see the generated output in the WebUI, `Show DepthMap` should be enabled. When using Batch img2img this option should also be enabled.

To make the depthmap easier to analyze for human eyes, `Show HeatMap` shows an extra image in the WebUI with a color gradient applied. This image is not saved.
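
In essence the heatmap is the depth data passed through matplotlib's `inferno` colormap, roughly as in this sketch (the script does the equivalent on its 16 bit output, see the diff below):

```
import matplotlib.pyplot as plt
import numpy as np

def to_heatmap(depth16):
    # depth16: single channel uint16 depthmap, values 0..65535
    colormap = plt.get_cmap('inferno')
    normalized = depth16.astype(np.float32) / 65535.0   # colormap expects 0..1
    return (colormap(normalized)[:, :, :3] * 255).astype(np.uint8)  # drop alpha

heat = to_heatmap(np.random.randint(0, 65536, (64, 64), dtype=np.uint16))
```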

When `Combine into one image` is enabled, the depthmap will be combined with the original image; the orientation can be selected with `Combine axis`. When disabled, the depthmap is saved as a 16 bit single channel PNG, as opposed to the three channel (RGB), 8 bit per channel image produced when the option is enabled.
> 💡 Saving as any format other than PNG always produces an 8 bit, 3 channel RGB image. A single channel 16 bit image is only supported when saving as PNG.
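
A minimal sketch of what the combined output amounts to, assuming `img` is the generated 8 bit RGB image and `depth16` the 16 bit depthmap (`combine` and its arguments are illustrative, not the script's API):

```
import numpy as np
from PIL import Image

def combine(img, depth16, axis=1):
    # combining forces 8 bit: scale the 16 bit depth down and replicate to RGB
    depth8 = (depth16 // 256).astype(np.uint8)
    depth_rgb = np.stack([depth8] * 3, axis=-1)
    return np.concatenate([img, depth_rgb], axis=axis)  # axis=1 -> horizontal

img = np.zeros((64, 64, 3), dtype=np.uint8)
depth16 = np.zeros((64, 64), dtype=np.uint16)
Image.fromarray(combine(img, depth16)).save("combined.png")
```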
@@ -92,7 +102,10 @@ Feel free to comment and share in the discussions.

## Acknowledgements

This project uses code and information from following papers, from the repository [github.com/isl-org/MiDaS](https://github.com/isl-org/MiDaS) :
This project uses code and information from the following papers:

MiDaS:

```
@ARTICLE {Ranftl2022,
author = "Ren\'{e} Ranftl and Katrin Lasinger and David Hafner and Konrad Schindler and Vladlen Koltun",
@@ -114,3 +127,31 @@ Dense Prediction Transformers, DPT-based model:
    year = {2021},
}
```

AdelaiDepth/LeReS:

```
@article{yin2022towards,
    title={Towards Accurate Reconstruction of 3D Scene Shape from A Single Monocular Image},
    author={Yin, Wei and Zhang, Jianming and Wang, Oliver and Niklaus, Simon and Chen, Simon and Liu, Yifan and Shen, Chunhua},
    journal={TPAMI},
    year={2022}
}
@inproceedings{Wei2021CVPR,
    title = {Learning to Recover 3D Scene Shape from a Single Image},
    author = {Wei Yin and Jianming Zhang and Oliver Wang and Simon Niklaus and Long Mai and Simon Chen and Chunhua Shen},
    booktitle = {Proc. IEEE Conf. Comp. Vis. Patt. Recogn. (CVPR)},
    year = {2021}
}
```

Boosting Monocular Depth Estimation Models to High-Resolution via Content-Adaptive Multi-Resolution Merging:

```
@INPROCEEDINGS{Miangoleh2021Boosting,
    author={S. Mahdi H. Miangoleh and Sebastian Dille and Long Mai and Sylvain Paris and Ya\u{g}{\i}z Aksoy},
    title={Boosting Monocular Depth Estimation Models to High-Resolution via Content-Adaptive Multi-Resolution Merging},
    booktitle={Proc. CVPR},
    year={2021},
}
```
5 changes: 4 additions & 1 deletion install.py
@@ -1,2 +1,5 @@
import launch
launch.git_clone("https://github.com/isl-org/MiDaS.git", "repositories/midas", "midas")
launch.git_clone("https://github.com/isl-org/MiDaS.git", "repositories/midas", "midas")
launch.git_clone("https://github.com/compphoto/BoostingMonocularDepth.git", "repositories/BoostingMonocularDepth", "BoostingMonocularDepth")
if not launch.is_installed("matplotlib"):
    launch.run_pip("install matplotlib", "requirements for depthmap script")
Binary file modified options.png
187 changes: 125 additions & 62 deletions scripts/depthmap.py
@@ -8,23 +8,33 @@
from modules.processing import create_infotext, process_images, Processed
from modules.shared import opts, cmd_opts, state, Options
from PIL import Image
from pathlib import Path

import sys
import torch, gc
import torch.nn as nn
import cv2
import requests
import os.path
import contextlib
import matplotlib.pyplot as plt
import numpy as np

# make the BoostingMonocularDepth repository importable (it provides the AdelaiDepth "lib" package);
# a forward slash keeps the path portable, the original backslash only worked on Windows
path_monorepo = Path.joinpath(Path().resolve(), "repositories/BoostingMonocularDepth")
sys.path.append(str(path_monorepo))

# AdelaiDepth imports
from lib.multi_depth_model_woauxi import RelDepthModel
from lib.net_tools import strip_prefix_if_present

from torchvision.transforms import Compose
from torchvision.transforms import Compose, transforms
# midas imports
from repositories.midas.midas.dpt_depth import DPTDepthModel
from repositories.midas.midas.midas_net import MidasNet
from repositories.midas.midas.midas_net_custom import MidasNet_small
from repositories.midas.midas.transforms import Resize, NormalizeImage, PrepareForNet

import numpy as np
#import matplotlib.pyplot as plt

scriptname = "DepthMap v0.2.1"
scriptname = "DepthMap v0.2.2"

class Script(scripts.Script):
    def title(self):
@@ -34,31 +44,53 @@ def show(self, is_img2img):
        return True

    def ui(self, is_img2img):

        compute_device = gr.Radio(label="Compute on", choices=['GPU','CPU'], value='GPU', type="index")
        model_type = gr.Dropdown(label="Model", choices=['dpt_large','dpt_hybrid','midas_v21','midas_v21_small'], value='dpt_large', type="index", elem_id="model_type")
        net_width = gr.Slider(minimum=64, maximum=2048, step=64, label='Net width', value=384)
        net_height = gr.Slider(minimum=64, maximum=2048, step=64, label='Net height', value=384)

        with gr.Row():
            compute_device = gr.Radio(label="Compute on", choices=['GPU','CPU'], value='GPU', type="index")
            model_type = gr.Dropdown(label="Model", choices=['dpt_large','dpt_hybrid','midas_v21','midas_v21_small','res101'], value='dpt_large', type="index", elem_id="model_type")
        with gr.Row():
            net_width = gr.Slider(minimum=64, maximum=2048, step=64, label='Net width', value=384)
            net_height = gr.Slider(minimum=64, maximum=2048, step=64, label='Net height', value=384)
        match_size = gr.Checkbox(label="Match input size",value=False)
        invert_depth = gr.Checkbox(label="Invert DepthMap (black=near, white=far)",value=False)
        save_depth = gr.Checkbox(label="Save DepthMap",value=True)
        show_depth = gr.Checkbox(label="Show DepthMap",value=True)
        combine_output = gr.Checkbox(label="Combine into one image.",value=True)
        combine_output_axis = gr.Radio(label="Combine axis", choices=['Vertical','Horizontal'], value='Horizontal', type="index")
        with gr.Row():
            combine_output = gr.Checkbox(label="Combine into one image.",value=True)
            combine_output_axis = gr.Radio(label="Combine axis", choices=['Vertical','Horizontal'], value='Horizontal', type="index")
        with gr.Row():
            save_depth = gr.Checkbox(label="Save DepthMap",value=True)
            show_depth = gr.Checkbox(label="Show DepthMap",value=True)
            show_heat = gr.Checkbox(label="Show HeatMap",value=False)

        return [compute_device, model_type, net_width, net_height, match_size, invert_depth, save_depth, show_depth, combine_output, combine_output_axis]
        return [compute_device, model_type, net_width, net_height, match_size, invert_depth, save_depth, show_depth, show_heat, combine_output, combine_output_axis]

    def run(self, p, compute_device, model_type, net_width, net_height, match_size, invert_depth, save_depth, show_depth, combine_output, combine_output_axis):
    def run(self, p, compute_device, model_type, net_width, net_height, match_size, invert_depth, save_depth, show_depth, show_heat, combine_output, combine_output_axis):

        def download_file(filename, url):
            print("Downloading midas model weights to %s" % filename)
            print("Downloading model weights to %s" % filename)
            with open(filename, 'wb') as fout:
                response = requests.get(url, stream=True)
                response.raise_for_status()
                # Write response data to file
                for block in response.iter_content(4096):
                    fout.write(block)

        def scale_torch(img):
            """
            Scale the image and output it as a torch.tensor.
            :param img: input rgb is in shape [H, W, C], input depth/disp is in shape [H, W]
            :return: img. [C, H, W]
            """
            if len(img.shape) == 2:
                img = img[np.newaxis, :, :]
            if img.shape[2] == 3:
                transform = transforms.Compose([transforms.ToTensor(),
                                                transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225))])
                img = transform(img.astype(np.float32))
            else:
                img = img.astype(np.float32)
                img = torch.from_numpy(img)
            return img

        # sd process
        processed = processing.process_images(p)

@@ -69,18 +101,20 @@ def download_file(filename, url):
        print('\n%s' % scriptname)

        # init torch device
        if compute_device == 0:
        if compute_device == 0 or model_type == 4:
            device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        else:
            device = torch.device("cpu")
        print("device: %s" % device)

        # model path and name
        model_dir = "./models/midas"
        if model_type == 4:
            model_dir = "./models/leres"
        # create path to model if not present
        os.makedirs(model_dir, exist_ok=True)

        print("Loading midas model weights from ", end=" ")
        print("Loading model weights from ", end=" ")

        try:
            #"dpt_large"
Expand Down Expand Up @@ -139,33 +173,45 @@ def download_file(filename, url):
                    mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]
                )

#"res101"
elif model_type == 4:
model_path = f"{model_dir}/res101.pth"
print(model_path)
if not os.path.exists(model_path):
download_file(model_path,"https://cloudstor.aarnet.edu.au/plus/s/lTIJF4vrvHCAI31/download")
checkpoint = torch.load(model_path)
model = RelDepthModel(backbone='resnext101')
model.load_state_dict(strip_prefix_if_present(checkpoint['depth_model'], "module."), strict=True)
del checkpoint

# override net size
if (match_size):
net_width, net_height = processed.width, processed.height

            # init transform
            transform = Compose(
                [
                    Resize(
                        net_width,
                        net_height,
                        resize_target=None,
                        keep_aspect_ratio=True,
                        ensure_multiple_of=32,
                        resize_method=resize_mode,
                        image_interpolation_method=cv2.INTER_CUBIC,
                    ),
                    normalization,
                    PrepareForNet(),
                ]
            )
            # init midas transform
            if model_type != 4:
                transform = Compose(
                    [
                        Resize(
                            net_width,
                            net_height,
                            resize_target=None,
                            keep_aspect_ratio=True,
                            ensure_multiple_of=32,
                            resize_method=resize_mode,
                            image_interpolation_method=cv2.INTER_CUBIC,
                        ),
                        normalization,
                        PrepareForNet(),
                    ]
                )

            model.eval()

            # optimize
            if device == torch.device("cuda"):
                model = model.to(memory_format=torch.channels_last)
                if not cmd_opts.no_half:
                if not cmd_opts.no_half and model_type != 4:
                    model = model.half()

            model.to(device)
@@ -179,28 +225,44 @@ def download_file(filename, url):

                # input image
                img = cv2.cvtColor(np.asarray(processed.images[count]), cv2.COLOR_BGR2RGB) / 255.0
                img_input = transform({"image": img})["image"]

                # compute
                precision_scope = torch.autocast if shared.cmd_opts.precision == "autocast" and device == torch.device("cuda") else contextlib.nullcontext
                with torch.no_grad(), precision_scope("cuda"):
                    sample = torch.from_numpy(img_input).to(device).unsqueeze(0)
                    if device == torch.device("cuda"):
                        sample = sample.to(memory_format=torch.channels_last)
                        if not cmd_opts.no_half:
                            sample = sample.half()
                    prediction = model.forward(sample)
                    prediction = (
                        torch.nn.functional.interpolate(
                            prediction.unsqueeze(1),
                            size=img.shape[:2],
                            mode="bicubic",
                            align_corners=False,
                        )
                        .squeeze()
                        .cpu()
                        .numpy()
                    )

                if model_type == 4:

                    # leres transform input
                    rgb_c = img[:, :, ::-1].copy()
                    A_resize = cv2.resize(rgb_c, (net_width, net_height))
                    img_torch = scale_torch(A_resize)[None, :, :, :]

                    # Forward pass
                    with torch.no_grad():
                        prediction = model.inference(img_torch)
                    prediction = prediction.squeeze().cpu().numpy()
                    prediction = cv2.resize(prediction, (img.shape[1], img.shape[0]), interpolation=cv2.INTER_CUBIC)

                else:

                    # midas transform input
                    img_input = transform({"image": img})["image"]

                    # compute
                    precision_scope = torch.autocast if shared.cmd_opts.precision == "autocast" and device == torch.device("cuda") else contextlib.nullcontext
                    with torch.no_grad(), precision_scope("cuda"):
                        sample = torch.from_numpy(img_input).to(device).unsqueeze(0)
                        if device == torch.device("cuda"):
                            sample = sample.to(memory_format=torch.channels_last)
                            if not cmd_opts.no_half:
                                sample = sample.half()
                        prediction = model.forward(sample)
                        prediction = (
                            torch.nn.functional.interpolate(
                                prediction.unsqueeze(1),
                                size=img.shape[:2],
                                mode="bicubic",
                                align_corners=False,
                            )
                            .squeeze()
                            .cpu()
                            .numpy()
                        )

                # output
                depth = prediction
@@ -219,7 +281,7 @@ def download_file(filename, url):
                img_output = out.astype("uint16")

                # invert depth map
                if invert_depth:
                if invert_depth ^ (model_type == 4):
                    img_output = cv2.bitwise_not(img_output)

                # three channel, 8 bits per channel image
@@ -250,9 +312,10 @@ def download_file(filename, url):
                if save_depth:
                    images.save_image(Image.fromarray(img_concat), p.outpath_samples, "", processed.all_seeds[count-1], processed.all_prompts[count-1], opts.samples_format, info=info, p=p, suffix="_depth")

                #colormap = plt.get_cmap('inferno')
                #heatmap = (colormap(img_output2[:,:,0] / 256.0) * 2**16).astype(np.uint16)[:,:,:3]
                #processed.images.append(heatmap)
                if show_heat:
                    colormap = plt.get_cmap('inferno')
                    heatmap = (colormap(img_output2[:,:,0] / 256.0) * 2**16).astype(np.uint16)[:,:,:3]
                    processed.images.append(heatmap)

        except RuntimeError as e:
            if 'out of memory' in str(e):