added support for LeReS
thygate committed Dec 3, 2022
1 parent b786e21 commit ee06f97
Showing 4 changed files with 175 additions and 68 deletions.
51 changes: 46 additions & 5 deletions README.md
@@ -1,12 +1,16 @@
# Depth Maps for Stable Diffusion WebUI
This script is an addon for [AUTOMATIC1111's Stable Diffusion Web UI](https://github.com/AUTOMATIC1111/stable-diffusion-webui) that creates depth maps from the generated images. The result can be viewed on 3D or holographic devices like VR headsets or a [Looking Glass](https://lookingglassfactory.com/) display, used in render or game engines on a plane with a displacement modifier, and maybe even 3D printed.

To generate realistic depth maps from a single image, this script uses code and models from the [MiDaS](https://github.com/isl-org/MiDaS) repository by Intel ISL. See [https://pytorch.org/hub/intelisl_midas_v2/](https://pytorch.org/hub/intelisl_midas_v2/) for more info.
To generate realistic depth maps from a single image, this script uses code and models from the [MiDaS](https://github.com/isl-org/MiDaS) repository by Intel ISL (see [https://pytorch.org/hub/intelisl_midas_v2/](https://pytorch.org/hub/intelisl_midas_v2/) for more info), or LeReS from the [AdelaiDepth](https://github.com/aim-uofa/AdelaiDepth) repository by Advanced Intelligent Machines.

## Examples
[![screenshot](examples.png)](https://raw.githubusercontent.com/thygate/stable-diffusion-webui-depthmap-script/main/examples.png)

## Updates
## Changelog
* v0.2.2 new features
  * added (experimental) support for AdelaiDepth/LeReS (GPU only!)
  * new option to view the depthmap as a heatmap
  * optimised UI layout
* v0.2.1 bugfix
  * Correct seed is now used in filename and pnginfo when running batches. (see [issue](https://github.com/thygate/stable-diffusion-webui-depthmap-script/issues/35))
* v0.2.0 upgrade
@@ -37,22 +41,26 @@ To generate realistic depth maps from a single image, this script uses code and
  * when not combining, depthmap is now saved as single channel 16 bit

## Install instructions
The script is now also available to install from the `Available` subtab under the `Extensions` tab in the WebUI.
### Automatic installation
* In the WebUI, in the `Extensions` tab, in the `Install from URL` subtab, enter this repository
`https://github.com/thygate/stable-diffusion-webui-depthmap-script`
and click install.

>The midas repository will be cloned to /repositories/midas
>Model `weights` will be downloaded automatically on first use and saved to /models/midas.
>The [BoostingMonocularDepth](https://github.com/compphoto/BoostingMonocularDepth) repository will be cloned to /repositories/BoostingMonocularDepth and added to sys.path
>Model `weights` will be downloaded automatically on first use and saved to /models/midas or /models/leres
## Usage
Select the "DepthMap vX.X.X" script from the script selection box in either txt2img or img2img.
![screenshot](options.png)

The model can `Compute on` GPU or CPU; use CPU if low on VRAM.
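
Internally the device pick is a plain torch fallback (as visible in the script's diff below), so choosing GPU on a machine without CUDA quietly computes on CPU. A minimal sketch:

```
import torch

# falls back to CPU when CUDA is unavailable; the res101/LeReS model
# always requests CUDA first, since it cannot run on CPU in this version
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
```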

There are four models available from the `Model` dropdown : dpt_large, dpt_hybrid, midas_v21_small, and midas_v21. See the [MiDaS](https://github.com/isl-org/MiDaS) repository for more info. The dpt_hybrid model yields good results in our experience, and is much smaller than the dpt_large model, which means shorter loading times when the model is reloaded on every run.
There are five models available from the `Model` dropdown. The first four, dpt_large, dpt_hybrid, midas_v21_small, and midas_v21, come from [MiDaS](https://github.com/isl-org/MiDaS); see that repository for more info. The dpt_hybrid model yields good results in my experience, and is much smaller than the dpt_large model, which means shorter loading times when the model is reloaded on every run.
For the fifth model, res101, see [AdelaiDepth/LeReS](https://github.com/aim-uofa/AdelaiDepth/tree/main/LeReS) for more info. It can only compute on GPU at this time.
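
A note for anyone reading the script source: the `Model` dropdown is built with `type="index"`, so the script receives the position of the selection rather than its name; res101 sits at index 4, which is why the LeReS code path is guarded by `model_type == 4` checks in the diff below. A minimal illustration:

```
# the dropdown choices in the order the script declares them; type="index"
# makes Gradio pass the position of the selected entry instead of its label
choices = ['dpt_large', 'dpt_hybrid', 'midas_v21', 'midas_v21_small', 'res101']
model_type = choices.index('res101')  # -> 4, selects the LeReS code path
```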

Net size can be set with `net width` and `net height`, or will be the same as the input image when `Match input size` is enabled. There is a trade-off between structural consistency and high-frequency details with respect to net size (see [observations](https://github.com/compphoto/BoostingMonocularDepth#observations)). Large maps will also need lots of VRAM.
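
For a rough feel of how net size interacts with the multiple-of-32 constraint (visible as `ensure_multiple_of=32` and `keep_aspect_ratio=True` in the script's `Resize` call), here is a simplified sketch; the authoritative logic lives in the MiDaS `Resize` transform and also depends on the per-model `resize_method`:

```
# simplified sketch: fit the image inside the requested net size while
# keeping aspect ratio, then snap both sides to a multiple of 32
def effective_net_size(net_width, net_height, img_width, img_height, multiple=32):
    scale = min(net_width / img_width, net_height / img_height)
    snap = lambda v: max(multiple, multiple * round(v * scale / multiple))
    return snap(img_width), snap(img_height)

print(effective_net_size(384, 384, 512, 768))  # -> (256, 384)
```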

@@ -62,6 +70,8 @@ Regardless of global settings, `Save DepthMap` will always save the depthmap in

To see the generated output in the WebUI, `Show DepthMap` should be enabled. When using Batch img2img this option should also be enabled.

To make the depthmap easier to analyze for human eyes, `Show HeatMap` shows an extra image in the WebUI with a color gradient applied. This image is not saved.
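
In essence the heatmap is the depth data passed through matplotlib's `inferno` colormap, roughly as in this sketch (the script does the equivalent on its 16 bit output, see the diff below):

```
import matplotlib.pyplot as plt
import numpy as np

def to_heatmap(depth16):
    # depth16: single channel uint16 depthmap, values 0..65535
    colormap = plt.get_cmap('inferno')
    normalized = depth16.astype(np.float32) / 65535.0   # colormap expects 0..1
    return (colormap(normalized)[:, :, :3] * 255).astype(np.uint8)  # drop alpha

heat = to_heatmap(np.random.randint(0, 65536, (64, 64), dtype=np.uint16))
```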

When `Combine into one image` is enabled, the depthmap will be combined with the original image; the orientation can be selected with `Combine axis`. When disabled, the depthmap is saved as a 16 bit single channel PNG, as opposed to the three channel (RGB), 8 bit per channel image produced when the option is enabled.
> 💡 Saving as any format other than PNG always produces an 8 bit, 3 channel RGB image. A single channel 16 bit image is only supported when saving as PNG.
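
A minimal sketch of what the combined output amounts to, assuming `img` is the generated 8 bit RGB image and `depth16` the 16 bit depthmap (`combine` and its arguments are illustrative, not the script's API):

```
import numpy as np
from PIL import Image

def combine(img, depth16, axis=1):
    # combining forces 8 bit: scale the 16 bit depth down and replicate to RGB
    depth8 = (depth16 // 256).astype(np.uint8)
    depth_rgb = np.stack([depth8] * 3, axis=-1)
    return np.concatenate([img, depth_rgb], axis=axis)  # axis=1 -> horizontal

img = np.zeros((64, 64, 3), dtype=np.uint8)
depth16 = np.zeros((64, 64), dtype=np.uint16)
Image.fromarray(combine(img, depth16)).save("combined.png")
```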
@@ -92,7 +102,10 @@ Feel free to comment and share in the discussions.

## Acknowledgements

This project uses code and information from following papers, from the repository [github.com/isl-org/MiDaS](https://github.com/isl-org/MiDaS) :
This project uses code and information from the following papers:

MiDaS:

```
@ARTICLE {Ranftl2022,
author = "Ren\'{e} Ranftl and Katrin Lasinger and David Hafner and Konrad Schindler and Vladlen Koltun",
@@ -114,3 +127,31 @@ Dense Prediction Transformers, DPT-based model:
    year = {2021},
}
```

AdelaiDepth/LeReS:

```
@article{yin2022towards,
    title={Towards Accurate Reconstruction of 3D Scene Shape from A Single Monocular Image},
    author={Yin, Wei and Zhang, Jianming and Wang, Oliver and Niklaus, Simon and Chen, Simon and Liu, Yifan and Shen, Chunhua},
    journal={TPAMI},
    year={2022}
}
@inproceedings{Wei2021CVPR,
    title = {Learning to Recover 3D Scene Shape from a Single Image},
    author = {Wei Yin and Jianming Zhang and Oliver Wang and Simon Niklaus and Long Mai and Simon Chen and Chunhua Shen},
    booktitle = {Proc. IEEE Conf. Comp. Vis. Patt. Recogn. (CVPR)},
    year = {2021}
}
```

Boosting Monocular Depth Estimation Models to High-Resolution via Content-Adaptive Multi-Resolution Merging:

```
@INPROCEEDINGS{Miangoleh2021Boosting,
    author={S. Mahdi H. Miangoleh and Sebastian Dille and Long Mai and Sylvain Paris and Ya\u{g}{\i}z Aksoy},
    title={Boosting Monocular Depth Estimation Models to High-Resolution via Content-Adaptive Multi-Resolution Merging},
    booktitle={Proc. CVPR},
    year={2021},
}
```
5 changes: 4 additions & 1 deletion install.py
@@ -1,2 +1,5 @@
import launch
launch.git_clone("https://github.com/isl-org/MiDaS.git", "repositories/midas", "midas")
launch.git_clone("https://github.com/isl-org/MiDaS.git", "repositories/midas", "midas")
launch.git_clone("https://github.com/compphoto/BoostingMonocularDepth.git", "repositories/BoostingMonocularDepth", "BoostingMonocularDepth")
if not launch.is_installed("matplotlib"):
    launch.run_pip("install matplotlib", "requirements for depthmap script")
Binary file modified options.png
187 changes: 125 additions & 62 deletions scripts/depthmap.py
@@ -8,23 +8,33 @@
from modules.processing import create_infotext, process_images, Processed
from modules.shared import opts, cmd_opts, state, Options
from PIL import Image
from pathlib import Path

import sys
import torch, gc
import torch.nn as nn
import cv2
import requests
import os.path
import contextlib
import matplotlib.pyplot as plt
import numpy as np

# make the BoostingMonocularDepth repository importable (it provides the AdelaiDepth "lib" package);
# a forward slash keeps the path portable, the original backslash only worked on Windows
path_monorepo = Path.joinpath(Path().resolve(), "repositories/BoostingMonocularDepth")
sys.path.append(str(path_monorepo))

# AdelaiDepth imports
from lib.multi_depth_model_woauxi import RelDepthModel
from lib.net_tools import strip_prefix_if_present

from torchvision.transforms import Compose
from torchvision.transforms import Compose, transforms
# midas imports
from repositories.midas.midas.dpt_depth import DPTDepthModel
from repositories.midas.midas.midas_net import MidasNet
from repositories.midas.midas.midas_net_custom import MidasNet_small
from repositories.midas.midas.transforms import Resize, NormalizeImage, PrepareForNet

import numpy as np
#import matplotlib.pyplot as plt

scriptname = "DepthMap v0.2.1"
scriptname = "DepthMap v0.2.2"

class Script(scripts.Script):
    def title(self):
@@ -34,31 +44,53 @@ def show(self, is_img2img):
        return True

    def ui(self, is_img2img):

        compute_device = gr.Radio(label="Compute on", choices=['GPU','CPU'], value='GPU', type="index")
        model_type = gr.Dropdown(label="Model", choices=['dpt_large','dpt_hybrid','midas_v21','midas_v21_small'], value='dpt_large', type="index", elem_id="model_type")
        net_width = gr.Slider(minimum=64, maximum=2048, step=64, label='Net width', value=384)
        net_height = gr.Slider(minimum=64, maximum=2048, step=64, label='Net height', value=384)

        with gr.Row():
            compute_device = gr.Radio(label="Compute on", choices=['GPU','CPU'], value='GPU', type="index")
            model_type = gr.Dropdown(label="Model", choices=['dpt_large','dpt_hybrid','midas_v21','midas_v21_small','res101'], value='dpt_large', type="index", elem_id="model_type")
        with gr.Row():
            net_width = gr.Slider(minimum=64, maximum=2048, step=64, label='Net width', value=384)
            net_height = gr.Slider(minimum=64, maximum=2048, step=64, label='Net height', value=384)
        match_size = gr.Checkbox(label="Match input size",value=False)
        invert_depth = gr.Checkbox(label="Invert DepthMap (black=near, white=far)",value=False)
        save_depth = gr.Checkbox(label="Save DepthMap",value=True)
        show_depth = gr.Checkbox(label="Show DepthMap",value=True)
        combine_output = gr.Checkbox(label="Combine into one image.",value=True)
        combine_output_axis = gr.Radio(label="Combine axis", choices=['Vertical','Horizontal'], value='Horizontal', type="index")
        with gr.Row():
            combine_output = gr.Checkbox(label="Combine into one image.",value=True)
            combine_output_axis = gr.Radio(label="Combine axis", choices=['Vertical','Horizontal'], value='Horizontal', type="index")
        with gr.Row():
            save_depth = gr.Checkbox(label="Save DepthMap",value=True)
            show_depth = gr.Checkbox(label="Show DepthMap",value=True)
            show_heat = gr.Checkbox(label="Show HeatMap",value=False)

        return [compute_device, model_type, net_width, net_height, match_size, invert_depth, save_depth, show_depth, combine_output, combine_output_axis]
        return [compute_device, model_type, net_width, net_height, match_size, invert_depth, save_depth, show_depth, show_heat, combine_output, combine_output_axis]

    def run(self, p, compute_device, model_type, net_width, net_height, match_size, invert_depth, save_depth, show_depth, combine_output, combine_output_axis):
    def run(self, p, compute_device, model_type, net_width, net_height, match_size, invert_depth, save_depth, show_depth, show_heat, combine_output, combine_output_axis):

        def download_file(filename, url):
            print("Downloading midas model weights to %s" % filename)
            print("Downloading model weights to %s" % filename)
            with open(filename, 'wb') as fout:
                response = requests.get(url, stream=True)
                response.raise_for_status()
                # Write response data to file
                for block in response.iter_content(4096):
                    fout.write(block)

        def scale_torch(img):
            """
            Scale the image and output it as a torch.tensor.
            :param img: input rgb is in shape [H, W, C], input depth/disp is in shape [H, W]
            :return: img. [C, H, W]
            """
            if len(img.shape) == 2:
                img = img[np.newaxis, :, :]
            if img.shape[2] == 3:
                transform = transforms.Compose([transforms.ToTensor(),
                                                transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225))])
                img = transform(img.astype(np.float32))
            else:
                img = img.astype(np.float32)
                img = torch.from_numpy(img)
            return img

        # sd process
        processed = processing.process_images(p)

@@ -69,18 +101,20 @@ def download_file(filename, url):
        print('\n%s' % scriptname)

        # init torch device
        if compute_device == 0:
        if compute_device == 0 or model_type == 4:
            device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        else:
            device = torch.device("cpu")
        print("device: %s" % device)

        # model path and name
        model_dir = "./models/midas"
        if model_type == 4:
            model_dir = "./models/leres"
        # create path to model if not present
        os.makedirs(model_dir, exist_ok=True)

        print("Loading midas model weights from ", end=" ")
        print("Loading model weights from ", end=" ")

        try:
            #"dpt_large"
Expand Down Expand Up @@ -139,33 +173,45 @@ def download_file(filename, url):
                    mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]
                )

#"res101"
elif model_type == 4:
model_path = f"{model_dir}/res101.pth"
print(model_path)
if not os.path.exists(model_path):
download_file(model_path,"https://cloudstor.aarnet.edu.au/plus/s/lTIJF4vrvHCAI31/download")
checkpoint = torch.load(model_path)
model = RelDepthModel(backbone='resnext101')
model.load_state_dict(strip_prefix_if_present(checkpoint['depth_model'], "module."), strict=True)
del checkpoint

# override net size
if (match_size):
net_width, net_height = processed.width, processed.height

            # init transform
            transform = Compose(
                [
                    Resize(
                        net_width,
                        net_height,
                        resize_target=None,
                        keep_aspect_ratio=True,
                        ensure_multiple_of=32,
                        resize_method=resize_mode,
                        image_interpolation_method=cv2.INTER_CUBIC,
                    ),
                    normalization,
                    PrepareForNet(),
                ]
            )
            # init midas transform
            if model_type != 4:
                transform = Compose(
                    [
                        Resize(
                            net_width,
                            net_height,
                            resize_target=None,
                            keep_aspect_ratio=True,
                            ensure_multiple_of=32,
                            resize_method=resize_mode,
                            image_interpolation_method=cv2.INTER_CUBIC,
                        ),
                        normalization,
                        PrepareForNet(),
                    ]
                )

            model.eval()

            # optimize
            if device == torch.device("cuda"):
                model = model.to(memory_format=torch.channels_last)
                if not cmd_opts.no_half:
                if not cmd_opts.no_half and model_type != 4:
                    model = model.half()

            model.to(device)
@@ -179,28 +225,44 @@ def download_file(filename, url):

                # input image
                img = cv2.cvtColor(np.asarray(processed.images[count]), cv2.COLOR_BGR2RGB) / 255.0
                img_input = transform({"image": img})["image"]

                # compute
                precision_scope = torch.autocast if shared.cmd_opts.precision == "autocast" and device == torch.device("cuda") else contextlib.nullcontext
                with torch.no_grad(), precision_scope("cuda"):
                    sample = torch.from_numpy(img_input).to(device).unsqueeze(0)
                    if device == torch.device("cuda"):
                        sample = sample.to(memory_format=torch.channels_last)
                        if not cmd_opts.no_half:
                            sample = sample.half()
                    prediction = model.forward(sample)
                    prediction = (
                        torch.nn.functional.interpolate(
                            prediction.unsqueeze(1),
                            size=img.shape[:2],
                            mode="bicubic",
                            align_corners=False,
                        )
                        .squeeze()
                        .cpu()
                        .numpy()
                    )

                if model_type == 4:

                    # leres transform input
                    rgb_c = img[:, :, ::-1].copy()
                    A_resize = cv2.resize(rgb_c, (net_width, net_height))
                    img_torch = scale_torch(A_resize)[None, :, :, :]

                    # Forward pass
                    with torch.no_grad():
                        prediction = model.inference(img_torch)
                    prediction = prediction.squeeze().cpu().numpy()
                    prediction = cv2.resize(prediction, (img.shape[1], img.shape[0]), interpolation=cv2.INTER_CUBIC)

                else:

                    # midas transform input
                    img_input = transform({"image": img})["image"]

                    # compute
                    precision_scope = torch.autocast if shared.cmd_opts.precision == "autocast" and device == torch.device("cuda") else contextlib.nullcontext
                    with torch.no_grad(), precision_scope("cuda"):
                        sample = torch.from_numpy(img_input).to(device).unsqueeze(0)
                        if device == torch.device("cuda"):
                            sample = sample.to(memory_format=torch.channels_last)
                            if not cmd_opts.no_half:
                                sample = sample.half()
                        prediction = model.forward(sample)
                        prediction = (
                            torch.nn.functional.interpolate(
                                prediction.unsqueeze(1),
                                size=img.shape[:2],
                                mode="bicubic",
                                align_corners=False,
                            )
                            .squeeze()
                            .cpu()
                            .numpy()
                        )

                # output
                depth = prediction
@@ -219,7 +281,7 @@ def download_file(filename, url):
                img_output = out.astype("uint16")

                # invert depth map
                if invert_depth:
                if invert_depth ^ (model_type == 4):
                    img_output = cv2.bitwise_not(img_output)

                # three channel, 8 bits per channel image
@@ -250,9 +312,10 @@ def download_file(filename, url):
                if save_depth:
                    images.save_image(Image.fromarray(img_concat), p.outpath_samples, "", processed.all_seeds[count-1], processed.all_prompts[count-1], opts.samples_format, info=info, p=p, suffix="_depth")

                #colormap = plt.get_cmap('inferno')
                #heatmap = (colormap(img_output2[:,:,0] / 256.0) * 2**16).astype(np.uint16)[:,:,:3]
                #processed.images.append(heatmap)
                if show_heat:
                    colormap = plt.get_cmap('inferno')
                    heatmap = (colormap(img_output2[:,:,0] / 256.0) * 2**16).astype(np.uint16)[:,:,:3]
                    processed.images.append(heatmap)

        except RuntimeError as e:
            if 'out of memory' in str(e):