# Inference with PackNet-SfM
A notebook for running inference on your own images with the official PyTorch implementation of Toyota's **[PackNet-SfM](https://arxiv.org/abs/1905.02693)**, which uses 3D packing for self-supervised monocular depth estimation.  


```
@inproceedings{packnet,
  author = {Vitor Guizilini and Rares Ambrus and Sudeep Pillai and Allan Raventos and Adrien Gaidon},
  title = {3D Packing for Self-Supervised Monocular Depth Estimation},
  booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  primaryClass = {cs.CV},
  year = {2020},
}
```  
The official GitHub repository for this work includes instructions on how to set up a reproducible environment for training, evaluation and inference using Docker. This notebook provides an alternative to the Docker setup that runs within Colab.  
A GPU runtime is required to run the code in this notebook.



## Settings

Clone the official GitHub repository.

In [None]:
!git clone https://github.com/TRI-ML/packnet-sfm.git
%cd packnet-sfm

Install the missing dependencies. All mandatory requirements for PackNet-SfM are already available in the Colab virtual environment except *yacs*. We also need to install *python-wget* (the *wget* package on PyPI) to download the checkpoints.

In [None]:
!pip install yacs
!pip install wget

Add the *packnet_sfm* package to the Python path.

In [None]:
%set_env PYTHONPATH=/content/packnet-sfm/packnet_sfm:/env/python
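
Note that `PYTHONPATH` is read when the Python interpreter starts, so setting it from a running kernel mainly affects subprocesses launched afterwards (for example via `!python ...`). As a minimal sketch, assuming the repository was cloned to `/content/packnet-sfm`, the import path of the running kernel could also be extended directly. `add_to_sys_path` is a hypothetical helper, not part of PackNet-SfM.

```python
import sys

def add_to_sys_path(path):
    """Put `path` at the front of sys.path unless it is already listed."""
    if path not in sys.path:
        sys.path.insert(0, path)
    return path

# Assumption: repository root in Colab; the packnet_sfm package lives inside it.
add_to_sys_path('/content/packnet-sfm')
```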

Create the model's checkpoint directory.

In [None]:
!mkdir -p ./checkpoints
checkpoints_dir = './checkpoints'

Create the input and output directories.

In [None]:
!mkdir -p ./input
!mkdir -p ./output
image_input_dir = './input'
image_output_dir = './output'

## Source Code

Define a function to upload your own images to the input directory.

In [None]:
import os
import shutil
from google.colab import files

def upload_files(upload_path):
  uploaded = files.upload()
  for filename, content in uploaded.items():
    dst_path = os.path.join(upload_path, filename)
    shutil.move(filename, dst_path)
  return list(uploaded.keys())
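
`files.upload()` is only available inside Colab. As a rough sketch for other environments, assuming the images already exist on the local disk, a hypothetical `copy_local_files` helper (not part of the repository) could fill the input directory instead:

```python
import os
import shutil

def copy_local_files(paths, upload_path):
    """Copy existing files into upload_path and return the copied file names."""
    os.makedirs(upload_path, exist_ok=True)
    copied = []
    for path in paths:
        if not os.path.isfile(path):
            continue  # skip anything that is not a regular file
        name = os.path.basename(path)
        shutil.copy(path, os.path.join(upload_path, name))
        copied.append(name)
    return copied
```

Like `upload_files`, it returns the list of file names that ended up in the target directory.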

Define two functions: one checks whether a given file is an image with one of the accepted extensions, and the other runs inference and saves either a visualization or, optionally, the depth map alone. Both are copied from the */content/packnet-sfm/scripts/infer.py* file.

In [None]:
import numpy as np
import torch

from glob import glob
from cv2 import imwrite

from packnet_sfm.models.model_wrapper import ModelWrapper
from packnet_sfm.datasets.augmentations import resize_image, to_tensor
from packnet_sfm.utils.horovod import hvd_init, rank, world_size, print0
from packnet_sfm.utils.image import load_image
from packnet_sfm.utils.config import parse_test_file
from packnet_sfm.utils.load import set_debug
from packnet_sfm.utils.depth import write_depth, inv2depth, viz_inv_depth
from packnet_sfm.utils.logging import pcolor


def is_image(file, ext=('.png', '.jpg',)):
    """Check if a file is an image with certain extensions"""
    return file.endswith(ext)

@torch.no_grad()
def infer_and_save_depth(input_file, output_file, model_wrapper, image_shape, half, save):
    """
    Process a single input file to produce and save visualization

    Parameters
    ----------
    input_file : str
        Image file
    output_file : str
        Output file, or folder where the output will be saved
    model_wrapper : nn.Module
        Model wrapper used for inference
    image_shape : tuple
        Input image shape (height, width)
    half : bool
        Use half precision (fp16)
    save : str
        Save format ('npz' or 'png'); any other value saves a visualization
    """
    if not is_image(output_file):
        # If not an image, assume it's a folder and append the input name
        os.makedirs(output_file, exist_ok=True)
        output_file = os.path.join(output_file, os.path.basename(input_file))

    # change to half precision for evaluation if requested
    dtype = torch.float16 if half else None

    # Load image
    image = load_image(input_file)
    # Resize and to tensor
    image = resize_image(image, image_shape)
    image = to_tensor(image).unsqueeze(0)

    # Send image to GPU if available
    if torch.cuda.is_available():
        image = image.to('cuda:{}'.format(rank()), dtype=dtype)

    # Depth inference (returns predicted inverse depth)
    pred_inv_depth = model_wrapper.depth(image)['inv_depths'][0]

    if save == 'npz' or save == 'png':
        # Get depth from predicted depth map and save to different formats
        filename = '{}.{}'.format(os.path.splitext(output_file)[0], save)
        print('Saving {} to {}'.format(
            pcolor(input_file, 'cyan', attrs=['bold']),
            pcolor(filename, 'magenta', attrs=['bold'])))
        write_depth(filename, depth=inv2depth(pred_inv_depth))
    else:
        # Prepare RGB image
        rgb = image[0].permute(1, 2, 0).detach().cpu().numpy() * 255
        # Prepare inverse depth
        viz_pred_inv_depth = viz_inv_depth(pred_inv_depth[0]) * 255
        # Concatenate both vertically
        image = np.concatenate([rgb, viz_pred_inv_depth], 0)
        # Save visualization
        print('Saving {} to {}'.format(
            pcolor(input_file, 'cyan', attrs=['bold']),
            pcolor(output_file, 'magenta', attrs=['bold'])))
        imwrite(output_file, image[:, :, ::-1])
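
The network predicts inverse depth, which `inv2depth` converts to depth before the map is written. As an illustration only, assuming the conversion is a plain reciprocal with a small lower clamp to avoid division by zero (the library function operates on tensors and may differ in detail), the relationship looks like this. `inverse_to_depth` is a hypothetical name.

```python
def inverse_to_depth(inv_depths, eps=1e-6):
    """Convert inverse-depth values to depth as 1 / max(inv_depth, eps)."""
    return [1.0 / max(value, eps) for value in inv_depths]

# Larger inverse depth means a closer surface.
print(inverse_to_depth([0.5, 0.25, 0.125]))
```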

Define a function that initializes the model and applies the inference settings. It is a modified version of the corresponding function in the original script.

In [None]:
def do_inference(checkpoint, input, output, image_shape=None, half=None, save=None):

    # Initialize horovod
    hvd_init()

    # Parse the checkpoint file
    config, state_dict = parse_test_file(checkpoint)

    # If no image shape is provided, use the checkpoint one
    if image_shape is None:
        image_shape = config.datasets.augmentation.image_shape

    # Set debug if requested
    set_debug(config.debug)

    # Initialize model wrapper from checkpoint arguments
    model_wrapper = ModelWrapper(config, load_datasets=False)
    # Restore monodepth_model state
    model_wrapper.load_state_dict(state_dict)

    # change to half precision for evaluation if requested
    if half == "No":
        half = False
    dtype = torch.float16 if half else None

    # Send model to GPU if available
    if torch.cuda.is_available():
        model_wrapper = model_wrapper.to('cuda:{}'.format(rank()), dtype=dtype)

    # Set to eval mode
    model_wrapper.eval()

    if os.path.isdir(input):
        # If input file is a folder, search for image files
        files = []
        for ext in ['png', 'jpg']:
            files.extend(glob((os.path.join(input, '*.{}'.format(ext)))))
        files.sort()
        print0('Found {} files'.format(len(files)))
    else:
        # Otherwise, use it as is
        files = [input]

    # Process each file
    for fn in files[rank()::world_size()]:
        infer_and_save_depth(
            fn, output, model_wrapper, image_shape, half, save)
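
The slice `files[rank()::world_size()]` splits the sorted file list across Horovod workers: each worker starts at its own rank and steps by the number of workers, so every file is processed exactly once. A small standalone illustration, with a hypothetical `shard` helper and made-up file names:

```python
def shard(items, rank, world_size):
    """Return the items handled by worker `rank` out of `world_size` workers."""
    return items[rank::world_size]

files = ['a.png', 'b.png', 'c.png', 'd.png', 'e.png']
for worker in range(2):
    print(worker, shard(files, worker, 2))
```

In a single-process session the rank is typically 0 and the world size 1, so the slice returns the whole list.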

Create a dictionary that maps each available pre-trained model to the download URL of its checkpoint. The settings form selects among its keys.

In [None]:
checkpoint_dict = {
    "ResNet18 Self-Supervised 384x640 ImageNet DDAD": "https://tri-ml-public.s3.amazonaws.com/github/packnet-sfm/models/ResNet18_MR_selfsup_D.ckpt", 
    "PackNet Self-Supervised 384x640 DDAD": "https://tri-ml-public.s3.amazonaws.com/github/packnet-sfm/models/PackNet01_MR_selfsup_D.ckpt", 
    "PackNetSAN Supervised 384x640 DDAD": "https://tri-ml-public.s3.amazonaws.com/github/packnet-sfm/models/PackNetSAN01_HR_sup_D.ckpt",
    "ResNet18 Self-Supervised 192x640 ImageNet KITTI": "https://tri-ml-public.s3.amazonaws.com/github/packnet-sfm/models/ResNet18_MR_selfsup_K.ckpt",
    "PackNet Self-Supervised 192x640 KITTI": "https://tri-ml-public.s3.amazonaws.com/github/packnet-sfm/models/PackNet01_MR_selfsup_K.ckpt",
    "PackNet Self-Supervised Scale-Aware 192x640 CS K": "https://tri-ml-public.s3.amazonaws.com/github/packnet-sfm/models/PackNet01_MR_velsup_CStoK.ckpt",
    "PackNet Self-Supervised Scale-Aware 384x1280 CS K": "https://tri-ml-public.s3.amazonaws.com/github/packnet-sfm/models/PackNet01_HR_velsup_CStoK.ckpt",
    "PackNet Semi-Supervised densified GT 192x640 CS K": "https://tri-ml-public.s3.amazonaws.com/github/packnet-sfm/models/PackNet01_MR_semisup_CStoK.ckpt",
    "PackNetSAN Supervised densified GT 352x1216 K": "https://tri-ml-public.s3.amazonaws.com/github/packnet-sfm/models/PackNetSAN01_HR_sup_K.ckpt"
}
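
Each key encodes the input resolution of the model, which appears to follow a height x width order (for example `192x640` for KITTI). As a sketch, a hypothetical `resolution_from_label` helper could extract it with a regular expression. `do_inference` itself reads the shape from the checkpoint configuration, so this is for illustration only.

```python
import re

def resolution_from_label(label):
    """Return (height, width) parsed from a label such as '... 192x640 ...'."""
    match = re.search(r'(\d+)x(\d+)', label)
    if match is None:
        raise ValueError('no resolution found in {!r}'.format(label))
    return int(match.group(1)), int(match.group(2))

print(resolution_from_label('PackNet Self-Supervised 192x640 KITTI'))
```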

## Inference

Upload your own image(s). They will be saved in the */content/packnet-sfm/input/* directory.

In [None]:
uploaded_image_list = upload_files(image_input_dir)

Build the inference settings form. Here you can select the model checkpoint, enable half precision, and choose to save the predicted depth map as a separate image file or as a compressed NumPy array.

In [None]:
#@title Inference Options

checkpoint = 'ResNet18 Self-Supervised 384x640 ImageNet DDAD' #@param ["ResNet18 Self-Supervised 384x640 ImageNet DDAD", "PackNet Self-Supervised 384x640 DDAD", "PackNetSAN Supervised 384x640 DDAD", "ResNet18 Self-Supervised 192x640 ImageNet KITTI", "PackNet Self-Supervised 192x640 KITTI", "PackNet Self-Supervised Scale-Aware 192x640 CS K", "PackNet Self-Supervised Scale-Aware 384x1280 CS K", "PackNet Semi-Supervised densified GT 192x640 CS K", "PackNetSAN Supervised densified GT 352x1216 K"]
half = "No" #@param ["No", "Yes"]
save = "None" #@param ["None", "png", "npz"]
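
The form returns plain strings, so `half` is either `"No"` or `"Yes"` and `save` may be the string `"None"` rather than Python's `None`. A minimal sketch of how such values could be normalized before use follows. `normalize_options` is a hypothetical helper and the notebook does not require it.

```python
def normalize_options(half, save):
    """Map form strings to (bool, str or None)."""
    use_half = half is True or half == "Yes"
    save_format = save if save in ("png", "npz") else None
    return use_half, save_format

print(normalize_options("No", "None"))
print(normalize_options("Yes", "npz"))
```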

Do inference on the uploaded images. Results are saved in the */content/packnet-sfm/output/* directory.

In [None]:
import wget

selected_checkpoint_url = checkpoint_dict[checkpoint]
checkpoint_filename = os.path.basename(selected_checkpoint_url)
checkpoint_full_path = os.path.join(checkpoints_dir, checkpoint_filename)
if not os.path.exists(checkpoint_full_path):
  wget.download(selected_checkpoint_url, out=checkpoints_dir)
do_inference(checkpoint_full_path, image_input_dir, image_output_dir, half=half, save=save)
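
The same download-if-missing logic can be sketched with the standard library alone. This variant assumes `urllib.request.urlretrieve` as a stand-in for `wget.download`; `checkpoint_path` and `fetch_checkpoint` are hypothetical helpers, not part of the repository.

```python
import os
from urllib.parse import urlparse
from urllib.request import urlretrieve

def checkpoint_path(url, directory):
    """Build the local path for a checkpoint from the last URL path segment."""
    return os.path.join(directory, os.path.basename(urlparse(url).path))

def fetch_checkpoint(url, directory):
    """Download the checkpoint only if it is not already on disk."""
    path = checkpoint_path(url, directory)
    if not os.path.exists(path):
        os.makedirs(directory, exist_ok=True)
        urlretrieve(url, path)
    return path
```

The returned path could then be passed to `do_inference` as its first argument.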

Display the results.

In [None]:
import matplotlib.pyplot as plt
import cv2

input_images_path = image_output_dir
items = os.listdir(input_images_path)    

for each_image in items:
  if each_image.endswith(".jpg") or each_image.endswith(".png"):
    print(each_image)
    full_path = input_images_path + '/' + each_image
    image = cv2.imread(full_path)
    image = cv2.cvtColor(image,cv2.COLOR_BGR2RGB)
    plt.figure(figsize=(20, 10))
    plt.imshow(image)
    plt.grid(False)
    plt.axis('off')
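
The extension checks in this notebook are case-sensitive, so a file named `photo.JPG` would be skipped. A possible sketch of a case-insensitive filter, using a hypothetical `list_images` helper:

```python
import os

def list_images(directory, extensions=('.png', '.jpg')):
    """Return sorted image file names in directory, ignoring extension case."""
    return sorted(
        name for name in os.listdir(directory)
        if name.lower().endswith(extensions)
    )
```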