## Task

"Now let’s do a little bibliographic research, and investigate if there are any ML/AI approaches publicly available with a ready-to-use implementation on GitHub. Short-list top 3 approaches that you succeed to find with values of key metrics on the most common datasets. Select among them the best approach from your point of view, figure out how to install and run it locally on GPU, make a step-by-step guide so that a person evaluating this assessment may follow this guide and run this approach on their end."

### Research

I start with the PapersWithCode [Depth Estimation task page](https://paperswithcode.com/task/depth-estimation) to look for benchmarks.
Stanford2D3D Panoramic and NYU-Depth V2 appear to be the most active benchmarks, with the most models listed. The Stanford benchmark has many panorama-specific entries, and the dataset most relevant to this task seems to be NYU-Depth V2 (RGB-D images of indoor scenes), so I'll go with [the top models of the Depth Estimation on NYU-Depth V2 benchmark](https://paperswithcode.com/sota/depth-estimation-on-nyu-depth-v2).

These are the top 5 entries:

![Top 5 entries of the Depth Estimation on NYU-Depth V2 leaderboard](attachment:979d3943-0465-49e6-8dfe-0ae419414530.png)

Each of the top 3 entries comes with a GitHub repository or even a Google Colab notebook in which the approach can easily be run. Since the task asks for a top 3, I gathered some additional key facts about them to support the decision.

**Update**: While working on this, I came across this post about a new [Depth Anything model on 🤗](https://twitter.com/NielsRogge/status/1750546893480853801). I checked the paper and found a new state-of-the-art RMSE of **0.206** on NYU-Depth V2, so I added it to the list as rank 0 and offer it as the first option to follow along. Several of the models come in different sizes; the smaller versions are usually faster, at the cost of lower accuracy.

|Rank |Model|RMSE (lower is better) |Versions  |Size of saved weights [MB] |License |
|-----|:-----|:---:|:-----:|:-----:|:-----:|
|0 |Depth Anything  |0.206   | small, base, large | 100 - 1,000+    | Apache-2.0 |
|1 |EVP  |0.224   | large | 1,000+    | MIT |
|2|DINOv2|0.279   | small, base, large, G (non-distilled) |84 - 1,000+ |Apache-2.0 |
|3|SwinV2-L-1K-MIM|0.287   | large |884 | MIT|
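
To put the RMSE differences into perspective, here is a small shell sketch that sorts the short-listed models by RMSE and prints how much higher each one is than the best entry. `rmse_gap` is a hypothetical helper of mine (not part of any of the repositories); it only needs `sort` and `awk`, and the values are copied from the table.

```bash
#!/usr/bin/env bash
# rmse_gap (hypothetical helper): read "model,RMSE" lines, sort them by RMSE
# (lower is better) and print how much higher each RMSE is than the best one.
rmse_gap() {
  LC_ALL=C sort -t, -k2,2n |
    awk -F, '
      NR == 1 { best = $2 }   # after sorting, the first line has the lowest RMSE
      { printf "%-16s RMSE %.3f  +%.1f%% vs. best\n", $1, $2, 100 * ($2 - best) / best }
    '
}

# NYU-Depth V2 RMSE values from the table above
rmse_gap <<'EOF'
Depth Anything,0.206
EVP,0.224
DINOv2,0.279
SwinV2-L-1K-MIM,0.287
EOF
```

With these values, EVP's RMSE is about 8.7% higher than that of Depth Anything, while DINOv2 and SwinV2-L-1K-MIM are about 35% and 39% higher.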

**Update 2**: When I wanted to update the PapersWithCode leaderboard for the Depth Estimation task, I found that there is a separate *Monocular Depth Estimation* leaderboard that already contains Depth Anything. EVP is in the top 3 there as well, and that leaderboard lists some more key metrics:

![Monocular Depth Estimation on NYU-Depth V2 leaderboard](attachment:eae8650d-6d4f-4ec4-ad00-e956fc0f8299.png)

### Option 1: Depth Anything

To get a quick overview, go to the [impressive Depth Anything GitHub Pages site](https://depth-anything.github.io/). After that you can test your own images on the [Depth Anything HuggingFace Space](https://huggingface.co/spaces/LiheYoung/Depth-Anything) or explore it programmatically in [Google Colab](https://colab.research.google.com/github/NielsRogge/Transformers-Tutorials/blob/master/Depth%20Anything/Predicting_depth_in_an_image_with_Depth_Anything.ipynb).

![image.png](attachment:5ef543d8-cc8b-4f2e-9f7f-16a1e5181a18.png)

To run it on your own machine, follow the *Run on your own machine* subsection of *Option 2* below, but download the Depth Anything Google Colab notebook linked above instead of the EVP notebook.
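
If you prefer to fetch the Depth Anything notebook from the command line instead of via the Colab menu: the Colab link above points to a notebook in the `NielsRogge/Transformers-Tutorials` GitHub repository, so the file can also be downloaded from GitHub directly. The sketch below rewrites such a `colab.research.google.com/github/...` link into the matching `raw.githubusercontent.com` URL. `colab_to_raw` and `DA_COLAB` are hypothetical names of mine, and this only works for GitHub-backed Colab links, not for Drive-hosted ones like the EVP notebook.

```bash
#!/usr/bin/env bash
# colab_to_raw (hypothetical helper): turn a Colab link of the form
#   https://colab.research.google.com/github/<owner>/<repo>/blob/<branch>/<path>
# into the matching raw GitHub download URL
#   https://raw.githubusercontent.com/<owner>/<repo>/<branch>/<path>
colab_to_raw() {
  printf '%s\n' "$1" |
    sed -E 's#^https://colab\.research\.google\.com/github/([^/]+)/([^/]+)/blob/#https://raw.githubusercontent.com/\1/\2/#'
}

# The Depth Anything Colab link from above
DA_COLAB='https://colab.research.google.com/github/NielsRogge/Transformers-Tutorials/blob/master/Depth%20Anything/Predicting_depth_in_an_image_with_Depth_Anything.ipynb'
colab_to_raw "$DA_COLAB"
```

From inside the *depth-estimation* directory, `curl -fL -o Depth_Anything.ipynb "$(colab_to_raw "$DA_COLAB")"` should then download the notebook. If the repository layout or branch has changed in the meantime, fall back to *File > Download > Download .ipynb* in Colab.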

### Option 2: Running EVP yourself (Nvidia GPU with at least 16 GB required, or Colab)

From what I know about the use cases of CENDAS so far, I don't see any compute restrictions (e.g. due to edge deployment), so I'll go ahead with the EVP model, which had the lowest RMSE (= top score) among the original top 3, before Depth Anything was added. As compute doesn't matter, I use the large models instead of any of the smaller variants.

To get a quick idea of the model's performance, you can try it on some examples or on your own images in the corresponding [HuggingFace 🤗 Space](https://huggingface.co/spaces/MykolaL/evp).

Alternatively, you can follow along in [Google Colab](https://colab.research.google.com/drive/1rd0_2AMyHlEaeYlWldZ-xGaGRYhP_TVb?usp=sharing).

#### Run on your own machine

If you want to run the model locally, make sure you have at least 16 GB of GPU memory available (more is better). This guide assumes an Nvidia GPU, Linux on amd64, a recent version of Python 3 (e.g. 3.11) and the necessary Nvidia drivers (tested with CUDA 12.1).
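
These requirements can be checked from a terminal before installing anything. The sketch below uses `nvidia-smi`'s query mode to list each GPU with its total memory. The 15000 MiB cutoff is my own assumption (`nvidia-smi` may report a bit less than the nominal 16 GB = 16384 MiB), and `gpu_mem_ok` / `check_prereqs` are hypothetical helper names.

```bash
#!/usr/bin/env bash
# Prerequisite check for the local setup (sketch).
# Assumption: nvidia-smi may report slightly less than the nominal 16 GB (16384 MiB),
# so this sketch accepts GPUs from 15000 MiB upward. Adjust to taste.
MIN_GPU_MIB=15000

# gpu_mem_ok <MiB>: succeeds if the given total GPU memory meets MIN_GPU_MIB
gpu_mem_ok() {
  [ "$1" -ge "$MIN_GPU_MIB" ] 2>/dev/null
}

# check_prereqs (hypothetical helper): print OS/arch, Python version and every GPU's memory
check_prereqs() {
  echo "OS/arch: $(uname -s) $(uname -m)"   # expected: Linux x86_64 (= amd64)
  python3 --version || echo "WARN: python3 not found"
  if ! command -v nvidia-smi >/dev/null 2>&1; then
    echo "WARN: nvidia-smi not found; is the Nvidia driver installed?"
    return 1
  fi
  # One CSV line per GPU: "<name>, <total memory in MiB>"
  nvidia-smi --query-gpu=name,memory.total --format=csv,noheader,nounits |
    while IFS=, read -r name mib; do
      mib=${mib// /}   # drop the space after the comma
      if gpu_mem_ok "$mib"; then
        echo "OK:   $name ($mib MiB)"
      else
        echo "WARN: $name has only $mib MiB (< $MIN_GPU_MIB MiB)"
      fi
    done
}
```

Source the snippet and run `check_prereqs`; any `WARN` line points to an assumption of this guide that your machine does not meet. The header of the plain `nvidia-smi` output also shows the highest CUDA version the installed driver supports.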

Create a local Python virtual environment and install Jupyter in it:

```bash
mkdir depth-estimation && cd depth-estimation
python3 -m venv .venv
source .venv/bin/activate
pip install notebook ipywidgets jupyter_contrib_nbextensions
```
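
A common pitfall is starting a system-wide `jupyter` instead of the one inside the virtual environment. The following sketch relies on the `VIRTUAL_ENV` variable that `source .venv/bin/activate` exports and checks that `jupyter` resolves to a path inside that environment; `check_venv_jupyter` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# check_venv_jupyter (hypothetical helper): verify that a virtual environment is active
# and that the `jupyter` found on the PATH belongs to it.
check_venv_jupyter() {
  # `source .venv/bin/activate` exports VIRTUAL_ENV=<absolute path of .venv>
  if [ -z "${VIRTUAL_ENV:-}" ]; then
    echo "No virtual environment active; run: source .venv/bin/activate" >&2
    return 1
  fi
  local jup
  if ! jup=$(command -v jupyter); then
    echo "jupyter not found; re-run the pip install step" >&2
    return 1
  fi
  case "$jup" in
    "$VIRTUAL_ENV"/*) echo "OK: $jup" ;;
    *) echo "jupyter resolves outside the venv: $jup" >&2; return 1 ;;
  esac
}
```

Run it in the same shell right after the `pip install` step; it prints `OK: <path>` when everything is in place.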

Now open the [EVP Google Colab notebook](https://colab.research.google.com/drive/1rd0_2AMyHlEaeYlWldZ-xGaGRYhP_TVb?usp=sharing) and download it via *File > Download > Download .ipynb*. Place the downloaded file in the *depth-estimation* directory you created above, then start Jupyter Notebook:

```bash
jupyter notebook
```
    
Then open the *EVP.ipynb* notebook and run its cells in order. The *REFERRING SEGMENTATION* section at the end of the notebook is not relevant to the depth estimation task, so feel free to skip it.
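
To see where the depth estimation part ends without scrolling through the whole notebook, the section headings can be listed from the command line. This sketch reads the `.ipynb` file with `python3` and its standard `json` module and prints the cell index of every Markdown heading (lines starting with `#`). `list_sections` is a hypothetical helper name, and I'm assuming the notebook marks its sections with such headings.

```bash
#!/usr/bin/env bash
# list_sections (hypothetical helper): print "<cell index>: <heading>" for every
# ATX-style Markdown heading (line starting with "#") in a Jupyter notebook.
# Needs only python3 and its standard json module.
list_sections() {
  python3 - "$1" <<'PY'
import json
import sys

with open(sys.argv[1], encoding="utf-8") as f:
    nb = json.load(f)

for i, cell in enumerate(nb.get("cells", [])):
    if cell.get("cell_type") != "markdown":
        continue
    source = cell.get("source", "")
    # .ipynb files store "source" either as one string or as a list of lines
    if isinstance(source, list):
        source = "".join(source)
    for line in source.splitlines():
        if line.lstrip().startswith("#"):
            print(f"{i}: {line.strip().lstrip('#').strip()}")
PY
}
```

`list_sections EVP.ipynb` then shows at which cell index the *REFERRING SEGMENTATION* part starts, so you know where to stop running cells.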