# Part 3: Publicly available Depth Estimation

## Task

"Now let’s do a little bibliographic research, and investigate if there are any ML/AI approaches publicly available with a ready-to-use implementation on GitHub. Short-list top 3 approaches that you succeed to find with values of key metrics on the most common datasets. Select among them the best approach from your point of view, figure out how to install and run it locally on GPU, make a step-by-step guide so that a person evaluating this assessment may follow this guide and run this approach on their end."

## Research

I check the PapersWithCode [Monocular Depth Estimation on NYU-Depth V2 SOTA page](https://paperswithcode.com/sota/monocular-depth-estimation-on-nyu-depth-v2) for the state-of-the-art models.

These are the top 5 entries:

![image.png](attachment:fa6bfaf5-59f3-4250-bea6-9723861b26e2.png)

All of the top 3 entries come with "code" (unsurprisingly, on paperswithcode.com 😉) or even a Google Colab notebook in which the approach can easily be run. As this task asks for a top 3 shortlist, I gather some additional information on them to support the decision.

|Rank |Model |RMSE |Versions |Weights size [MB] |License |
|-----|:-----|:---:|:-----:|:-----:|:-----:|
|1 |Depth Anything |0.206 |small, base, large |100 - 1,000+ |Apache-2.0 |
|3 |MetaPrompt-SD |0.223 |large (requires Stable Diffusion) |7,000+ |MIT |
|2 |EVP |0.224 |large |1,000+ |MIT |
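
The error values in the table are root-mean-square errors between predicted and ground-truth depth, where lower is better. As a rough illustration of what the metric measures, here is a minimal pure-Python sketch. The function name `rmse` and the convention of skipping non-positive ground-truth pixels as invalid are assumptions made for this example; this is not the benchmark's official evaluation code.

```python
import math


def rmse(predicted, ground_truth):
    """Root-mean-square error over pixels with valid (> 0) ground truth."""
    if len(predicted) != len(ground_truth):
        raise ValueError("inputs must have the same number of pixels")
    squared_errors = [
        (p - g) ** 2
        for p, g in zip(predicted, ground_truth)
        if g > 0  # assumption: non-positive ground truth marks a missing measurement
    ]
    if not squared_errors:
        raise ValueError("no valid ground-truth pixels")
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy example with flattened depth maps; the values are made up.
print(rmse([1.0, 2.0, 3.5], [1.0, 2.0, 3.0]))
```

A real evaluation would run this over every test image and usually report several other metrics alongside it.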

## Depth Anything Model

To get a quick overview, go to the [impressive Depth Anything GitHub Pages site](https://depth-anything.github.io/). After that you can test your own images on the [Depth Anything HuggingFace Space](https://huggingface.co/spaces/LiheYoung/Depth-Anything) or explore it programmatically in [Google Colab](https://colab.research.google.com/github/NielsRogge/Transformers-Tutorials/blob/master/Depth%20Anything/Predicting_depth_in_an_image_with_Depth_Anything.ipynb).

This is not only the best model on the NYU-Depth V2 benchmark; it also comes with great documentation, demos and, within a few days of publication, a 🤗 HuggingFace integration! It is released under the Apache-2.0 license, which is considered business-friendly because it allows integration into your own products without having to open-source the product code.

![image.png](attachment:5ef543d8-cc8b-4f2e-9f7f-16a1e5181a18.png)

### Run on your own machine

This guide assumes that you have an NVIDIA GPU, run Linux (on amd64), and have a recent version of Python 3 (e.g. 3.11) and the necessary NVIDIA drivers installed (tested with CUDA version 12.1).
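
These assumptions can be checked quickly before setting anything up. The following is a minimal sketch using only the Python standard library. The function name `check_prerequisites` and the `min_python` threshold are illustrative choices for this guide, not requirements stated by the Depth Anything authors, and finding `nvidia-smi` only suggests that a driver is installed.

```python
import platform
import shutil
import sys


def check_prerequisites(min_python=(3, 9)):
    """Return a dict of simple environment checks (no GPU libraries needed)."""
    return {
        # The guide assumes Linux on amd64 (reported as x86_64 by Python).
        "linux": sys.platform.startswith("linux"),
        "amd64": platform.machine().lower() in ("x86_64", "amd64"),
        # min_python is an illustrative threshold, not an official requirement.
        "python_ok": sys.version_info[:2] >= tuple(min_python),
        # nvidia-smi ships with the NVIDIA driver; finding it on PATH suggests
        # (but does not prove) that the driver is installed.
        "nvidia_smi_found": shutil.which("nvidia-smi") is not None,
    }


for name, ok in check_prerequisites().items():
    print(f"{name}: {'yes' if ok else 'no'}")
```

If any line prints `no`, the steps below may still work, but they have not been tested in that configuration.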

Create a local Python virtual environment:

```bash
mkdir depth-anything && cd depth-anything
python3 -m venv .venv
source .venv/bin/activate
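pip install notebook ipywidgets jupyter_contrib_nbextensions
# Colab ships with PyTorch preinstalled; a fresh virtual environment does not,
# so install it (plus Pillow for image loading) before running the notebook.
pip install torch pillow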
```

Now open the [Depth Anything Google Colab notebook](https://colab.research.google.com/github/NielsRogge/Transformers-Tutorials/blob/master/Depth%20Anything/Predicting_depth_in_an_image_with_Depth_Anything.ipynb) and download it via *File > Download > Download .ipynb*. Place the downloaded file in the *depth-anything* directory you created above, then start Jupyter Notebook from that directory:

```bash
jupyter notebook
```
    
Then open the *Predicting_depth_in_an_image_with_Depth_Anything.ipynb* notebook and run the cells in order. You can switch between the small, base and large versions of the model by changing the *model* parameter in the first code cell under the **Pipeline API** section:

```python
pipe = pipeline(task="depth-estimation", model="LiheYoung/depth-anything-base-hf")
```

Available models:
* LiheYoung/depth-anything-small-hf
* LiheYoung/depth-anything-base-hf
* LiheYoung/depth-anything-large-hf
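
Depending on the installed `transformers` version, the pipeline may stay on the CPU unless a device is passed explicitly. The sketch below shows one way to pick a model size and request the first GPU when PyTorch can see one. The helper names `model_id` and `load_depth_pipeline` are hypothetical and not part of the notebook; `pipeline(..., device=...)` and `torch.cuda.is_available()` are existing calls in `transformers` and PyTorch.

```python
SIZES = ("small", "base", "large")


def model_id(size):
    """Map a size name to its checkpoint ID on the Hugging Face Hub."""
    if size not in SIZES:
        raise ValueError(f"size must be one of {SIZES}, got {size!r}")
    return f"LiheYoung/depth-anything-{size}-hf"


def load_depth_pipeline(size="base"):
    """Create a depth-estimation pipeline, on the first GPU if one is visible."""
    # Imported lazily so that model_id() works without the heavy dependencies.
    import torch
    from transformers import pipeline

    # device=0 selects the first CUDA GPU; device=-1 keeps the model on the CPU.
    device = 0 if torch.cuda.is_available() else -1
    return pipeline(task="depth-estimation", model=model_id(size), device=device)
```

For example, `load_depth_pipeline("small")("example.jpg")["depth"]` should return the depth map as a PIL image, where `example.jpg` is a placeholder for a local image path. While the model is running, `nvidia-smi` in a second terminal shows whether the GPU is actually in use.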



Here's a sample qualitative comparison of the results of the small, base and large models:

![image.png](attachment:346c8ab6-2966-4da4-89c0-cac3c580150a.png)