OV-DINO
=====

**OV-DINO: Unified Open-Vocabulary Detection with Language-Aware Selective Fusion**

* Paper: https://arxiv.org/abs/2407.07844

![OV-DINO vs Previous Models](../assets/ovdino_vs_previous.jpg)

![OV-DINO's LASF](../assets/ovdino_lasf_overview.jpg)

![OV-DINO Overview](../assets/ovdino_overview.jpg)

## Installation

```bash
conda create -n ovdino python=3.10 -y
conda activate ovdino

# Install PyTorch for CUDA 11.6 from the official channels
conda install -y pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 \
                pytorch-cuda=11.6 -c pytorch -c nvidia

# Optional: install GCC 9 for compatibility when compiling detectron2
conda install -y gcc=9 gxx=9 -c conda-forge

# Install the CUDA 11.6 toolkit (provides nvcc) into the conda environment
conda install -y cudatoolkit-dev=11.6 -c conda-forge

# Point CUDA_HOME at the conda environment, where cudatoolkit-dev installs CUDA
export CUDA_HOME=$CONDA_PREFIX
export PATH=$CUDA_HOME/bin:$PATH
export LD_LIBRARY_PATH=$CUDA_HOME/lib:$LD_LIBRARY_PATH

# Verify nvcc version
nvcc -V
# It should report "Cuda compilation tools, release 11.6".
```
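
The `nvcc -V` check above can also be scripted, for example before kicking off the detectron2 build. The following is a minimal sketch, not part of the OV-DINO repository: the helper names `parse_nvcc_release` and `installed_nvcc_release` are hypothetical, and the parser only assumes that the output contains the text `release <major>.<minor>`, as in the expected line quoted above.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_nvcc_release(nvcc_output: str) -> Optional[str]:
    """Return the 'major.minor' release found in `nvcc -V` output, or None."""
    match = re.search(r"release\s+(\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def installed_nvcc_release() -> Optional[str]:
    """Run `nvcc -V` if nvcc is on PATH; return None when nvcc is missing."""
    if shutil.which("nvcc") is None:
        return None
    result = subprocess.run(
        ["nvcc", "-V"], capture_output=True, text=True, check=False
    )
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    # The installation steps above target CUDA 11.6.
    print("nvcc release:", installed_nvcc_release(), "(expected: 11.6)")
```

If this prints `None`, `nvcc` is probably not on `PATH`, which usually means the `export` lines above were not run in the current shell.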
 
Clone the repository, then build the bundled detectron2 and install OV-DINO:

```bash
git clone https://github.com/wanghao9610/OV-DINO.git OVDINO_repo
cd OVDINO_repo
export root_dir=$(realpath ./)
cd $root_dir/ovdino

# build and install detectron2
python -m pip install -e detectron2-717ab9

pip install addict

# install OV-DINO itself
pip install -e .
```


If a warning about the NumPy version appears later, pin NumPy below 2.0:

```bash
pip install "numpy<2"
```
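
To check the pin without importing NumPy itself, the installed version can be read from package metadata. This is a small sketch using only the standard library; the helper names `is_below_major` and `installed_numpy_ok` are illustrative and not part of any OV-DINO tooling.

```python
from importlib import metadata


def is_below_major(version: str, major: int = 2) -> bool:
    """True if the leading integer of a version string is below `major`."""
    return int(version.split(".")[0]) < major


def installed_numpy_ok() -> bool:
    """True if NumPy is installed and its major version is below 2."""
    try:
        return is_below_major(metadata.version("numpy"))
    except metadata.PackageNotFoundError:
        # NumPy is not installed in this environment.
        return False


if __name__ == "__main__":
    print("NumPy < 2 installed:", installed_numpy_ok())
```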

## Download model

```bash
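# Run from the directory that contains OVDINO_repo; -P (uppercase) sets the target directory
wget -P OVDINO_repo/inits/ovdino/ https://huggingface.co/hao9610/OV-DINO/resolve/main/ovdino_swint_ogc-coco50.2_lvismv40.1_lvis32.9.pth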
```
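
Before loading the model, it can help to confirm that the checkpoint sits where the loading code looks for it, namely `OVDINO_repo/inits/ovdino/`. The sketch below only checks the filesystem; `checkpoint_path` and `checkpoint_ready` are hypothetical helper names, and the default `repo_dir` assumes the working directory is the one containing `OVDINO_repo`.

```python
from pathlib import Path

CHECKPOINT_NAME = "ovdino_swint_ogc-coco50.2_lvismv40.1_lvis32.9.pth"


def checkpoint_path(repo_dir: str = "OVDINO_repo") -> Path:
    """Build the checkpoint path used when loading the model."""
    return Path(repo_dir) / "inits" / "ovdino" / CHECKPOINT_NAME


def checkpoint_ready(repo_dir: str = "OVDINO_repo") -> bool:
    """True if the checkpoint exists and is not an empty file."""
    path = checkpoint_path(repo_dir)
    return path.is_file() and path.stat().st_size > 0


if __name__ == "__main__":
    print(checkpoint_path(), "ready:", checkpoint_ready())
```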

Check the NumPy version seen by the notebook kernel:

```python
import numpy as np
np.__version__  # '1.26.4'
```

Load the config, point it at the downloaded checkpoint, and build the model:

```python
import os
import cv2
import torch
import detectron2.data.transforms as T
from detectron2.checkpoint import DetectionCheckpointer
from detectron2.config import LazyConfig, instantiate
from detrex.data.datasets import clean_words_or_phrase

def filter_predictions_with_confidence(predictions, threshold=0.5):
    """
    Keep only instances above a given confidence threshold.
    """
    if "instances" in predictions:
        preds = predictions["instances"]
        keep_idxs = preds.scores > threshold
        predictions = predictions.copy()  # shallow copy so the caller's dict is not modified
        predictions["instances"] = preds[keep_idxs]
    return predictions

# ---------------------------------------------------------------------
# 1. Load config and build model
# ---------------------------------------------------------------------
# Ensure MODEL_ROOT points to the directory containing your checkpoints
os.environ["MODEL_ROOT"] = "./inits"
cfg = LazyConfig.load(
    "OVDINO_repo/ovdino/projects/ovdino/configs/ovdino_swin_tiny224_bert_base_eval_coco.py"
)
# Override with the full OV-DINO checkpoint you downloaded
cfg.train.init_checkpoint = (
    "OVDINO_repo/inits/ovdino/ovdino_swint_ogc-coco50.2_lvismv40.1_lvis32.9.pth"
)

model = instantiate(cfg.model)
model.to(cfg.train.device)
checkpointer = DetectionCheckpointer(model)
checkpointer.load(cfg.train.init_checkpoint)
model.eval()  # switch to inference mode
```
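
The thresholding in `filter_predictions_with_confidence` can be illustrated without detectron2. The sketch below is a stand-in, not the real thing: it uses plain dictionaries instead of an `Instances` object, and the function name `filter_by_confidence` and the sample detections are hypothetical. It mirrors the strict `>` comparison, so a detection whose score equals the threshold is dropped.

```python
from typing import Dict, List


def filter_by_confidence(detections: List[Dict], threshold: float = 0.5) -> List[Dict]:
    """Return a new list with only the detections scoring strictly above threshold."""
    return [det for det in detections if det["score"] > threshold]


if __name__ == "__main__":
    # Hypothetical detections, for illustration only.
    sample = [
        {"label": "plant", "score": 0.9},
        {"label": "vase", "score": 0.5},   # equal to the threshold: dropped
        {"label": "chair", "score": 0.2},
    ]
    print(filter_by_confidence(sample))
```

Like the original function, this returns a new container and leaves the input untouched.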



Loading the model prints a few warnings: a `tqdm` notebook notice, a `torch.meshgrid` notice, and a `transformers` message that some weights of the `bert-base-uncased` checkpoint (the `cls.predictions.*` and `cls.seq_relationship.*` heads and `bert.pooler.dense.*`) were not used when initializing `BertModel`. The `transformers` message itself states that this is expected when the checkpoint comes from a model trained on another task or with another architecture.


Example detections printed after running inference:

```
plant: score=0.567, box=[429.24652099609375, 76.37261962890625, 790.2409057617188, 796.6929321289062]
plant: score=0.532, box=[131.57545471191406, 480.75732421875, 464.2589416503906, 900.4660034179688]
vase: score=0.517, box=[462.1734924316406, 497.2107238769531, 786.7843017578125, 1088.8697509765625]
```
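
For post-processing, printed lines in this format can be parsed back into structured records. This is a minimal sketch under two assumptions: each line follows the `label: score=..., box=[...]` pattern shown above, and the four box values are `[x1, y1, x2, y2]` pixel coordinates (consistent with the values above, but not stated in the output itself). The names `Detection`, `parse_detection`, and `box_size` are illustrative.

```python
import re
from typing import List, NamedTuple, Tuple


class Detection(NamedTuple):
    label: str
    score: float
    box: List[float]  # assumed [x1, y1, x2, y2] in pixels


LINE_RE = re.compile(
    r"^(?P<label>.+?): score=(?P<score>[\d.]+), box=\[(?P<box>[^\]]+)\]$"
)


def parse_detection(line: str) -> Detection:
    """Parse one 'label: score=..., box=[...]' line into a Detection."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"Unrecognized detection line: {line!r}")
    box = [float(value) for value in match.group("box").split(",")]
    return Detection(match.group("label"), float(match.group("score")), box)


def box_size(box: List[float]) -> Tuple[float, float]:
    """Return (width, height), assuming [x1, y1, x2, y2] coordinates."""
    x1, y1, x2, y2 = box
    return x2 - x1, y2 - y1


if __name__ == "__main__":
    line = (
        "vase: score=0.517, box=[462.1734924316406, 497.2107238769531, "
        "786.7843017578125, 1088.8697509765625]"
    )
    det = parse_detection(line)
    print(det.label, det.score, box_size(det.box))
```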
