GroundingDINO
====

**Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection**
 * Paper: https://arxiv.org/abs/2303.05499

![GroundingDINO](../assets/groundingdino_overview.png)

```bash
pip install torch torchvision
pip install transformers
pip install matplotlib
pip install supervision
```

```python
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForZeroShotObjectDetection

model_id = "IDEA-Research/grounding-dino-base"
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# The processor handles both image preprocessing and text tokenization.
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForZeroShotObjectDetection.from_pretrained(model_id)

# Switch to inference mode and move the weights to the selected device.
model.eval().to(device)
```

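```python
image_path = "../samples/plants.jpg"
image = Image.open(image_path).convert("RGB")

# Text queries should be lowercase, with each category ending in a period.
text = "a plant. a vase."

inputs = processor(
    images=image, text=text, return_tensors="pt").to(device)
with torch.no_grad():
    outputs = model(**inputs)

# box_threshold filters boxes by confidence; text_threshold filters the
# tokens used to build each label. PIL reports size as (width, height),
# so it is reversed to the (height, width) order expected by target_sizes.
results = processor.post_process_grounded_object_detection(
    outputs,
    inputs.input_ids,
    box_threshold=0.4,
    text_threshold=0.3,
    target_sizes=[image.size[::-1]]
)
```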
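
The post-processing step returns one dictionary per input image, so `results[0]` holds the detections for the single image used here. As a minimal sketch of how those detections might be inspected, the helper below prints one line per box. It assumes the dictionary carries parallel `"boxes"`, `"scores"` and `"labels"` entries with boxes in `(x_min, y_min, x_max, y_max)` pixel coordinates. `summarize_detections` and its `min_score` parameter are hypothetical names introduced for illustration, and the values in `example` are made up so the snippet runs without loading the model.

```python
def summarize_detections(result, min_score=0.0):
    """Return one readable line per detection, highest score first.

    `result` is assumed to be a dict with parallel "boxes", "scores" and
    "labels" entries. Tensors are converted with .tolist() when available,
    so plain Python lists work as well.
    """
    def to_list(values):
        return values.tolist() if hasattr(values, "tolist") else list(values)

    boxes = to_list(result["boxes"])
    scores = to_list(result["scores"])
    labels = list(result["labels"])

    # Sort by score only, so ties never fall back to comparing labels or boxes.
    rows = sorted(zip(scores, labels, boxes), key=lambda row: row[0], reverse=True)

    lines = []
    for score, label, (x0, y0, x1, y1) in rows:
        if score < min_score:
            continue
        lines.append(
            f"{label}: {score:.2f} at [{x0:.0f}, {y0:.0f}, {x1:.0f}, {y1:.0f}]"
        )
    return lines


# Made-up values standing in for results[0]; not real model output.
example = {
    "boxes": [[10.2, 20.7, 110.0, 220.4], [300.0, 40.0, 380.9, 160.1]],
    "scores": [0.55, 0.81],
    "labels": ["a vase", "a plant"],
}

for line in summarize_detections(example):
    print(line)
```

With the real output, the call would be `summarize_detections(results[0])`. Passing a `min_score` above the `box_threshold` used earlier is one way to tighten the filter without running the model again.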
