Interested in Project 8 - Refining Zero-Shot Object Segmentation by Combining Vision Foundation Models #29497
DeepC004 started this conversation in Google Summer of Code · Replies: 1 comment
The idea is to follow the interface/implementation of Visual Prompting in https://github.com/openvinotoolkit/model_api/tree/master
Dear Daan, Klaas, and Samet,
The idea of refining zero-shot object segmentation with models like DINOv2 and SAM sounds exciting, and I'd love to get involved.
I have previous experience in relevant areas. For example, I co-authored a paper on few-shot learning, exploring techniques for comparing features and generalizing from limited examples, which shares ground with zero-shot learning's reliance on feature extraction for object identification. I have also worked with vision-based neural networks, including a style transfer project, which gave me strong familiarity with PyTorch and with solving vision problems. Additionally, I have contributed to open-source projects, implementing neural networks and integrating them into testing pipelines. These experiences have given me a solid foundation in machine learning tooling and in writing reliable code, and I'd be keen to apply them to building a segmentation system that is robust and generalizes well.
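As a rough illustration of the kind of feature comparison I have in mind (this is my own sketch, not the project's actual pipeline; the function name and the dummy DINOv2-style patch features are hypothetical), one could match a reference object's feature against a target image's patch features by cosine similarity and use the best-matching patch location as a point prompt for a promptable segmenter such as SAM:

```python
import numpy as np

def best_point_prompt(ref_feat, target_feats, grid_hw):
    """Pick the target patch most similar to a reference feature.

    ref_feat: (D,) feature vector of the reference object patch.
    target_feats: (N, D) patch features of the target image (N = H*W).
    grid_hw: (H, W) patch-grid shape, used to map a flat index to (row, col).
    Returns (row, col) of the best-matching patch, a candidate point
    prompt for a promptable segmenter such as SAM.
    """
    # L2-normalize so the dot product becomes cosine similarity.
    ref = ref_feat / np.linalg.norm(ref_feat)
    tgt = target_feats / np.linalg.norm(target_feats, axis=1, keepdims=True)
    sims = tgt @ ref                    # cosine similarities, shape (N,)
    idx = int(np.argmax(sims))          # most similar patch
    return divmod(idx, grid_hw[1])      # flat index -> (row, col)

# Toy usage with random stand-in features (no real backbone involved):
rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 16))        # 8 patches in a 2x4 grid, D=16
ref = feats[5] * 2.0                    # scaled copy: cosine-identical to patch 5
print(best_point_prompt(ref, feats, (2, 4)))  # patch 5 -> (row 1, col 1)
```

A real version would replace the random arrays with DINOv2 patch embeddings and pass the resulting point (mapped back to pixel coordinates) to SAM's prompt interface, but the matching step itself stays this simple.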
The zero-shot approach and the opportunity to enhance its generalizability are compelling, and I have a few questions.
I'm looking forward to digging into DINOv2, SAM, and other architectures such as the Swin Transformer. If you have any pointers on where to start or what lies ahead, I'd love to hear them. I'm hoping to team up on this!
Cheers,
Deep Chordia