Lists (22)
🤖 AI
💯 Algorithm
🔍 BigQuery
🔖
📎 CLIP / VLM
Data Mining
👁️‍🗨️ Vision
Game Bot
🧑‍💻 Git
🌐 GNN
👨 Personal Web Templates
💬 NLP
💻 nodesktop
JS, CSS
🧊 object-centric learning
📖 Open Vocabulary
🎑 Scene Graph
📜 Templates
⚙️ Setup, dotfile
🎇 Part Segmentation
⭐ Hetero GNN / CL
🖥️ Ubuntu
🎲 Wordle
wordle
Starred repositories
Python implementation of EVM(Eulerian Video Magnification)
[CVPR 2025] VGGT: Visual Geometry Grounded Transformer
The official NetsPresso Python package.
😎 Awesome lists of papers and codes about open-vocabulary perception, including both 3D and 2D
tabtoyou / VL-DINO
Forked from facebookresearch/dino
Verifying Vision-Language alignment using DINO visualization techniques on cross-attention maps
Embodied Reasoning Question Answer (ERQA) Benchmark
Official repository of ’Visual-RFT: Visual Reinforcement Fine-Tuning’
No fortress, purely open ground. OpenManus is Coming.
[ICCV 2023] Multi3DRefer: Grounding Text Description to Multiple 3D Objects
[NeurIPS 2024] Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Understanding
An easy way to apply LoRA to CLIP. Implementation of the paper "Low-Rank Few-Shot Adaptation of Vision-Language Models" (CLIP-LoRA) [CVPRW 2024].
Fine-tuning CLIP Text Encoders with Two-step Paraphrasing (EACL 2024, Findings)
[ECCV24] VISA: Reasoning Video Object Segmentation via Large Language Model
[CVPR 2024] Official implementation of "Universal Segmentation at Arbitrary Granularity with Language Instruction"
The toolbox for the Google Refexp dataset proposed in this paper: http://arxiv.org/abs/1511.02283
The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (V…
[AAAI2025] - Multi-task Visual Grounding with Coarse-to-Fine Consistency Constraints
A lightweight codebase for referring expression comprehension and segmentation
An official PyTorch implementation of the CRIS paper
Python3 Referring Expression Datasets API
Official implementation of "Can Language Understand Depth?"
This repo contains the official implementation of the ICLR 2024 paper "Is ImageNet worth 1 video? Learning strong image encoders from 1 long unlabelled video"
A single handwritten digit classifier, using the MNIST dataset. Pure Numpy.
[AAAI 2024] The official implementation of the paper "3D-STMN: Dependency-Driven Superpoint-Text Matching Network for End-to-End 3D Referring Expression Segmentation"
[MM2024 Oral] 3D-GRES: Generalized 3D Referring Expression Segmentation
[ICCV 2023] Official code release of our paper "Referring Image Segmentation Using Text Supervision"
Official code release of "CLIP goes 3D: Leveraging Prompt Tuning for Language Grounded 3D Recognition"
[CVPR 2024] Aligning and Prompting Everything All at Once for Universal Visual Perception
[ICLR 2025] Duoduo CLIP: Efficient 3D Understanding with Multi-View Images