Cheng-Yen Yang, Hsiang-Wei Huang, Wenhao Chai, Zhongyu Jiang, Jenq-Neng Hwang
University of Washington
[arXiv] [Project Page] [Raw Results]
This repository is a fork of the official implementation of SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory
Code tested with CUDA 12.1 and Python 3.10.11 on Windows.
> nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Wed_Feb__8_05:53:42_Coordinated_Universal_Time_2023
Cuda compilation tools, release 12.1, V12.1.66
Build cuda_12.1.r12.1/compiler.32415258_0
SAM 2 needs to be installed before use. The code requires python>=3.10, as well as torch>=2.3.1 and torchvision>=0.18.1. Please follow the instructions here to install both the PyTorch and TorchVision dependencies. You can install the SAMURAI version of SAM 2 on a GPU machine using:
> cd sam2
> pip install -e .
> pip install -e ".[notebooks]"
Please see INSTALL.md from the original SAM 2 repository for FAQs on potential issues and solutions.
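To confirm the editable install succeeded, a quick import check helps before moving on (a minimal sketch; it only assumes the sam2 package installed above):

```python
# Sanity check: this import fails if the SAMURAI build of SAM 2 did not install.
from sam2.build_sam import build_sam2  # model builder exposed by the sam2 package

print("sam2 import OK:", build_sam2.__module__)
```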
Install the remaining dependencies and ultralytics (used to generate the detection prompt):
> pip install matplotlib tikzplotlib jpeg4py opencv-python lmdb pandas scipy loguru
> pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121 --upgrade
> pip install ultralytics
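After installing the cu121 wheel, it is worth confirming that the GPU build of torch is actually active and that ultralytics imports cleanly (a short check, assuming the installs above completed):

```python
# Confirm the cu121 torch wheel is active and ultralytics is importable.
import torch
from ultralytics import YOLO  # noqa: F401 -- the import alone verifies the install

assert torch.cuda.is_available(), "CUDA torch not active; re-check the cu121 install"
print(torch.version.cuda, torch.cuda.get_device_name(0))
```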
> cd checkpoints
> ./download_ckpts.sh
> cd ..
If VS Code opens the .sh file instead of executing it, run the script from Git Bash.
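Once the script finishes, the checkpoints folder should contain the SAM 2.1 weights. A small presence check can confirm the download completed; the filenames below are the ones download_ckpts.sh fetches at the time of writing, so adjust them if the script changes:

```python
# Verify the SAM 2.1 checkpoints fetched by download_ckpts.sh are present.
from pathlib import Path

EXPECTED = [
    "sam2.1_hiera_tiny.pt",
    "sam2.1_hiera_small.pt",
    "sam2.1_hiera_base_plus.pt",
    "sam2.1_hiera_large.pt",
]

ckpt_dir = Path("checkpoints")
for name in EXPECTED:
    path = ckpt_dir / name
    status = f"{path.stat().st_size / 1e6:.0f} MB" if path.exists() else "MISSING"
    print(f"{name}: {status}")
```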
> python sam2/infer.py --video "path-to-video"
For example,
> python sam2/infer.py --video "C:/Users/johndoe/samurai/myvideo.mp4"
- Run the infer.py file using Python, passing the video file as an argument: python infer.py --video <video_file>
- The program will start playing the video; press the SPACE key to select a frame containing a person.
- Once a frame is selected, the program will wait for user input in the terminal window.
- Enter the ID of one of the detected people in the terminal window when prompted.
- The entered ID should match one of the IDs assigned to the detected persons by the YOLO11x model.
- Once a valid ID is entered, the program loads the YOLO11x and SAM 2 models and shows their results side by side; a rough sketch of this flow is given below.
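For reference, the selection flow above can be sketched roughly as follows. This is an illustrative outline built on the ultralytics and OpenCV APIs, not the repository's actual infer.py; the video path, window handling, and prompt wiring are assumptions:

```python
# Illustrative sketch of the frame-selection and ID-prompt flow (not the real infer.py).
import cv2
from ultralytics import YOLO

model = YOLO("yolo11x.pt")             # YOLO11x weights download on first use
cap = cv2.VideoCapture("myvideo.mp4")  # assumed path; substitute your own video

frame = None
while cap.isOpened():
    ok, img = cap.read()
    if not ok:
        break
    cv2.imshow("video", img)
    if cv2.waitKey(30) & 0xFF == ord(" "):  # SPACE selects the current frame
        frame = img
        break
cap.release()
cv2.destroyAllWindows()

if frame is not None:
    results = model(frame)[0]          # run detection on the selected frame
    for i, box in enumerate(results.boxes):
        if int(box.cls) == 0:          # COCO class 0 is "person"
            print(f"ID {i}: person at {box.xyxy[0].tolist()}")
    chosen = int(input("Enter the ID of the person to track: "))
    # The chosen box would then seed SAM 2 as the tracking prompt.
```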
SAMURAI is built on top of SAM 2 by Meta FAIR.
The VOT evaluation code is modified from the VOT Toolkit by Luka Čehovin Zajc.
Please consider citing our paper and the wonderful SAM 2 if you find our work interesting and useful.
@article{ravi2024sam2,
title={SAM 2: Segment Anything in Images and Videos},
author={Ravi, Nikhila and Gabeur, Valentin and Hu, Yuan-Ting and Hu, Ronghang and Ryali, Chaitanya and Ma, Tengyu and Khedr, Haitham and R{\"a}dle, Roman and Rolland, Chloe and Gustafson, Laura and Mintun, Eric and Pan, Junting and Alwala, Kalyan Vasudev and Carion, Nicolas and Wu, Chao-Yuan and Girshick, Ross and Doll{\'a}r, Piotr and Feichtenhofer, Christoph},
journal={arXiv preprint arXiv:2408.00714},
url={https://arxiv.org/abs/2408.00714},
year={2024}
}
@misc{yang2024samurai,
title={SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory},
author={Cheng-Yen Yang and Hsiang-Wei Huang and Wenhao Chai and Zhongyu Jiang and Jenq-Neng Hwang},
year={2024},
eprint={2411.11922},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2411.11922},
}

