SAI3D: Segment Any Instance in 3D with Open Vocabularies

Yingda Yin1,2*, Yuzheng Liu2,3*, Yang Xiao4*, Daniel Cohen-Or5, Jingwei Huang6, Baoquan Chen2,3

1School of Computer Science, Peking University     2National Key Lab of General AI, China     3School of Intelligence Science and Technology, Peking University     4Ecole des Ponts ParisTech    5Tel-Aviv University     6Tencent    

CVPR 2024

Project Page | arXiv Paper

Introduction

We introduce SAI3D, a novel zero-shot 3D instance segmentation approach that synergistically leverages geometric priors and semantic cues derived from the Segment Anything Model (SAM).

Our approach combines geometric priors with the capabilities of 2D foundation models. We over-segment 3D point clouds into superpoints (top-left) and generate 2D image masks with SAM (bottom-left). We then construct a scene graph that quantifies the pairwise affinity scores between superpoints (middle). Finally, we apply progressive region growing to gradually merge the 3D superpoints into the final 3D instance segmentation masks (right).
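For intuition, here is a heavily simplified Python sketch of the progressive region-growing step, assuming a precomputed dense superpoint-affinity matrix. The actual implementation operates on a sparse scene graph and recomputes region-level affinities between merging rounds, so treat this as an illustration of the idea rather than the repository's code.

import numpy as np

def region_grow(affinity, thresholds=(0.9, 0.8, 0.7, 0.6, 0.5)):
    # affinity: (n, n) matrix of pairwise superpoint affinity scores in [0, 1]
    n = affinity.shape[0]
    parent = list(range(n))  # union-find forest over superpoints

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for t in thresholds:  # progressively relax the merging threshold
        for i in range(n):
            for j in range(i + 1, n):
                if affinity[i, j] >= t:
                    parent[find(i)] = find(j)  # merge the two regions
    roots = sorted({find(i) for i in range(n)})
    remap = {r: k for k, r in enumerate(roots)}
    return np.array([remap[find(i)] for i in range(n)])  # instance id per superpoint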

Usage

Installation

Prepare environment

conda create -n sai3d python=3.8
conda activate sai3d
pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117
pip install open3d natsort matplotlib tqdm opencv-python scipy plyfile
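As an optional sanity check (our suggestion, not part of the original instructions), verify that the pinned PyTorch build sees your GPU before installing anything else:

import torch
print(torch.__version__)          # expected: 1.13.1+cu117
print(torch.cuda.is_available())  # should print True on a working CUDA 11.7 setup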

Install Semantic-SAM

git clone https://github.com/UX-Decoder/Semantic-SAM.git Semantic-SAM --recursive
# if you encounter problems with the CUDA version, try CUDA 11.8 with the following command:
# conda install nvidia/label/cuda-11.8.0::cuda
python -m pip install 'git+https://github.com/MaureenZOU/detectron2-xyz.git'
pip install git+https://github.com/cocodataset/panopticapi.git
cd Semantic-SAM
python -m pip install -r requirements.txt
cd semantic_sam/body/encoder/ops
sh ./make.sh
cd - && mkdir checkpoints && cd checkpoints
wget https://github.com/UX-Decoder/Semantic-SAM/releases/download/checkpoint/swinl_only_sam_many2many.pth
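Optionally (again our suggestion, not from the original instructions), confirm that the downloaded checkpoint deserializes cleanly:

import torch

state = torch.load("checkpoints/swinl_only_sam_many2many.pth", map_location="cpu")
print(type(state))  # should load without errors into a dict-like object of weights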

Install OpenMask3D (if semantics are needed)

git clone https://github.com/OpenMask3D/openmask3d.git openmask3d --recursive
cd openmask3d
conda create --name=openmask3d python=3.8.5 # create new virtual environment
conda activate openmask3d # activate it
bash install_requirements.sh  # install requirements
pip install -e .  # install current repository in editable mode
mkdir checkpoints && cd checkpoints
wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth  #download SAM ckpt

Data Preparation

ScanNet

Download ScanNetV2 / ScanNet200 and organize the dataset as follows:

data
 ├── ScanNet
 │   ├── posed_images
 │   |   ├── scene0000_00
 │   |   │   ├──intrinsic_color.txt   
 │   |   │   ├──intrinsic_depth.txt   
 │   |   │   ├──0000.jpg     // RGB image
 │   |   │   ├──0000.png     // depth image
 │   |   │   ├──0000.txt     // extrinsic (camera-to-world pose)
 │   |   │   └── ...
 │   |   └── ...
 │   ├── scans
 │   |   ├── scene0000_00
 │   |   └── ...
 │   ├── Tasks
 │   |   ├── Benchmark
 │   |   │   ├──scannetv2_val.txt  
 │   |   │   ├──scannetv2_train.txt  
 │   |   │   └── ...
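The loader below is an illustrative sketch (not code from this repository) of how one frame in the layout above can be read: ScanNet stores intrinsics and extrinsics as 4x4 row-major text matrices, and depth as 16-bit PNGs in millimeters.

import numpy as np
import cv2

def load_frame(scene_dir, frame="0000"):
    intrinsic = np.loadtxt(f"{scene_dir}/intrinsic_color.txt")  # 4x4 color intrinsics
    pose = np.loadtxt(f"{scene_dir}/{frame}.txt")               # 4x4 camera-to-world extrinsic
    rgb = cv2.imread(f"{scene_dir}/{frame}.jpg")                # color image (BGR)
    depth = cv2.imread(f"{scene_dir}/{frame}.png", cv2.IMREAD_UNCHANGED)
    depth_m = depth.astype(np.float32) / 1000.0                 # millimeters -> meters
    return intrinsic, pose, rgb, depth_m

intrinsic, pose, rgb, depth = load_frame("data/ScanNet/posed_images/scene0000_00")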

Get class-agnostic masks

  1. Obtain 2D SAM results

    Change the config here to false, set the required parameters in this script, and then run:

    bash ./scripts/sam_scannet.sh

    The results will be stored in data/ScanNet/2D_masks, where the raw 2D segmentation results and the colored visualizations of the 2D masks are named maskraw_<frame_number>.png and maskcolor_<frame_number>.png, respectively; a quick inspection snippet follows.
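    The exact pixel encoding is easiest to confirm by inspection. Assuming maskraw stores one integer mask id per pixel (our reading, with 0 as background), the ids and their areas can be listed like this (the path is illustrative):

       import numpy as np
       import cv2

       raw = cv2.imread("data/ScanNet/2D_masks/scene0000_00/maskraw_0000.png",
                        cv2.IMREAD_UNCHANGED)
       ids, counts = np.unique(raw, return_counts=True)
       print(ids, counts)  # mask ids and their pixel counts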

  2. Obtain 3D superpoints. For the ScanNet dataset, superpoints are already provided in scans/<scene_id>/<scene_id>_vh_clean_2.0.010000.segs.json

    To generate superpoints on the meshes of other datasets, we directly use the mesh segmentator provided by ScanNet; please check here for usage. A minimal loading sketch for the ScanNet superpoint files is shown below.
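    Loading the superpoints is straightforward, since segs.json is a documented ScanNet format whose segIndices field holds one segment id per mesh vertex:

       import json
       import numpy as np

       path = ("data/ScanNet/scans/scene0000_00/"
               "scene0000_00_vh_clean_2.0.010000.segs.json")
       with open(path) as f:
           seg = json.load(f)
       superpoint_ids = np.asarray(seg["segIndices"])  # one id per mesh vertex
       print(len(np.unique(superpoint_ids)), "superpoints")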

  3. 3D instance segmentation by region growing

    Set the required parameters in this script, then run SAI3D with the following command:

    bash scripts/seg_scannet.sh

    The resulting class-agnostic masks will be exported in the format of the ScanNet instance segmentation benchmark, sketched below.
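    For reference, the benchmark format is one summary txt per scene, each line naming a relative mask file, a label id, and a confidence, with every mask file holding one 0/1 value per mesh vertex. A minimal writer sketch (file names are illustrative, and the label id is a placeholder in the class-agnostic setting):

       import os
       import numpy as np

       def export_scene(out_dir, scene_id, masks, label_id=1, conf=1.0):
           # masks: (num_instances, num_vertices) boolean array
           os.makedirs(f"{out_dir}/predicted_masks", exist_ok=True)
           with open(f"{out_dir}/{scene_id}.txt", "w") as f:
               for k, mask in enumerate(masks):
                   rel = f"predicted_masks/{scene_id}_{k:03d}.txt"
                   np.savetxt(f"{out_dir}/{rel}", mask.astype(np.uint8), fmt="%d")
                   f.write(f"{rel} {label_id} {conf:.4f}\n")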

Evaluate class-agnostic results

Now you can run class-agnostic evaluation directly on these results; it measures only the accuracy of the instance masks, without considering semantic labels.

We modify the original ScanNet instance segmentation benchmark for this purpose. We collect the ground-truth masks of all 18 classes (excluding wall and floor) in the ScanNet-v2 dataset as our ground-truth class-agnostic masks, and the AP score is reported over all foreground masks.

We provide processed ground-truth class-agnostic masks here. Please download and extract them into your GT_DIR.

  1. Prepare environment for ScanNet benchmark
    conda create -n eval python=2.7
    conda activate eval
    cd evaluation
    pip install -r requirements.txt
  2. Start evaluation
    python evaluation/evaluate_class_agnostic_instance.py \
    --pred_path=PREDICTION_DIR \
    --gt_path=GT_DIR

The numerical results will be saved under the directory of your predictions by default.

Visualize class-agnostic results

Since segmentation results in the ScanNet evaluation format are hard to visualize directly, we provide functions in helpers/visualize.py that convert them into meshes (.ply) for visualization. Please check that file for usage.
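As an independent minimal sketch of the same idea (function and file names here are illustrative, not the helper's API), one can color each mesh vertex by its instance id and save a .ply with Open3D:

import numpy as np
import open3d as o3d

def colorize_instances(mesh_path, instance_ids, out_path="instances.ply"):
    # instance_ids: one integer id per mesh vertex, -1 for unassigned vertices
    mesh = o3d.io.read_triangle_mesh(mesh_path)
    rng = np.random.default_rng(0)
    palette = rng.random((instance_ids.max() + 2, 3))  # one random color per id
    mesh.vertex_colors = o3d.utility.Vector3dVector(palette[instance_ids + 1])
    o3d.io.write_triangle_mesh(out_path, mesh)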

Assign semantic with OpenMask3D and conduct 3D instance segmentation evaluation

We show that our class-agnostic masks are more accurate and can be adopted in tasks such as semantic instance segmentation. Here we choose OpenMask3D to assign a semantic label to each of our class-agnostic masks.

  1. Reorganize scannet dataset

    Since OpenMask3D requires the ScanNet dataset to be organized like this, we provide a script that reorganizes the dataset with symlinks.

       python helpers/format_convertion.py            \
       --app=0                                        \
       --base_dir=PATH_TO_PREVIOUS_SCANNET_DATASET    \
       --out_dir=PATH_TO_REORGANIZED_SCANNET_DATASET

    For example,

       python helpers/format_convertion.py            \
       --app=0                                        \
       --base_dir="data/ScanNet"                      \
       --out_dir="data/ScanNet_OpenMask3D"

    According to the conventions of OpenMask3D, the color and depth images of your data should share the same resolution. If they do not, replace this line in OpenMask3D with the following code, which resizes each color image to the depth resolution when it is loaded (DEPTH_RESOLUTION is a (width, height) tuple matching your depth images):

       img = Image.open(img_path).convert("RGB").resize(DEPTH_RESOLUTION, Image.BILINEAR)
       images.append(img)
  2. Prepare class-agnostic masks

    We already obtained class-agnostic predictions in the previous section and exported them in the ScanNet benchmark evaluation format.

    However, OpenMask3D requires class-agnostic masks to be saved in .pt format before it can assign semantics to them, so run the following command to convert the predictions into the input format that OpenMask3D expects.

       python helpers/format_convertion.py  \
       --app=1                              \
       --base_dir=PATH_TO_PREDICTION_DIR    \
       --out_dir=PATH_TO_SAVE_PREDICTION_OF_NEW_FORMAT

    For example,

       RESULT_NAME="demo_scannet_5view_merge200_2-norm_semantic-sam_connect(0.9,0.5,5)_depth2"
       python helpers/format_convertion.py                  \
       --app=1                                              \
       --base_dir="data/ScanNet/results/${RESULT_NAME}"     \
       --out_dir="data/class_agnostic_masks"
  3. Assign semantic and evaluate

    We provide processed ground-truth masks for ScanNet200 semantic instance segmentation here.

    Now you can compute the per-mask scene features and run the OpenMask3D evaluation on the validation split of ScanNet200. Set the intrinsic_resolution parameter in the OpenMask3D configuration to the resolution of your intrinsic_color.txt, then set the required parameters in this script and run the following command:

    bash scripts/run_openmask3d_scannet200.sh

    This script first computes the mask feature associated with each class-agnostic mask, then queries the masks with the 200 class names of ScanNet200 to assign each a semantic label. Afterwards, the evaluation script runs automatically to produce the 3D closed-vocabulary semantic instance segmentation scores.