SUN RGB-D is not in millimeters #12

Open
jbrownkramer opened this issue Mar 9, 2024 · 4 comments

Comments

@jbrownkramer

I was trying to apply this model to my own data and not getting good results. I ran the NYUv2 dataset through my code, and the results seem to be in line with those reported in the ViT-Lens paper.

Digging into it, the issue is, at least partly, that the NYUv2 data is not in millimeters. Here is the MATLAB code from the SUNRGBDtoolbox (https://rgbd.cs.princeton.edu/) for converting the PNG files to mm:

depthVis = imread(data.depthpath);  % 16-bit depth PNG as stored in SUN RGB-D
imsize = size(depthVis);
depthInpaint = bitor(bitshift(depthVis,-3), bitshift(depthVis,16-3));  % circular shift right by 3 bits -> depth in mm

In other words, the data in the PNG files is the depth in mm circularly shifted left by 3 bits (which, for depths under 2^13 mm ≈ 8.2 m, is just multiplication by 8).
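
For reference, here is a minimal Python equivalent of that decode (my own sketch, not from the toolbox; it assumes a single-channel 16-bit PNG):

import numpy as np
from PIL import Image

def sunrgbd_depth_to_mm(png_path):
    # 16-bit depth PNG as stored in SUN RGB-D / NYUv2.
    depth_vis = np.array(Image.open(png_path), dtype=np.uint16)
    # Undo the circular left shift by 3: rotate right by 3 within 16 bits.
    # Bits shifted past position 16 are dropped by the uint16 dtype.
    return ((depth_vis >> 3) | (depth_vis << 13)).astype(np.uint16)  # depth in mm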

I mention this because the code in #9 seems to assume that the data is in mm. That could matter if other datasets that really are in mm, rather than in the SUN RGB-D format, get used.

@StanLei52
Collaborator

StanLei52 commented Mar 9, 2024

Thank you for pointing this out -- it is important to get this right for a more general depth model. Could you please also check LanguageBind and the NYU-D data they uploaded? If their preprocessing works on your own data, I will look into their pipeline instead of following ImageBind's.

@jbrownkramer
Author

I will look into LanguageBind.

I will say this: I updated my pipeline to match the circular shift, quantization, and camera intrinsics of the NYU data. The results on our data are still not very good. My suspicion is that SUN RGB-D has no people in it, and the text labels I am trying to match describe the locations of people in the scene.

@jbrownkramer
Author

Below is the transformation pipeline in LanguageBind. The starting format is depth in mm (NOT DISPARITY). I ran their inference example from the git homepage, and max_depth is configured to 10. So, in summary: read in the data in mm, convert to meters, clamp between 0.01 and 10 meters, divide by 10 meters, resize and center-crop to 224, and normalize by OPENAI_DATASET_MEAN and OPENAI_DATASET_STD.

I tried running LanguageBind on the SUN RGB-D versions of the NYUv2 data directly and it gave bad outputs. When I did a circular shift (to put the depth back into mm) it gave good results, so they must be doing some preprocessing to convert the NYU data to mm first.

import numpy as np
import torch
from torch import nn
from torchvision import transforms

# CLIP's normalization constants, as defined in open_clip.
OPENAI_DATASET_MEAN = (0.48145466, 0.4578275, 0.40821073)
OPENAI_DATASET_STD = (0.26862954, 0.26130258, 0.27577711)

class DepthNorm(nn.Module):
    def __init__(
        self,
        max_depth=0,
        min_depth=0.01,
    ):
        super().__init__()
        self.max_depth = max_depth
        self.min_depth = min_depth
        self.scale = 1000.0  # nyuv2 abs.depth: stored values are mm, so /1000 gives meters

    def forward(self, image):
        # image = np.array(image)  -- expects a NumPy array of depth in mm
        depth_img = image / self.scale  # (H, W) in meters
        depth_img = depth_img.clip(min=self.min_depth)
        if self.max_depth != 0:
            depth_img = depth_img.clip(max=self.max_depth)
            depth_img /= self.max_depth  # 0-1
        else:
            depth_img /= depth_img.max()
        # Replicate the single depth channel to 3 channels so it can be treated as an image.
        depth_img = torch.from_numpy(depth_img).unsqueeze(0).repeat(3, 1, 1)
        return depth_img.to(torch.get_default_dtype())

def get_depth_transform(config):
    config = config.vision_config
    transform = transforms.Compose(
        [
            DepthNorm(max_depth=config.max_depth),
            transforms.Resize(224, interpolation=transforms.InterpolationMode.BICUBIC),
            transforms.CenterCrop(224),
            transforms.Normalize(OPENAI_DATASET_MEAN, OPENAI_DATASET_STD),  # assume image
            # transforms.Normalize((0.5, ), (0.5, ))  # 0-1 to norm distribution
            # transforms.Normalize((0.0418, ), (0.0295, ))  # sun rgb-d  imagebind
            # transforms.Normalize((0.02, ), (0.00295, ))  # nyuv2
        ]
    )
    return transform
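
And a hypothetical end-to-end usage, combining the decode sketch above with this transform (the config object and file path below are stand-ins I made up, mirroring the max_depth=10 setting from their inference example):

from types import SimpleNamespace

config = SimpleNamespace(vision_config=SimpleNamespace(max_depth=10))  # assumed stand-in for LanguageBind's config
transform = get_depth_transform(config)

depth_mm = sunrgbd_depth_to_mm("some_nyu_depth.png").astype(np.float32)  # back to true mm; hypothetical path
depth_tensor = transform(depth_mm)  # (3, 224, 224), normalized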

@StanLei52
Collaborator

Got it, thanks @jbrownkramer! I will look into this.
