I'm a little confused about the inconsistency in the number of datasets #66

Closed
BayMaxBHL opened this issue Nov 16, 2022 · 5 comments

@BayMaxBHL

About NYUv2, the stated training-set sizes do not match:
Paper: “We train our network on a 50K RGB-Depth pairs subset following previous works.”
dataset_prepare.md: “Following previous work, I utilize about 50K image-depth pairs as our training set and standard 652 images as the validation set.”
nyu_train.txt: only 24,231 pairs.
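
For anyone who wants to reproduce the count, here is a minimal sketch. It assumes the split file lists one image-depth pair per non-empty line, which is not confirmed in this thread; `count_pairs` is a hypothetical helper, not part of the repository.

```python
def count_pairs(split_file):
    """Count non-empty lines in a split file.

    Assumption: each non-empty line describes exactly one image-depth pair.
    """
    with open(split_file, encoding="utf-8") as f:
        return sum(1 for line in f if line.strip())

# Example call (the path is wherever your copy of the split file lives):
# count_pairs("nyu_train.txt")
```

If the assumption holds, the result can be compared directly with the figure reported in the paper and in dataset_prepare.md.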

@BayMaxBHL
Author

After running python utils/download_from_gdrive.py 1AysroWpfISmm-yRFGBgFTrLy6FjQwvwP sync.zip as instructed:
sync.zip contains only 72,792 files and 284 folders.

@BayMaxBHL
Author

I am not sure whether the amount of training data in the paper is the same as that in nyu_train.txt.

@BayMaxBHL
Author

It's interesting that so many papers report 50K training pairs. Maybe everyone uses sync.zip.

@zhyever
Owner

zhyever commented Nov 16, 2022

Thanks for finding the typo in our paper. It is true that everyone uses sync.zip. :D
As you can see in the log file, we use 24231 pairs for training.

@BayMaxBHL
Author

From "From Big to Small: Multi-Scale Local Planar Guidance for Monocular Depth Estimation":
“The NYU Depth V2 dataset [42] contains 120K RGB and depth pairs having a size of 480 × 640 acquired as video sequences using a Microsoft Kinect from 464 indoor scenes. We follow the official train/test split as previous works, using 249 scenes for training and 215 scenes (654 images) for testing. From the total 120K image-depth pairs, due to asynchronous capturing rates between RGB images and depth maps, we associate and sample them using timestamps by even-spacing in time, resulting in 24231 image-depth pairs for the training set. Using raw depth images and camera projections provided by the dataset, we align the image-depth pairs for accurate pixel registrations. We use κ = 10 for this dataset.”
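
To illustrate what "associate and sample them using timestamps by even-spacing in time" could look like, here is a rough sketch. It is not the procedure used by the quoted paper or by this repository: the function names, the nearest-timestamp matching rule, and the `step` parameter are all assumptions made for illustration.

```python
from bisect import bisect_left

def nearest(sorted_ts, t):
    """Return the timestamp in sorted_ts that is closest to t."""
    i = bisect_left(sorted_ts, t)
    # Only the neighbours on either side of the insertion point can be closest.
    candidates = sorted_ts[max(i - 1, 0):i + 1]
    return min(candidates, key=lambda s: abs(s - t))

def sample_pairs(rgb_ts, depth_ts, step):
    """Pick one (rgb, depth) timestamp pair every `step` time units.

    Assumption: the two streams are captured asynchronously, so each evenly
    spaced target time is matched to the nearest frame in each stream.
    """
    rgb_ts, depth_ts = sorted(rgb_ts), sorted(depth_ts)
    pairs = []
    t = max(rgb_ts[0], depth_ts[0])     # start where both streams have data
    end = min(rgb_ts[-1], depth_ts[-1])  # stop where either stream ends
    while t <= end:
        pairs.append((nearest(rgb_ts, t), nearest(depth_ts, t)))
        t += step
    return pairs
```

With a larger `step`, fewer pairs are kept, which is one way a 120K-frame recording could be reduced to a much smaller training list.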
